About This Course
This course is now taught virtually, with classes led online by an Enthought trainer in real-time on GoToMeeting.
We endeavour to deliver these virtual programs as we would a face-to-face program. Interaction with the trainer is encouraged.
This 5-day class combines our 3-day Python Foundations course with essential material on machine learning and data visualization. It provides the skills that scientists, engineers, data scientists, data analysts, and business intelligence experts need to use Python and machine learning for data mining, classification, and predictive modeling tasks. This highly interactive training will give your team the skills to build reliable, repeatable analysis and prediction workflows. After this class, they will be able to process significantly more data through automation and to speed up the classification, interpretation, and analysis of that data.
Artificial intelligence and machine learning are defining features of the 21st century and are quickly becoming a key factor in gaining and maintaining competitive advantage in every industry that adopts them.
In this course, we combine conceptual knowledge of machine learning with extensive experience applying it to real-world data. Your team will develop skills in applying Python’s machine learning tools, such as the scikit-learn package, to make predictions about complicated phenomena by leveraging the information contained in numerical data, natural language, images, and discrete categories.
The emphasis is on learning techniques to maximize the predictive performance of machine learning workflows. After building a solid foundation in the Python scientific stack, we focus on the different types of feature sources for machine learning. For each, we progress through a short introductory lecture followed by exercises of progressive difficulty. Intermingled with the machine learning material are short discussions of helpful and diagnostic data visualizations.
Days 1–3: Python Foundations
- The course begins with a one-day introduction to the Python language, focusing on standard data structures, control constructs, and code organization.
- After a brief overview of the Scientific Python ecosystem, we dive into techniques for numeric data processing, including efficient manipulation of large data sets with NumPy arrays and 2D plotting with Matplotlib.
- Next up is an introduction to Pandas to efficiently load, clean, normalize, aggregate, transform, and visualize data.
Days 4–5: Machine Learning with scikit-learn, and Data Visualization
- Use specific regression, classification, and clustering algorithms skillfully to model data and solve problems by leveraging the full power of the scikit-learn API
- Extract relevant information from images using scikit-image
- Extract lexical and semantic information from natural language data
- Engineer numeric features to maximize predictive power
- Visualize interactions and non-linear distributions of data using matplotlib and seaborn
- Validate models with the appropriate success metrics
- Troubleshoot common issues like unbalanced labels and high-dimensional data
- Build deep insight by retrieving model parameters
If you registered to attend this course online, the session times will be sent to you one week before your program start date. The course will be held on GoToMeeting.
Onsite corporate classes are also available. Discounts are available for 3+ attendees and academics currently at a degree-granting institution. Contact us using the form on this page to learn more.
Note: The 3-day Machine Learning Mastery Workshop is an alternative course for those who already have both (1) current working knowledge of programming in the standard Python language (data structures, control flow, assignment, functions, and package access) and (2) familiarity with array programming in NumPy.
Course Syllabus & Topics
Due to social distancing measures currently in place to slow the spread of COVID-19, we will be teaching this course online, in real-time on GoToMeeting, with an Enthought trainer. The content and prerequisites for the virtual course do not differ from the face-to-face program.
Experience with Python is helpful (but not required). However, programming experience in some language (such as R, MATLAB, SAS, Mathematica, Java, C, C++, VB, or FORTRAN) is expected. In particular, participants need to be comfortable with general programming concepts like variables, loops, and functions.
1. Introduction to Python
We kick off the class by exploring the functionality of the IPython Shell, an enhanced interactive science-centric console. Next we review the Jupyter Notebook, a cell-based environment that renders scripts, plots, and rich media in a web-like interface, making it ideal for sharing and publishing analysis with peers. You’ll leave with a mastery of these tools that will accelerate your productivity and facilitate collaboration.
- Data types (strings, lists, dictionaries, and more)
- Control flow (if/else statements, looping)
- Organizing code (functions, modules, packages)
- Reading and writing files
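As a rough illustration of how these topics fit together, the sketch below uses a dictionary, a loop, two small functions, and file reading and writing. The function names, the file name, and the sample sentence are hypothetical and are not taken from the course materials.

```python
import tempfile
from pathlib import Path


def word_counts(text):
    """Return a dict mapping each lower-cased word to its frequency."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def save_counts(counts, path):
    """Write one "word,count" line per entry, sorted alphabetically."""
    with open(path, "w", encoding="utf-8") as f:
        for word in sorted(counts):
            f.write(f"{word},{counts[word]}\n")


def load_counts(path):
    """Read a file written by save_counts back into a dict."""
    counts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, count = line.strip().split(",")
            counts[word] = int(count)
    return counts


# Count words in a sample sentence, then round-trip the result through a file.
counts = word_counts("the cat and the hat")
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "counts.csv"
    save_counts(counts, path)
    restored = load_counts(path)
```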
2. Introduction to NumPy and 2D plotting
- Plotting with matplotlib
- Understanding the N-dimensional data structure
- Creating arrays
- Indexing arrays by slicing or more generally with indices or masks
- Basic operations and manipulations on N-dimensional arrays
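The array topics above might look like the following in practice. This is a minimal sketch with made-up values, not an excerpt from the course. It covers array creation, slicing, integer and mask indexing, and a basic element-wise operation.

```python
import numpy as np

# Create a 3x4 array holding the integers 0 through 11.
a = np.arange(12).reshape(3, 4)

# Slicing: every row, columns 1 and 2.
middle = a[:, 1:3]

# Integer (fancy) indexing: pick rows 0 and 2.
rows = a[[0, 2]]

# Boolean mask: keep only the even values (the result is 1D).
evens = a[a % 2 == 0]

# Element-wise arithmetic followed by a reduction along the row axis.
col_means = (a * 2).mean(axis=0)
```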
3. Time series analysis and data manipulation with Pandas
Built on top of NumPy arrays, the Python Data Analysis Library (Pandas) is a powerful and convenient package for working with labeled, multi-dimensional data sets. Participants will learn to use its data aggregation and reorganization capabilities to explore data sets, including its support for labeling data along each dimension, handling missing values, and manipulating time series.
- Pandas I/O operations
- Pandas 1D and 2D data structures (Series and DataFrame)
- Data alignment, aggregation, and summarization
- Computation and analysis with Pandas
- Dealing with dates and times
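To give a flavor of these Pandas topics, here is a small sketch that builds a date-indexed DataFrame, fills a missing value, aggregates by group, and resamples the time series. The sensor names and readings are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings from two sensors, with one missing value.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "sensor": ["a", "b", "a", "b", "a", "b"],
        "value": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    },
    index=dates,
)

# Replace the missing reading with the mean of the observed values.
filled = df.assign(value=df["value"].fillna(df["value"].mean()))

# Aggregation: average reading per sensor.
per_sensor = filled.groupby("sensor")["value"].mean()

# Time series manipulation: downsample daily values to 3-day sums.
three_day = filled["value"].resample("3D").sum()
```

Filling with the column mean is only one of several options. Dropping the row or interpolating along the time index may suit other data sets better.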
Visual Exploration with seaborn and matplotlib
- Inspect feature distributions before applying transformations
- Spot correlations, non-linearities, and level combinations between features
- Identify interactions between features using faceted plots
Intro to Machine Learning With scikit-learn
- Linear and nonlinear models
- Constant and variable learning-rates
- Cost functions, regularization methods, and other constraints
- Fitting, transforming, and predicting
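The fit, transform, and predict pattern listed above can be sketched as follows. The data is synthetic, and the choice of a standard scaler followed by ridge regression (a linear model with L2 regularization) is an assumption made for illustration, not necessarily the example used in class.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data: a linear signal plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A pipeline chains a transformer (scaler) with an estimator (ridge regression).
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)            # fit every step on the training data
predictions = model.predict(X_test)    # scale, then predict, on unseen data
r2 = model.score(X_test, y_test)       # coefficient of determination (R^2)

# The fitted scaler can also be used on its own to transform data.
X_test_scaled = model.named_steps["standardscaler"].transform(X_test)
```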
Working with Numeric Data
- Logarithmic and curvilinear transforms
- Data scaling
- Linear regressors
- l1 and l2 normalization
- Support vector machines (SVM)
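As one possible illustration of the transforms in this list, the sketch below applies a logarithmic transform, standard scaling, and l1 and l2 row normalization to a tiny made-up array. The values are chosen only to make the effect easy to see.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize

# Made-up features spanning several orders of magnitude.
X = np.array([[1.0, 10.0], [10.0, 100.0], [100.0, 1000.0]])

# Logarithmic transform: compresses the range so the columns become evenly spaced.
X_log = np.log10(X)

# Data scaling: shift and rescale each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X_log)

# l1 normalization: each row is divided by the sum of its absolute values.
X_l1 = normalize(X, norm="l1")

# l2 normalization: each row is divided by its Euclidean length.
X_l2 = normalize(X, norm="l2")
```

Note that scaling operates on columns (features), while `normalize` operates on rows (samples).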
Working with Categorical Data
- Contrast encoding
- Missing values
- Categorical rebinning
- Linear classifiers
- Tree-based classifiers
- Ensemble methods
- Boosting methods
- Unbalanced designs
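A few of these ideas can be combined in a short sketch: a missing category is replaced with an explicit placeholder level, the categories are one-hot encoded, and a tree-based ensemble is fitted with class weights that compensate for unbalanced labels. The column names, values, and labels are hypothetical, and one-hot encoding is used here as a simple stand-in for the encoding schemes covered in class.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical features with one missing value.
X = pd.DataFrame(
    {
        "color": ["red", "blue", None, "red", "red", "blue", "red", "red"],
        "size": ["S", "L", "L", "S", "L", "L", "S", "S"],
    }
)
# Unbalanced labels: only two of the eight samples are positive.
y = [0, 1, 0, 0, 0, 1, 0, 0]

# Treat "missing" as its own category rather than dropping the row.
X_filled = X.fillna("missing")

clf = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    # class_weight="balanced" reweights samples inversely to class frequency.
    RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=0),
)
clf.fit(X_filled, y)
predicted = clf.predict(X_filled)

# Total number of one-hot columns: 3 color levels plus 2 size levels.
n_encoded = sum(len(c) for c in clf.named_steps["onehotencoder"].categories_)
```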
Working with Image Data
- Image storage formats
- Smoothing and denoising
- Edge detection
- Feature-based segmentation
- K-means clustering
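As a hedged illustration of the last topic, the sketch below segments a synthetic grayscale image by clustering pixel intensities with k-means. It uses scikit-learn's `KMeans` on a NumPy array rather than scikit-image, and the image itself is invented for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 16x16 grayscale image: dark left half, bright right half, plus noise.
rng = np.random.default_rng(0)
image = np.full((16, 16), 0.2)
image[:, 8:] = 0.8
image += rng.normal(scale=0.05, size=image.shape)

# Treat every pixel as one sample with a single feature (its intensity).
pixels = image.reshape(-1, 1)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)

# Reshape the cluster labels back into the image grid to get a segmentation mask.
segments = kmeans.labels_.reshape(image.shape)
```

For real images, features such as color channels or local texture would typically replace the single intensity value used here.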