About This Course

This course is now taught virtually, with classes led online by an Enthought trainer in real time on GoToMeeting.

We endeavour to deliver these virtual programs as we would a face-to-face program. Interaction with the trainer is encouraged.

This 5-day class combines our 3-day Python Foundations with essential materials on machine learning and data visualization. It provides the skills needed by scientists, engineers, data scientists, data analysts, and business intelligence experts to use Python and machine learning for their data mining, classification, and predictive modeling tasks. This highly interactive training will give your team the skills they need to build reliable, repeatable analyses and prediction workflows. After this class, they will be able to process significantly more data through automation and to speed up the classification, interpretation, and analysis of that data.

Course Overview

Artificial intelligence and machine learning are defining features of the 21st century and are quickly becoming a key factor in gaining and maintaining competitive advantage in every industry that adopts them.

In this course, we combine conceptual knowledge of machine learning with extensive experience applying it to real-world data. Your team will develop skills in applying Python’s machine learning tools, such as the scikit-learn package, to make predictions about complicated phenomena by leveraging the information contained in numerical data, natural language, images, and discrete categories.

The emphasis is on learning techniques to maximize the predictive performance of machine learning workflows. After building a solid foundation in the Python scientific stack, we focus on the different types of feature sources for machine learning. For each, we progress through a short introductory lecture followed by exercises of progressive difficulty. Intermingled with the machine learning material are short discussions of helpful and diagnostic data visualizations.

Days 1–3: Python Foundations

  • The course begins with a one-day introduction to the Python language, focusing on standard data structures, control constructs, and code organization.
  • After a brief overview of the Scientific Python ecosystem, we dive into techniques for numeric data processing, including efficiently manipulating and processing large data sets using NumPy arrays and data visualization with 2D plots using Matplotlib.
  • Next up is an introduction to Pandas to efficiently load, clean, normalize, aggregate, transform, and visualize data.

Days 4–5: Machine Learning with scikit-learn, and Data Visualization

  • Use specific regression, classification, and clustering algorithms skillfully to model data and solve problems by leveraging the full power of the scikit-learn API
  • Extract relevant information from images using scikit-image
  • Extract lexical and semantic information from natural language data
  • Engineer numeric features to maximize predictive power
  • Visualize interactions and non-linear distributions of data using matplotlib and seaborn
  • Validate models with the appropriate success metrics
  • Troubleshoot common issues like unbalanced labels and high-dimensional data
  • Build deep insight by retrieving model parameters

Class Schedule

If you registered to attend this course online, the session times will be sent to you one week before your program start date. The course will be held on GoToMeeting.

Onsite corporate classes are also available. Discounts are available for 3+ attendees and academics currently at a degree-granting institution. Contact us using the form on this page to learn more.

Note: The 3-day Machine Learning Mastery Workshop is an alternative course for those who already have both (1) current working knowledge of programming in the standard Python language (data structures, control flow, assignment, functions, and package access) and (2) familiarity with array programming in NumPy.

Where | When | Price (per person) | Reserve a Seat
Online - Live Virtual | October 5-9, 2020, 8:30AM - 5:00PM MDT | $2200 | Register Online
Online - Live Virtual | Nov. 30 - Dec. 4, 2020, 8:30AM - 5:00PM MDT | $2200 | Register Online

Contact Us

Questions or need help registering? Call us at 512.536.1057 or fill out the form:

Course Syllabus & Topics

Due to social distancing measures currently in place to slow the spread of COVID-19, we will be teaching this course online, in real time on GoToMeeting, with an Enthought trainer. The content and prerequisites for the virtual course do not differ from the face-to-face program.

Course Prerequisites

Experience with Python is helpful (but not required). However, programming experience in some language (such as R, MATLAB, SAS, Mathematica, Java, C, C++, VB, or FORTRAN) is expected. In particular, participants need to be comfortable with general programming concepts like variables, loops, and functions.

1. Introduction to Python

We kick off the class by exploring the functionality of the IPython shell, an enhanced interactive, science-centric console. Next, we review the Jupyter Notebook, a cell-based environment that renders scripts, plots, and rich media in a web-like interface, making it ideal for sharing and publishing analyses with peers. You’ll leave with a working command of these tools that will accelerate your productivity and facilitate collaboration.

  • Data types (strings, lists, dictionaries, and more)
  • Control flow (if/else statements, looping)
  • Organizing code (functions, modules, packages)
  • Reading and writing files
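
As a rough illustration of how these topics fit together, the sketch below uses a dictionary, a loop with an if/else branch, a function, and file I/O. The function name `count_words`, the file name, and the sample text are hypothetical and are not taken from the course materials.

```python
import tempfile
from pathlib import Path


def count_words(text):
    """Return a dictionary mapping each lowercase word to its frequency."""
    counts = {}
    for word in text.lower().split():
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


# Write a small text file to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "notes.txt"  # hypothetical file name
    path.write_text("Python is fun and Python is readable")
    word_counts = count_words(path.read_text())

# Sort words by descending frequency, breaking ties alphabetically.
top_words = sorted(word_counts, key=lambda w: (-word_counts[w], w))[:2]
print(top_words)
```

The same logic could be written more compactly with `collections.Counter`; the explicit loop is used here only to show the control-flow constructs.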

2. Introduction to NumPy and 2D plotting

The NumPy package is presented as a tool for rapidly manipulating and processing large data sets. 2D plotting is introduced with matplotlib.

  • Plotting with matplotlib
  • Understanding the N-dimensional data structure
  • Creating arrays
  • Indexing arrays by slicing or more generally with indices or masks
  • Basic operations and manipulations on N-dimensional arrays
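
The indexing styles listed above can be sketched on a small array. This is a minimal example with made-up values, not an excerpt from the course exercises.

```python
import numpy as np

# A 3x4 array holding the integers 0 through 11.
data = np.arange(12).reshape(3, 4)

# Slicing: every row, columns 1 and 2.
middle_columns = data[:, 1:3]

# Integer (fancy) indexing: rows 0 and 2.
selected_rows = data[[0, 2]]

# Boolean mask: all even values, returned as a 1D array.
even_values = data[data % 2 == 0]

# Element-wise arithmetic followed by a reduction along the row axis.
column_means = (data * 2).mean(axis=0)

print(middle_columns.shape, even_values, column_means)
```

Slices return views onto the original array, while integer and mask indexing return copies, which matters when modifying the result in place.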

3. Time series analysis and data manipulation with Pandas

Built on top of NumPy arrays, the Python Data Analysis Library (Pandas) is a powerful and convenient package for dealing with multi-dimensional datasets. Participants will learn about its data aggregation and reorganization capabilities for dataset exploration, including support for labeling data along each dimension, handling missing values, and manipulating time series.

  • Pandas I/O operations
  • Pandas 1D and 2D data structures (Series and DataFrame)
  • Data alignment, aggregation, and summarization
  • Computation and analysis with Pandas
  • Dealing with dates and times
  • Visualization
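
A minimal sketch of missing-value handling, aggregation, and time series resampling with Pandas follows. The sensor names and readings are invented for illustration, and filling with the column mean is only one of several possible strategies.

```python
import pandas as pd

# Hypothetical daily readings from two sensors; one value is missing.
index = pd.date_range("2020-10-05", periods=6, freq="D")
readings = pd.DataFrame(
    {
        "sensor": ["a", "b", "a", "b", "a", "b"],
        "value": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    },
    index=index,
)

# Fill the missing value with the mean of the observed values.
filled = readings.assign(
    value=readings["value"].fillna(readings["value"].mean())
)

# Aggregation: mean reading per sensor.
mean_by_sensor = filled.groupby("sensor")["value"].mean()

# Time series manipulation: total reading in consecutive 3-day windows.
three_day_totals = filled["value"].resample("3D").sum()

print(mean_by_sensor)
print(three_day_totals)
```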

Visual Exploration with seaborn and matplotlib

  • Inspect feature distributions before applying transformations
  • Spot correlations, non-linearities, and level combinations between features
  • Identify interactions between features using faceted plots

Intro to Machine Learning With scikit-learn

  • Linear and nonlinear models
  • Constant and variable learning rates
  • Cost functions, regularization methods, and other constraints
  • Fitting, transforming, and predicting
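
The fit, transform, and predict pattern can be sketched as below. The data is synthetic, and the choice of `StandardScaler` with `Ridge` (an L2-regularized linear model) is an assumption made for illustration rather than the specific models used in class.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data: y = 3*x0 - 2*x1 plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Simple hold-out split: first 150 rows for training, the rest for testing.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# fit() learns the scaling parameters from the training data only;
# transform() applies them to both the training and the test data.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# fit() estimates the model coefficients; predict() applies the model.
model = Ridge(alpha=1.0).fit(X_train_scaled, y_train)
predictions = model.predict(X_test_scaled)

# score() returns the coefficient of determination (R^2) for regressors.
r2 = model.score(X_test_scaled, y_test)
print(round(r2, 3))
```

Fitting the scaler on the training rows only keeps information from the test rows out of the model, which is the main reason the fit and transform steps are separate.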

Working with Numeric Data

  • Logarithmic and curvilinear transforms
  • Data scaling
  • Outliers
  • Linear regressors
  • L1 and L2 normalization
  • Support vector machines (SVM)
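
A short sketch of three of the numeric techniques listed above: a logarithmic transform, standard scaling, and row-wise L1/L2 normalization. The input values are made up, and the sketch assumes scikit-learn's `StandardScaler` and `normalize` as the tools.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize

# A right-skewed feature whose largest value acts like an outlier.
skewed = np.array([[1.0], [10.0], [100.0], [1000.0], [100000.0]])

# Logarithmic transform: compresses the range to 0, 1, 2, 3, 5.
log_values = np.log10(skewed)

# Data scaling: zero mean and unit variance per column.
scaled = StandardScaler().fit_transform(log_values)

# Row-wise normalization: each row is divided by its L1 or L2 norm.
rows = np.array([[3.0, 4.0], [1.0, 0.0]])
l1_rows = normalize(rows, norm="l1")  # rows sum to 1 in absolute value
l2_rows = normalize(rows, norm="l2")  # rows have unit Euclidean length

print(log_values.ravel(), l2_rows)
```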

Working with Categorical Data

  • Contrast encoding
  • Missing values
  • Categorical rebinning
  • Linear classifiers
  • Tree-based classifiers
  • Ensemble methods
  • Boosting methods
  • Unbalanced designs
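
The sketch below touches on missing categorical values, encoding, a tree-based classifier, and unbalanced labels. It uses one-hot (indicator) encoding as a simple stand-in for the contrast encodings covered in class, and the toy data and column names are hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one categorical feature and a rare positive label.
frame = pd.DataFrame(
    {
        "color": ["red", "blue", None, "red", "blue", "red", "blue", "red"],
        "label": [0, 0, 1, 0, 0, 0, 0, 0],
    }
)

# Missing values: treat a missing category as a level of its own.
colors = frame["color"].fillna("missing")

# One-hot encoding: one indicator column per category level.
features = pd.get_dummies(colors, prefix="color")

# class_weight="balanced" weights classes inversely to their frequency,
# so the single positive example is not drowned out by the negatives.
tree = DecisionTreeClassifier(class_weight="balanced", random_state=0)
tree.fit(features, frame["label"])
predicted = tree.predict(features)

print(list(features.columns))
print(list(predicted))
```

Predicting on the training rows, as done here, only confirms that the tree can separate this toy data; a real workflow would evaluate on held-out data with a metric suited to unbalanced labels.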

Working with Image Data

  • Image storage formats
  • scikit-image
  • Smoothing and denoising
  • Edge detection
  • Feature-based segmentation
  • K-means clustering
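
As a rough sketch of how K-means clustering can segment an image by pixel intensity, the example below builds a synthetic grayscale image with NumPy and clusters its pixels with scikit-learn's `KMeans`. The image, its size, and the two-cluster choice are assumptions for illustration; a real workflow would typically load and preprocess images with scikit-image first.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 20x20 grayscale image: dark, noisy background plus a bright square.
rng = np.random.default_rng(0)
image = rng.normal(loc=0.2, scale=0.02, size=(20, 20))
image[5:15, 5:15] += 0.6

# Treat every pixel as one sample with a single feature (its intensity).
pixels = image.reshape(-1, 1)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)

# Reshape the cluster labels back into the image grid.
segmented = kmeans.labels_.reshape(image.shape)

# Cluster numbering is arbitrary, so find the bright cluster by its center.
bright_cluster = int(np.argmax(kmeans.cluster_centers_.ravel()))
mask = segmented == bright_cluster

print(mask.sum())
```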