Python for Machine Learning

About This Course

Artificial intelligence and machine learning are defining features of the 21st century and are quickly becoming a key factor in gaining and maintaining competitive advantage in every industry that adopts them.

This 5-day class provides the skills that scientists, engineers, data scientists, data analysts, and business intelligence experts need to use Python and machine learning for data mining, classification, and predictive modeling tasks. This highly interactive training will give your team the skills to build reliable, repeatable analysis and prediction workflows. After this class, they will be able to automate their work to process significantly more data and to speed up the classification, interpretation, and analysis of that data.

Course Overview

In this course we combine conceptual knowledge of machine learning with extensive experience applying it to real-world data. Your team will develop skills in applying Python’s machine learning tools, such as the scikit-learn package, to make predictions about complicated phenomena by leveraging the information contained in numerical data, natural language, images, and discrete categories.

The emphasis is on learning techniques to maximize the predictive performance of machine learning workflows. After building a solid foundation in the Python scientific stack, we focus on the different types of feature sources for machine learning. For each, we give a short introductory lecture followed by exercises of increasing difficulty. Intermingled with the machine learning material are short discussions of helpful and diagnostic data visualizations.

At the end of this course, participants will be able to:
  • Set up an integrated analysis environment for building reliable and repeatable machine learning workflows with Python
  • Write idiomatic programs that access the various data science tools and that document and automate analytic processes using the Python standard library
  • Concisely express high-performance computations and numerical analyses using NumPy
  • Effectively load, clean, normalize, aggregate, transform, and visualize data using Pandas
  • Skillfully apply specific regression, classification, and clustering algorithms to model data and solve problems, leveraging the full power of the scikit-learn API
  • Extract relevant information from images using scikit-image
  • Extract lexical and semantic information from natural language data
  • Engineer numeric features to maximize predictive power
  • Visualize interactions and non-linear distributions of data using matplotlib and seaborn
  • Validate models with the appropriate success metrics
  • Troubleshoot common issues like unbalanced labels and high-dimensional data
  • Build deep insight by retrieving model parameters
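The sketch below illustrates the last few outcomes. It cross-validates a scikit-learn classifier with several success metrics and then reads back the fitted parameters. It is not taken from the course materials. The synthetic data from `make_classification`, the 90/10 class split, and the choice of `LogisticRegression` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic, imbalanced binary problem (about 90% / 10% class split).
# The dataset parameters here are illustrative assumptions.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3,
    weights=[0.9, 0.1], random_state=0,
)

model = LogisticRegression(max_iter=1000)

# Score each of 5 folds with several metrics. Plain accuracy can look good
# on unbalanced labels even when the minority class is mostly missed,
# which is why balanced accuracy and F1 are reported alongside it.
scores = cross_validate(
    model, X, y, cv=5,
    scoring=["accuracy", "balanced_accuracy", "f1"],
)
for name in ["test_accuracy", "test_balanced_accuracy", "test_f1"]:
    print(f"{name}: {scores[name].mean():.3f}")

# Refit on all of the data and retrieve the learned parameters.
model.fit(X, y)
print("coefficients:", np.round(model.coef_.ravel(), 2))
print("intercept:", np.round(model.intercept_, 2))
```

Comparing the three metric means side by side is one quick way to see whether a model is only doing well on the majority class.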

Contact Us

Questions or need help registering? Call us at 512.536.1057.


Course Instructors

Enthought instructors have doctorates in scientific fields such as physics, engineering, computer science, and mathematics. All have extensive research and consulting experience applying Python to complex problems across a range of industries, so they bring real-world experience to the classroom every day. Enthought instructors possess professional, first-hand experience with the tools and technologies covered in our courses.

Course Syllabus & Topics

Course Prerequisites

Experience with Python is helpful (but not required). However, programming experience in some language (such as R, MATLAB, SAS, Mathematica, Java, C, C++, VB, or FORTRAN) is expected. In particular, participants need to be comfortable with general programming concepts like variables, loops, and functions.

Python Infrastructure and Development Tools

We kick off the class by exploring the functionality of the IPython Shell, an enhanced interactive science-centric console. Next we review the Jupyter Notebook, a web-based application that mixes code, plots, and rich media, making it ideal for sharing and publishing analyses with peers. You’ll leave with mastery of tools that will accelerate your productivity and facilitate collaboration.

Building a Solid Infrastructure to Go From Exploratory Analysis to Reproducible Workflows

  • Canopy: Integrated Analysis Environment
  • IPython Shell
  • Custom environment settings
  • Jupyter (IPython) Notebooks
  • Script editor
Python Language Essentials

Next we move into an introduction to Python’s core language features that form part of your universal toolkit for tasks ranging from initial data exploration to extensible application development. We’ll introduce Python’s built-in data structures, including how and where each might be used and what trade-offs are present, and we’ll cover Python’s looping and control flow constructs. Along the way we’ll provide insight into Python’s design choices that will help you understand why Python works the way it does.

  • Fundamental data types and data structures
  • Organizing code with functions, modules and packages
  • Loading packages, namespaces
  • Reading and writing data
  • Control flow
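The topics above fit together in a few lines of standard-library Python. This sketch shows built-in data structures, a function with a loop and a conditional, and reading and writing a file. The record layout, the `summarize` helper, and the CSV file name are hypothetical and chosen only for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Built-in data structures: a list of dicts, one record per row.
measurements = [
    {"sample": "a", "value": 1.5},
    {"sample": "b", "value": 2.0},
    {"sample": "c", "value": -0.5},
]


def summarize(records, threshold=0.0):
    """Return the mean value and the set of samples above `threshold`.

    `summarize` is a hypothetical helper written for this example.
    """
    total = 0.0
    above = set()
    for row in records:               # control flow: a for loop...
        total += row["value"]
        if row["value"] > threshold:  # ...and a conditional
            above.add(row["sample"])
    return total / len(records), above


mean, above = summarize(measurements)

# Writing data with the standard-library csv module (to a temporary file).
path = Path(tempfile.mkdtemp()) / "measurements.csv"
with path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["sample", "value"])
    writer.writeheader()
    writer.writerows(measurements)

# Reading it back. csv returns strings, so convert "value" to float.
with path.open(newline="") as f:
    loaded = [{"sample": r["sample"], "value": float(r["value"])}
              for r in csv.DictReader(f)]

print(mean, sorted(above), loaded == measurements)  # 1.0 ['a', 'b'] True
```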
Numerical Analysis and Data Exploration with NumPy Arrays, and Data Visualization with Matplotlib

NumPy is a tool for rapidly manipulating and processing large data sets. Whether you have a team of scientists writing scripts to analyze and plot analytical results or analysts writing large-scale quantitative finance applications for Wall Street, NumPy is a critical tool.

Then we use Matplotlib, a versatile 2D plotting library, to generate publication-quality figures with just a few lines of code.

  • The NumPy array
  • Selecting data using slicing and logical indexing
  • Efficient numerical processing with multi-dimensional arrays
  • Expressive array operations and manipulations
  • Access larger-than-RAM data using memory mapped arrays
  • 2D plotting with Matplotlib: line plots, scatter plots, histograms, labeling, and more.
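Here is a compact sketch of several of these NumPy and Matplotlib features: slicing, boolean indexing, vectorized operations with broadcasting, a file-backed `np.memmap`, and a labeled two-panel figure. The random data, array shapes, and temporary file names are assumptions for the example. The `Agg` backend is used so the code runs without a display.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))   # 1000 rows, 3 columns of random values

# Slicing: first 5 rows, last two columns (basic slicing returns a view).
corner = data[:5, 1:]

# Logical (boolean) indexing: keep rows whose first column is positive.
positive = data[data[:, 0] > 0]

# Vectorized operations: column means, then broadcasting subtracts them
# from every row without an explicit loop.
col_means = data.mean(axis=0)
centered = data - col_means

# Memory-mapped array backed by a file on disk. Only the pages that are
# touched need to be loaded, which is how larger-than-RAM data is handled.
path = Path(tempfile.mkdtemp()) / "big.dat"
mm = np.memmap(path, dtype="float64", mode="w+", shape=(1000, 3))
mm[:] = data
mm.flush()
reopened = np.memmap(path, dtype="float64", mode="r", shape=(1000, 3))

# A labeled figure: histogram and scatter plot side by side.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data[:, 0], bins=30)
ax1.set_xlabel("column 0")
ax1.set_ylabel("count")
ax2.scatter(data[:, 0], data[:, 1], s=5)
ax2.set_xlabel("column 0")
ax2.set_ylabel("column 1")
fig.tight_layout()
fig.savefig(Path(tempfile.mkdtemp()) / "overview.png")
```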
Accessing, Preparing, and Exploring Data with Pandas

We do a deep dive into the Python Data Analysis Library (Pandas), a powerful package for working with tabular data. Pandas’ data aggregation and reorganization capabilities, including support for labeling data along each dimension, handling missing values, and manipulating time series, have made Python an indispensable tool for data exploration and analysis.

  • Loading from CSV and other structured text formats
  • Accessing data stored in SQL databases
  • 1D and 2D data structures: Series and DataFrame
  • Stripping out extraneous information
  • Normalizing data
  • Dealing with missing data
  • Data manipulation (alignment, aggregation, and summarization)
  • Group-based operations: split-apply-combine
  • Statistical analysis
  • Date and time series analysis with Pandas
  • Visualizing data
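The following sketch walks through several of these steps with pandas:

- loading CSV text
- querying a SQL database
- filling missing values per group
- split-apply-combine aggregation
- a daily time-series resample

The sales data is made up for the example. An in-memory SQLite database (via the standard-library `sqlite3` module) stands in for a real SQL server.

```python
import io
import sqlite3

import pandas as pd

# Loading from CSV text; io.StringIO stands in for a file on disk.
# Two sales values are deliberately missing.
csv_text = """date,store,sales
2024-01-01,north,100
2024-01-01,south,
2024-01-02,north,120
2024-01-02,south,90
2024-01-03,north,
2024-01-03,south,110
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Accessing data in a SQL database (an in-memory SQLite database here).
conn = sqlite3.connect(":memory:")
df[["store", "sales"]].to_sql("sales", conn, index=False)
from_sql = pd.read_sql(
    "SELECT store, sales FROM sales WHERE sales IS NOT NULL", conn
)
conn.close()

# Dealing with missing data: fill each store's gaps with that store's mean.
store_means = df.groupby("store")["sales"].transform("mean")
df["sales"] = df["sales"].fillna(store_means)

# Split-apply-combine: total and mean sales per store.
summary = df.groupby("store")["sales"].agg(["sum", "mean"])

# Time series: daily totals across all stores, indexed by date.
daily = df.set_index("date")["sales"].resample("D").sum()

print(summary)
print(daily)
```

Filling gaps with a per-group mean is only one option. Dropping rows or interpolating may suit other data better.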
Introduction to Machine Learning

We start with a short conceptual introduction to machine learning. We demystify what it’s all about, explain how it works, and describe what kinds of problems it’s best suited to solve. Next, we cover the frameworks and tools provided by scikit-learn, a widely used library for machine learning. Then, we focus on best practices for extracting useful information from various feature sources.

  • Linear and nonlinear models
  • Constant and variable learning rates
  • Cost functions, regularization methods, and other constraints
  • Fitting, transforming, and predicting
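This sketch maps the four topics onto scikit-learn calls:

- the fit/transform/predict pattern
- L2 (`Ridge`) and L1 (`Lasso`) regularization
- an `SGDRegressor` trained with a constant learning rate and with a decaying (`"invscaling"`) one

The synthetic regression data and every hyperparameter value are illustrative assumptions, not tuned or course-provided settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge, SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fitting and transforming: the scaler learns mean/std on the training set
# only, and the same statistics are reused to transform the test set.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Regularization: alpha sets the penalty strength. The L2 penalty (Ridge)
# shrinks coefficients; the L1 penalty (Lasso) can set some exactly to zero.
ridge = Ridge(alpha=1.0).fit(X_train_s, y_train)
lasso = Lasso(alpha=1.0).fit(X_train_s, y_train)
ridge_r2 = ridge.score(X_test_s, y_test)
print("ridge test R^2:", round(ridge_r2, 3))
print("lasso nonzero coefficients:", np.count_nonzero(lasso.coef_))

# Predicting with a pipeline, comparing a constant learning rate with a
# decaying one ("invscaling") in stochastic gradient descent.
sgd_scores = {}
for schedule in ["constant", "invscaling"]:
    sgd = make_pipeline(
        StandardScaler(),
        SGDRegressor(learning_rate=schedule, eta0=0.01,
                     max_iter=1000, random_state=0),
    )
    sgd.fit(X_train, y_train)
    predictions = sgd.predict(X_test)
    sgd_scores[schedule] = sgd.score(X_test, y_test)
print(sgd_scores)
```

Wrapping the scaler and the model in a pipeline keeps the training-only scaling statistics tied to the model, so they are never accidentally refit on test data.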
Machine Learning: Numeric Data
  • Logarithmic and curvilinear transforms
  • Data scaling
  • Outliers
  • Linear regressors
  • l1 and l2 normalization
  • Support vector machines (SVM)
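As a rough sketch of these numeric-feature steps, the example below applies several transforms and then fits two regressors on a log-transformed feature:

- a `log1p` transform for a skewed feature
- standard scaling versus outlier-resistant scaling
- row-wise l1 and l2 normalization with `Normalizer`

The regressors are a linear model and an SVM. The skewed synthetic feature, the injected outlier, and the `SVR(C=10.0)` setting are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import (FunctionTransformer, Normalizer,
                                   RobustScaler, StandardScaler)
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Right-skewed positive feature (like counts or prices) with one outlier.
x = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 1))
x[0] = 1000.0
y = 2.0 * np.log1p(x).ravel() + rng.normal(scale=0.1, size=200)

# Logarithmic transform: log1p compresses the long right tail.
log_tf = FunctionTransformer(np.log1p)

# Data scaling and outliers: StandardScaler uses mean and std, which the
# outlier distorts. RobustScaler uses the median and IQR instead.
std_scaled = StandardScaler().fit_transform(x)
robust_scaled = RobustScaler().fit_transform(x)

# l1 and l2 normalization rescale each row (sample) to unit norm.
rows = np.array([[3.0, 4.0], [1.0, 1.0]])
l1_rows = Normalizer(norm="l1").fit_transform(rows)
l2_rows = Normalizer(norm="l2").fit_transform(rows)

# A linear regressor and a support vector machine on the log feature.
linear = make_pipeline(log_tf, LinearRegression()).fit(x, y)
svm = make_pipeline(log_tf, StandardScaler(), SVR(C=10.0)).fit(x, y)
print(linear.score(x, y), svm.score(x, y))
```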
Machine Learning: Categorical Data
  • Contrast encoding
  • Missing values
  • Categorical rebinning
  • Linear classifiers
  • Tree-based classifiers
  • Ensemble methods
  • Boosting methods
  • Unbalanced designs
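The sketch below touches each categorical topic:

- rebinning rare levels
- keeping missing values as an explicit category
- contrast encoding
- linear, tree-based, ensemble, and boosting classifiers on an unbalanced label

The color data and label rule are synthetic assumptions. Contrast encoding is shown as dummy (treatment) coding via `OneHotEncoder(drop="first")`, which is one of several possible contrast schemes. Scores are computed on the training data only, to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue", "violet"], size=n,
                        p=[0.45, 0.30, 0.20, 0.05]),
})
df.loc[rng.random(n) < 0.05, "color"] = np.nan   # inject missing values

# Categorical rebinning: fold rare observed levels (< 10% of rows) into "other".
freq = df["color"].value_counts(normalize=True)  # NaN is excluded here
rare = freq[freq < 0.10].index
df["color"] = df["color"].where(~df["color"].isin(rare), "other")

# Missing values: keep them as their own explicit category.
df["color"] = df["color"].fillna("missing")

# Unbalanced label: roughly 15% positives, concentrated among "red" rows.
y = ((df["color"] == "red") & (rng.random(n) < 0.35)).astype(int).to_numpy()

# Contrast encoding: dummy (treatment) coding; the first level is the reference.
encoder = OneHotEncoder(drop="first")
X = encoder.fit_transform(df[["color"]])

# class_weight="balanced" up-weights the rare positive class.
models = {
    "logistic": LogisticRegression(class_weight="balanced"),
    "tree": DecisionTreeClassifier(class_weight="balanced", random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                     random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
}
weights = compute_sample_weight("balanced", y)
scores = {}
for name, model in models.items():
    if name == "boosting":
        # GradientBoostingClassifier has no class_weight, so pass weights to fit().
        model.fit(X, y, sample_weight=weights)
    else:
        model.fit(X, y)
    # Training-set score only, to keep the sketch short.
    scores[name] = balanced_accuracy_score(y, model.predict(X))
print(scores)
```

Balanced accuracy is used because plain accuracy would reward a model that always predicts the majority class.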
Machine Learning: Image Data
  • Image storage formats
  • Scikit-image
  • Smoothing and denoising
  • Edge detection
  • Feature-based segmentation
  • K-means clustering
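Here is a minimal sketch of an image pipeline using scikit-image and scikit-learn:

- round-trip through an 8-bit PNG file
- Gaussian smoothing and total-variation denoising
- Sobel edge detection
- intensity-based segmentation with K-means

A synthetic noisy square replaces a real photograph so the example needs no downloaded data. The filter parameters (`sigma=1`, `weight=0.1`) are illustrative assumptions.

```python
import tempfile
from pathlib import Path

import numpy as np
from skimage import filters, io, restoration, util
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic 64x64 grayscale image: a bright square on a dark background,
# plus Gaussian noise, clipped to floats in [0, 1].
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0
noisy = np.clip(clean + rng.normal(scale=0.2, size=clean.shape), 0, 1)

# Image storage formats: save as an 8-bit PNG, then read it back as floats.
path = Path(tempfile.mkdtemp()) / "noisy.png"
io.imsave(str(path), util.img_as_ubyte(noisy))
loaded = util.img_as_float(io.imread(str(path)))

# Smoothing (Gaussian blur) and denoising (total-variation, Chambolle).
smoothed = filters.gaussian(loaded, sigma=1)
denoised = restoration.denoise_tv_chambolle(loaded, weight=0.1)

# Edge detection: Sobel gradient magnitude peaks along the square's border.
edges = filters.sobel(smoothed)

# Feature-based segmentation: K-means on per-pixel intensity (one feature
# per pixel) splits the image into two groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(denoised.reshape(-1, 1)).reshape(denoised.shape)
```

Real images would usually need richer per-pixel features than intensity alone, such as color channels or texture filters, before clustering.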

Open Class Schedule

Onsite corporate classes are also available. Discounts are available for groups of 3 or more attendees and for academics currently at a degree-granting institution. Contact us to learn more.

Contact us to request a private onsite class or an open class in your area.

Note: The 3-day Machine Learning Mastery Workshop is an alternative course for those who already have both (1) a current working knowledge of the standard Python language (data structures, control flow, assignment, functions, and package access) and (2) familiarity with array programming in NumPy.

FAQs

  • Do I need to have taken a class from Enthought before to enroll in Python for Machine Learning?
    • No. However, programming experience in some language (such as R, MATLAB, SAS, Mathematica, Java, C, C++, VB, or FORTRAN) is expected. In particular, participants need to be comfortable with general programming concepts like variables, loops, and functions. Experience with Python is helpful (but not required).
  • Is Deep Learning (with Keras, TensorFlow, or PyTorch) covered in the course?
    • Not in this particular course, no. Deep learning is a very exciting and promising field of research, but one which requires specialized hardware and whose use cases are relatively limited. This course covers learning algorithms that are both broadly applicable and also usable on common workstations and laptops.
  • What’s the difference between Enthought’s Python for Machine Learning, Python for Data Science, and the Machine Learning Mastery Workshop?
    • Python for Machine Learning and Python for Data Science are both five-day classes designed to introduce Python, NumPy, Pandas, Matplotlib, seaborn and scikit-learn.
      • Where they differ:
        • Python for Machine Learning includes image processing and is focused on feature engineering. It is better suited for people new to machine learning.
        • Python for Data Science includes database access and is focused on machine learning algorithms. It is better suited for people who already know machine learning and want to learn Python. One previous attendee called it “the most concise data science primer you can find.”
        • The Machine Learning Mastery Workshop is a three-day, workshop-style course for Python programmers already proficient with NumPy and Pandas, so it consists largely of guided, hands-on practice applying machine learning algorithms to real data. As opposed to a primer, the Mastery Workshop is more of a “deep dive.”

[Figure not reproduced: course comparison chart, in which each box represents ~1 day of content.]

  • What are the prerequisites for this course?
    • Experience with Python is helpful (but not required). However, programming experience in some language (such as R, MATLAB, SAS, Mathematica, Java, C, C++, VB, or FORTRAN) is expected. In particular, participants need to be comfortable with general programming concepts like variables, loops, and functions.
  • I am worried that your training is only useful to people who are committed to using Enthought software products. How much of your training is usable without Enthought software?
    • 100%. Our training teaches students how to write software with Python and solve problems using its scientific packages, not how to use proprietary software. Everything you will learn uses free and open source software. We provide Enthought Canopy (our integrated analysis environment and Python distribution) to training participants to ensure they have all of the tools and Python packages they need to complete the training and that the tools are as easy as possible to install. While participants sometimes do use other editors, package managers, and Python distributions, we strongly recommend participants use Canopy during the training. With Canopy we can ensure that you can easily install everything you need for the course out of the box and we can provide technical support (which we unfortunately cannot provide for other tool sets).
  • I use / will be using Anaconda Python. Will I still benefit from this course?
    • Absolutely. Our training materials work with any Python distribution (such as Anaconda), as long as you also have all of the necessary packages, a text or code editor, a package manager, the interactive IPython shell, and Jupyter notebooks installed.
  • Is a class completion certificate provided?
    • Yes, a class completion certificate is provided for the five-day Python for Machine Learning class.

Have a question that isn’t answered here? Contact us or call 512.536.1057.