Python for Data Science

Data Science with Python Training, using scikit-learn for machine learning

Course Overview

This fast-paced class is intended for practicing data scientists, data analysts, and business intelligence experts interested in using Python for their day-to-day work. The primary focus is on learning to use Python tools for data science, data analysis, and machine learning efficiently and effectively.

What You'll Learn

Participants in this course will take away:

  • Hands-on experience setting up an integrated analysis environment for doing data science with Python.
  • An understanding of how to use the Python standard library to write programs, access the various data science tools, and document and automate analytic processes.
  • Orientation to some of the most powerful and popular Python libraries for data science, including Pandas (data preparation, analysis, and modeling; time series analysis), scipy.stats (statistics), scikit-learn (machine learning), and Matplotlib (data visualization).
  • Working knowledge of the Python tools ideally suited for data science tasks, including:
    • Accessing data (e.g., text files, databases)
    • Cleansing and normalizing data
    • Exploring data (e.g., simple statistics, correlation matrices, visualization)
    • Modeling data (e.g., statistics, machine learning)

Course Syllabus

I. Introduction and Setting Up Your Integrated Analysis Environment

Setting Up Your Integrated Analysis Environment & Tools Overview

  • IPython Shell
  • Custom environment settings
  • Jupyter Notebooks
  • Script editor
  • Packages: NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.
  • Training on Demand

Once you complete this module, you will understand what features make Python particularly well suited for data science, you will be able to set up a fully functioning Python-based analysis environment, and you will know what each tool is used for in the data science workflow.

II. Using Python to Control and Document Your Data Science Processes

Python Essentials

  • Data types and objects
  • Loading packages, namespaces
  • Reading and writing data
  • Simple plotting
  • Control flow
  • Debugging
  • Code profiling

Once you complete this module, you will be able to use the Python standard library and the Canopy tools to write, run, debug, and profile programs that control your data science processes and draw on the scientific packages.
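As a small illustration of profiling with the standard library alone, here is a sketch using the built-in `cProfile` and `pstats` modules (not a Canopy-specific tool). The helpers `mean_of_squares` and `profile_call` are hypothetical, standing in for a real analysis step:

```python
import cProfile
import io
import pstats


def mean_of_squares(values):
    """Hypothetical analysis step: return the mean of the squared values."""
    squares = [v * v for v in values]
    return sum(squares) / len(squares)


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    # Sort by cumulative time and keep only the top 5 entries in the report.
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(mean_of_squares, range(1, 1001))
print(result)
print(report)
```

Capturing the report in a string rather than printing it directly makes it easy to save alongside other outputs when documenting an automated analysis run.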

III. Accessing and Preparing Data

Data, Data, Everywhere...

Acquiring Data with Python

  • Structured text files
  • SQL databases

Cleansing Data with Python

  • Stripping out extraneous information
  • Normalizing data
  • Formatting data

Once you complete this module, you will know how to load data from common types of data sources, including structured text files and SQL databases, and you will know some of the common Python tools used to cleanse and prepare your data for analysis.
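A minimal sketch of loading and cleansing data with pandas, assuming a tiny made-up sales table: an in-memory CSV string stands in for a structured text file, and an in-memory SQLite database (through the standard `sqlite3` module) stands in for a SQL data source. The column names and values are illustrative only:

```python
import io
import sqlite3

import pandas as pd

# An in-memory CSV stands in for a structured text file on disk.
csv_text = """region, units ,price
 north ,10,2.50
South,,3.00
north,7,2.50
"""
sales = pd.read_csv(io.StringIO(csv_text))

# Cleansing: strip stray whitespace from column names and text values,
# normalize capitalization, and treat missing unit counts as 0.
sales.columns = sales.columns.str.strip()
sales["region"] = sales["region"].str.strip().str.lower()
sales["units"] = sales["units"].fillna(0).astype(int)

# An in-memory SQLite database stands in for a SQL data source.
conn = sqlite3.connect(":memory:")
sales.to_sql("sales", conn, index=False)
totals = pd.read_sql_query(
    "SELECT region, SUM(units * price) AS revenue "
    "FROM sales GROUP BY region ORDER BY region",
    conn,
)
conn.close()
print(totals)
```

Without the cleansing step, " north " and "north" would be grouped as separate regions, which is the kind of silent error normalization is meant to prevent.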

IV. Numerical Analysis, Data Exploration, and Data Visualization with NumPy Arrays & Matplotlib

NumPy Essentials

  • The NumPy array
  • 2D plotting with Matplotlib
  • N-dimensional array operations and manipulations
  • Memory mapped files

Once you complete this module, you will understand how to use NumPy arrays for efficient numerical processing and how to use NumPy features such as slicing to write code that is both compact and readable. You will know how to use Matplotlib and NumPy together to explore and visualize your data.
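The following sketch shows NumPy slicing on a small made-up array (the `readings` grid is purely illustrative). It computes step-to-step differences with two offset slices instead of a Python loop:

```python
import numpy as np

# Illustrative 4x5 grid: rows are sensors, columns are time steps.
readings = np.arange(20, dtype=float).reshape(4, 5)

first_two_sensors = readings[:2]      # rows 0 and 1, all columns
last_column = readings[:, -1]         # final time step for every sensor
every_other_step = readings[:, ::2]   # columns 0, 2, and 4

# Basic slices are views that share memory with the original array.
print(np.shares_memory(first_two_sensors, readings))

# Differences between consecutive time steps, using offset slices.
step_changes = readings[:, 1:] - readings[:, :-1]

# A boolean mask selects the elements that meet a condition (this makes a copy).
high = readings[readings > 15]

print(last_column)
print(step_changes.shape)
print(high)
```

The same offset-slice pattern works along any axis, which is what keeps array code short compared with nested loops.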

V. Exploring Data with Pandas and scipy.stats

Searching for Gold in a Pile of Pyrite

  • Data manipulation with Pandas
  • Statistical analysis with Pandas
  • Time series analysis with Pandas
  • Overview of statistical tools in scipy.stats

At the end of this module, you will know how to use some of the core tools for statistical analysis and data exploration in Python.
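Here is a brief sketch of this kind of exploration, using a small made-up DataFrame (the column names and values are hypothetical). It computes pandas summary statistics, a correlation matrix, and a group-by, and then runs a Pearson correlation and a two-sample t-test from `scipy.stats`:

```python
import pandas as pd
from scipy import stats

# Hypothetical data: advertising spend vs. sales, split into two groups.
df = pd.DataFrame({
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "sales": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    "group": ["a", "a", "a", "b", "b", "b"],
})

numeric = df[["ad_spend", "sales"]]
summary = numeric.describe()       # count, mean, std, min, quartiles, max
corr_matrix = numeric.corr()       # Pearson correlation by default
group_means = df.groupby("group")["sales"].mean()

# scipy.stats adds significance tests on top of the descriptive numbers.
r, p_value = stats.pearsonr(df["ad_spend"], df["sales"])
t_stat, t_p = stats.ttest_ind(
    df.loc[df["group"] == "a", "sales"],
    df.loc[df["group"] == "b", "sales"],
)

print(summary)
print(corr_matrix)
print(group_means)
print(r, p_value)
print(t_stat, t_p)
```

The correlation coefficient from `scipy.stats.pearsonr` matches the one in the pandas correlation matrix; scipy additionally reports a p-value.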

VI. Machine Learning with scikit-learn

Predicting the Future Can Be Good for Business

  • Input: 2D arrays of samples and features
  • Estimator, predictor, transformer interfaces
  • Pre-processing data
  • Regression
  • Classification
  • Model selection

At the end of this module, you will have a working understanding of which machine learning tools are available in scikit-learn and how to use them.
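A sketch of the transformer and predictor interfaces, using the Iris dataset that ships with scikit-learn. The choice of `StandardScaler` and `LogisticRegression` is an illustrative assumption, not necessarily what the course uses:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X is 2D: one row per sample, one column per feature; y has one label per sample.
X, y = load_iris(return_X_y=True)
print(X.shape)  # (150, 4)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Transformer interface: fit() learns the scaling, transform() applies it.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

# Predictor interface: fit() on training data, then predict() on new data.
clf = LogisticRegression(max_iter=1000).fit(X_train_scaled, y_train)
predictions = clf.predict(scaler.transform(X_test))
accuracy = (predictions == y_test).mean()

# A pipeline chains preprocessing and model into a single estimator,
# which cross_val_score can evaluate as part of model selection.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_scores = cross_val_score(model, X, y, cv=5)

print(accuracy)
print(cv_scores.mean())
```

Wrapping the scaler inside the pipeline means each cross-validation fold fits the scaling only on its own training portion, which avoids leaking information from the held-out data.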

Course Schedule

For inquiries or to register, call 512.536.1057.


  • Albuquerque, NM: May 15-19, 2017 ($2750)
  • Austin, TX: Jun 12-16, 2017
  • San Jose, CA: Jul 17-21, 2017

Discounts are available for groups of three or more attendees, and corporate training options are also available. Contact us or call 512.536.1057 for more information.

A 20% discount is available for academics at a degree-granting institution. Contact us at 512.536.1057 to register.

Prerequisites

The course assumes a working knowledge of key data science topics (statistics, machine learning, and general data analytic methods). Programming experience in some language (such as R, MATLAB, SAS, Mathematica, Java, C, C++, VB, or FORTRAN) is expected. In particular, participants need to be comfortable with general programming concepts like variables, loops, and functions. Experience with Python is helpful (but not required).

Python Training on Demand

As part of the course, participants will receive 30 days of access to the Enthought Training on Demand Python Foundations Series.

Questions or want to reserve a seat in an upcoming class?

Call 512.536.1057.