About This Course
This 5-day class combines our Python Foundations with our project-based Pandas Mastery Workshop. The curriculum provides an excellent introduction to the Python language and its capabilities for all things data, while also providing intensive exposure to the core workhorse tools of NumPy and Pandas that are central to data analysis in Python.
This class is perfect for people who want to start using Python and Pandas regularly in their day-to-day work and need to achieve a high level of proficiency rapidly. With a hands-on, exercise-intensive design and individualized instructor coaching, students will leave this class able to apply what they have learned to their day-to-day work immediately.
Pandas (the Python Data Analysis library) provides a powerful and comprehensive toolset for working with data, including tools for reading and writing diverse files, data cleaning and wrangling, analysis and modeling, and visualization. Fields with widespread use of Pandas include data science, finance, neuroscience, economics, advertising, web analytics, statistics, social science, and many areas of engineering. Quantitative analysts, data scientists, and business analysts will find this class particularly beneficial.
Days 1–2: Python and NumPy
- The class begins with a one-day introduction to the Python language focusing on standard data structures, control constructs, and code organization.
- After a brief overview of the Scientific Python ecosystem, we dive into techniques for numeric data processing, including efficiently manipulating and processing large data sets using NumPy arrays and data visualization with 2D plots using Matplotlib.
Days 3–5: Pandas Mastery Workshop materials
The class progresses step-by-step through a repeatable data analysis workflow using the Python Pandas library, including reading in data from multiple sources and databases, cleaning, merging, and munging data to prepare it for analysis, and data exploration and visualization.
Topics covered include:
- Accessing Data From Multiple Sources
- Cleaning and Preparing Data
- Database Access and Data Wrangling
- Data Visualization
- Data Analysis
- Real-World Modeling and Problem Solving
"You could tell from the demos, examples and exercises that this course was designed and taught by someone who has first hand experience of using the tools on real world and real life data."
Onsite corporate classes are also available. Discounts are available for 3 or more attendees and academics currently at a degree-granting institution. Contact us using the form on this page to learn more.
The 3-day Pandas Mastery Workshop is an alternative course for those who already have both (1) current working knowledge of programming in standard Python (data structures, control flow, assignment, functions, and package access) and (2) familiarity with array programming in NumPy.
There are no classes scheduled at this time. To request one, please contact us using the form on this page.
Course Syllabus & Topics
Programming experience in some language (such as R, MATLAB, SAS, Mathematica, Java, C, C++, VB, or FORTRAN) is expected. In particular, participants need to be comfortable with general programming concepts like variables, loops, and functions. Previous Python experience is helpful, but not required.
An understanding of how to use the Python standard library to write programs, access various tools, and document and automate analytical processes.
- Types (strings, lists, dictionaries, and more)
- Control Flow (if-then statements, looping)
- Organizing code (functions, modules, packages)
- Reading and writing files
- Overview of Object-Oriented Programming (OOP)
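As a rough illustration of the topics listed above, the following sketch touches each one using only the Python standard library. It is not taken from the course materials: the `Reading` class, the `summarize` function, and the sample data are hypothetical names and values chosen for this example.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Reading:
    """A minimal class holding one labeled measurement (illustrative only)."""
    sensor: str
    value: float


def summarize(readings):
    """Group readings by sensor and return the mean value per sensor."""
    grouped = {}  # dictionary: sensor name -> list of values
    for r in readings:  # looping
        grouped.setdefault(r.sensor, []).append(r.value)
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}


# Types: a list of strings, parsed into objects
lines = ["a,1.0", "b,4.0", "a,3.0"]
readings = []
for line in lines:
    sensor, raw = line.split(",")
    value = float(raw)
    if value >= 0:  # control flow: keep only non-negative values
        readings.append(Reading(sensor, value))

means = summarize(readings)

# Reading and writing files: write the summary, then read it back
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "summary.txt"
    path.write_text("\n".join(f"{k}={v}" for k, v in sorted(means.items())))
    round_trip = path.read_text().splitlines()

print(means)
print(round_trip)
```

In a larger project, `Reading` and `summarize` would typically live in their own module and be imported where needed.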
Introduction to NumPy and 2D plotting. The NumPy package is presented as a tool for rapidly manipulating and processing large data sets. 2D plotting is introduced with matplotlib.
- Understanding the N-dimensional data structure
- Creating arrays
- Indexing arrays by slicing or more generally with indices or masks
- Basic operations and manipulations on N-dimensional arrays
- Plotting with matplotlib
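The array topics above can be sketched in a few lines. This is a minimal example with made-up values, not an excerpt from the course; plotting is left out to keep it short.

```python
import numpy as np

# Creating arrays: a 3x4 array holding the integers 0..11
a = np.arange(12).reshape(3, 4)

# The N-dimensional structure is described by attributes such as shape and ndim
print(a.shape, a.ndim)

# Slicing: first two rows, every other column
sliced = a[:2, ::2]

# Indexing with a list of integer indices: rows 0 and 2
picked = a[[0, 2]]

# Indexing with a boolean mask: all even elements, returned as a 1-D array
evens = a[a % 2 == 0]

# Basic operations: elementwise arithmetic and a reduction along an axis
doubled = a * 2
col_sums = a.sum(axis=0)

print(sliced)
print(evens)
print(col_sums)
```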
Built on top of NumPy arrays, the Python Data Analysis Library (Pandas) is a powerful and convenient package for dealing with tabular datasets. Participants will learn about its powerful data aggregation and reorganization capabilities for data set explorations, including support for labeling data along each dimension, dealing with missing values, and time series manipulations.
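To make the capabilities named above concrete, here is a small sketch of labeled data, missing values, and a time series operation. The column names and values are invented for illustration, and linear interpolation is only one of several ways to handle gaps.

```python
import numpy as np
import pandas as pd

# Rows are labeled by date and columns by name (values are made up)
idx = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "temp": [20.0, np.nan, 22.0, 23.0, np.nan, 25.0],
        "rain": [0.0, 1.0, 0.0, 0.0, 2.0, 0.0],
    },
    index=idx,
)

# Label-based selection along both dimensions
jan_3_temp = df.loc[pd.Timestamp("2024-01-03"), "temp"]

# Missing values: count them, then fill the gaps by linear interpolation
n_missing = int(df["temp"].isna().sum())
filled = df["temp"].interpolate()

# Time series manipulation: resample the daily data to 3-day means
three_day = filled.resample("3D").mean()

print(n_missing)
print(three_day)
```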
An expert instructor will support students as they work through a typical real-world data analysis project step-by-step using Pandas. This course develops the deep knowledge and skills that will enable students to tackle their own projects with Pandas immediately when they get back to work on Monday morning.
- Reading and writing data from local files (.txt, .csv, .xls, .json, etc.)
- Reading data from remote files
- Scraping tables from web pages (.html)
- Making the most of the powerful read_table function
- Working with Pandas data structures: Series and DataFrames
- Accessing your data: indexing, slicing, fancy indexing, boolean indexing
- Data wrangling, including dealing with dates and times and missing data
- Adding, dropping, selecting, creating, and combining rows and columns
- Database access with DB-API 2.0 and SQLAlchemy
- Executing SQL commands from Pandas
- Loading database data into a DataFrame
- Combining and manipulating DataFrames: merge, join, concatenate
- Understanding the structure of a Figure
- Data visualization: scatter plots, line plots, box plots, bar charts, and histograms with matplotlib
- Customizing plots: important attributes and arguments
- Split-apply-combine with DataFrames
- Data summarization and aggregation methods
- Pandas' powerful groupby method
- Reshaping, pivoting, and transforming your data
- Simple and rolling statistics
- In-depth learning of the data analysis tools through lectures, Q&A, and hands-on exercises
- Develop transferable skills through application to authentic data sets
- Predict the future with time series analysis
- And more!
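Several of the workflow topics listed above can be strung together in one short sketch: reading delimited text, filling a missing value, loading a database table, merging, grouping, reshaping, and computing a rolling statistic. Everything here is an assumption made for illustration: the sales and store data are invented, an in-memory string stands in for a file, and the standard library's `sqlite3` module stands in for whatever database a real project would use. SQLAlchemy and plotting are not shown.

```python
import io
import sqlite3

import pandas as pd

# Read delimited text (an in-memory stand-in for a local or remote .csv file)
csv_text = """date,store,units
2024-01-01,north,10
2024-01-02,north,
2024-01-03,north,14
2024-01-01,south,7
2024-01-02,south,9
2024-01-03,south,11
"""
sales = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Clean: fill the one missing value with 0 (one of several reasonable choices)
sales["units"] = sales["units"].fillna(0)

# Load database data into a DataFrame through a DB-API 2.0 connection
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (store TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO stores VALUES (?, ?)",
    [("north", "EU"), ("south", "US")],
)
stores = pd.read_sql_query("SELECT store, region FROM stores", conn)
conn.close()

# Combine: merge the two tables on their shared key
merged = sales.merge(stores, on="store", how="left")

# Split-apply-combine: total units per region
totals = merged.groupby("region")["units"].sum()

# Reshape: one row per date, one column per store
wide = merged.pivot(index="date", columns="store", values="units")

# Rolling statistics: 2-day rolling mean for each store
rolling = wide.rolling(window=2).mean()

print(totals)
print(rolling)
```

The first row of `rolling` is empty (NaN) because a 2-day window needs two observations before it can produce a mean.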