Introduction to Python for Statistics Short Course

Python
Steven Stehnach, Data Scientist
Saturday, January 25, 2020 (All day)
Middlebush Hall 142 (Middlebush/Johnson Auditorium), University of Missouri

Date: January 25, 2020 (Saturday), 8:30 am – 4:00 pm (with lunch break)

Location: Middlebush Hall 142 (Middlebush/Johnson Auditorium)

Due to overwhelming response, attendance is capped at 150. Please sign up early if you are interested.

All course materials, software requirements, and installation guidelines will be e-mailed to registered participants on 01/20/2020.

No refunds will be issued after 4 pm on 01/15/2020.

In case of inclement weather on 01/25/2020, 02/01/2020 is reserved as a snow date with an identical schedule.

Course Description:

The course will provide a gentle introduction to Python for statistical modeling. Python is one of the world's most popular programming languages and is commonly used by major companies for web development. Google has used it for much of its web spider and search engine code, Yahoo built its "Groups" and "Maps" features with it, and the Central Intelligence Agency built its entire website with it. Beyond web development, Python can be found in commercial games, business software, educational applications, animation systems, and many other types of software.

Course topics will include:

·       Using a Python development environment (e.g., Spyder)

·       Basic introduction to reading, manipulating, and writing data files

·       Simple data analysis (regression and classification methods)

·       Basic visualization (scatterplots, histograms, kernel smooths, regressions, maps)

·       Basic machine learning tools

The course will be hands-on, and participants are strongly encouraged to bring their laptops. Instructions for recommended software and other course materials will be made available before the course. All recommended software will be open source and can be downloaded free of charge.

Course Fee Structure:

University of Missouri Department of Statistics Students (undergrad and grad): $25

University of Missouri Department of Statistics Faculty and Staff: $45

University of Missouri Students (non-Stat): $35

University of Missouri Faculty and Staff (non-Stat): $55

Non-University of Missouri Academic: $75

General Admission Non-Academic: $200

Coffee, snacks, and lunch are included in the course fee.

Seats are limited, so register as soon as possible. To register, click here.

 

About the Course Presenter:

Steven Stehnach

Steven Stehnach is a data scientist who is enthusiastic about geography, marketing, sports, finance, and data applications in a variety of other fields. In May 2017, he completed a master's degree in statistics at the University of Missouri, where he focused on machine learning and predictive modeling. Steven is highly proficient in both R and Python and has worked extensively on developing and implementing complex statistical models in both languages across a range of practical areas.

Materials (To Be Distributed Beforehand through e-mail to all registered participants):

• Download Instructions (step-by-step guide with screenshots, released well ahead of time)

• Short Course Slides (Beamer Presentation; PDF)

• Python Modules (.py files)

• Toy Datasets (.csv, .txt files)

Topics to Be Covered & Tentative Schedule:

Registration and Software Installation Help: 8:30 am – 9:30 am

  • Laptops will not be provided. Bring your own laptop (Mac or PC).
  • You are strongly encouraged to follow the e-mailed instructions, preinstall all required software, and download the course materials onto your laptop before you arrive.

Welcome Speech by Prof. Chris Wikle, Curators' Distinguished Professor and Department Chair, Department of Statistics, University of Missouri: 9:15 am – 9:30 am

Session 1: Introduction to Python

9:30 am – 10:50 am

  • Overview 
  • Basic Commands, Import Modules 
  • Comparisons to R, Important Syntax Differences (Array Indices, etc.) 
  • Data Structures, Functions, etc.
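
As one illustration of the kind of syntax difference listed above, here is a minimal sketch for readers coming from R. It is not taken from the course materials, and the variable names are hypothetical. Python indexes from 0 and excludes the end position of a slice, whereas R indexes from 1 and includes both ends.

```python
# Hypothetical data; the R equivalent would be x <- c(10, 20, 30, 40, 50)
x = [10, 20, 30, 40, 50]

first = x[0]          # R: x[1]
last = x[-1]          # R: x[length(x)]; in R, x[-1] drops the first element instead
first_three = x[0:3]  # R: x[1:3]; the end index 3 is excluded in Python

print(first, last, first_three)
```

Negative indices are the most common surprise: in Python they count from the end of the sequence, while in R they remove elements.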

20 Minute Break

 

Session 2: Data Exploration/Manipulation

11:10 am – 12:30 pm

  • Reading Data 
  • Data Summaries, Descriptive Statistics  
  • Data Processing w/ Pandas; Imputation, Feature Creation
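
To give a flavor of these topics, the following is a small sketch using pandas. It assumes pandas is installed and is not drawn from the actual course materials. The toy data, the column names, and the choice of mean imputation are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data with one missing height value
df = pd.DataFrame({
    "height_cm": [170.0, None, 180.0],
    "weight_kg": [65.0, 80.0, 81.0],
})

# Data summary: count, mean, std, min, quartiles, and max for each column
summary = df.describe()

# Imputation: replace the missing height with the mean of the observed heights
df["height_cm"] = df["height_cm"].fillna(df["height_cm"].mean())

# Feature creation: derive body mass index from the two existing columns
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2

print(summary)
print(df)
```

Mean imputation is only one simple option; the appropriate strategy depends on why values are missing.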

Lunch (12:30 pm – 1:30 pm)

 

Session 3: Data Visualization

1:30 pm – 2:15 pm

  • Matplotlib 
  • Seaborn 

Session 4: Regression with Statistical Models

2:15 pm – 2:40 pm

  • Standard regression model and analysis using Python

20 Minute Break

 

Session 5: Regression & Machine Learning with Scikit-Learn

3:00 pm – 4:00 pm

  • Popular machine learning models using Python