Python for Data Science


The purpose of this course is to quickly introduce students with previous IT experience to the basics of Python and to the Python extensions useful for Data Science and Machine Learning. Because the class covers a lot of ground, I decided not to include neural networks, but in my experience, learning and using Keras should not be difficult. Because there is so much material, students should consult the instructor whenever they have doubts. I will provide small projects that are best done in groups, even though working conditions will of course not be simple in the middle of a pandemic.

Because the material already developed is in English, the course will be conducted in English, but interactions can be in Spanish.

Nota bene: The materials will be replaced gradually as the course contents are adjusted. In particular, the presentation videos will be replaced; the old ones are from a class given in the summer.

Times and Contact Information

This class will be given via Zoom on Tuesdays and Thursdays at 19:00 (7pm) Chicago time. This is roughly the same as Mexico City time. Each class lasts between 1 hour 30 minutes and 1 hour 50 minutes. I'll try to use Zoom security features to deter Zoom pirates. You can reach me at tschwarz at calprov dot org.

Meeting ID: 884 9638 4434
Passcode: 1946

Erik Rene Bojorges Valdez is graciously opening up his Google Meet to all students on Mondays from 18:00 to 20:00 Mexico time. I understand that he is willing to help all who need help, and I am very thankful for that.

Module 1: Introduction to Python

August 18, 2020

Module 2: Loops in Python

August 20, 2020

Module 3: Functions in Python

August 25, 2020

Module 4: Lists and Tuples in Python

August 27, 2020

Module 5: Strings and String Processing

September 1, 2020

Module 6: Dictionaries and Memoization

September 3, 2020

Module 7: Comprehension

September 8, 2020

Module 8: File Processing and Data Cleaning

September 10, 2020

Module 9: Practicum: Denver Car Accidents Statistics

September 15, 2020. Please download the Denver County crime data (112 MB). You will need the comma-separated file and might also find the explanations of the offense codes useful.

Module 10: Exceptions

September 17, 2020

Module 11: Object Oriented Programming 1

September 22, 2020

Module 12: Simple Classification: Decision Trees

September 24, 2020

Homework: Use the decision tree technique to develop a decision tree for either the blood donation data set or the Pima Indian diabetes data set. You can use either the Gini index or entropy as the splitting criterion. If you want to test the accuracy of your model, you can split the data set randomly into ~80% of the records for training and ~20% for testing.
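As a starting point for this homework, here is a minimal sketch of the two impurity measures and a random 80/20 split, using only the standard library. The function names (gini, entropy, split_impurity, train_test_split) and the seed parameter are illustrative assumptions, not part of the course materials, and the sketch does not build the tree itself.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def split_impurity(left, right, measure=gini):
    """Weighted impurity of a candidate split into two lists of labels."""
    n = len(left) + len(right)
    return (len(left) * measure(left) + len(right) * measure(right)) / n


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and cut off roughly test_fraction of them."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]
```

When choosing an attribute to split on, one would typically evaluate split_impurity for each candidate split and keep the one with the lowest value; passing measure=entropy switches the criterion.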

Module 13: Object Oriented Programming 2

September 29, 2020

doc-strings, Address example with internationalization, k-nearest-neighbor implementation

Module 14: Object Oriented Programming 3

October 1, 2020

Homework: Write a full implementation of the class Gaussian. A Gaussian (Gaussian integer) is a complex number whose real and imaginary parts are integers. You need to implement:

  1. Initializer, string, and repr
  2. hash function (needs to return an integer)
  3. abs (__abs__)
  4. equality
  5. addition, subtraction, multiplication, exponentiation (__pow__), and division
  6. multiplication with a scalar (you do this using __rmul__)

Use rounding to ensure that the result of an operation is again a Gaussian and not a complex number. Otherwise, the operations are defined just as for complex numbers.
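The requirements above could be approached along the following lines. This is a partial sketch, not a reference solution: the attribute names real and imag are assumptions, exponentiation is left out, and division uses Python's built-in round, which rounds halves to the nearest even integer (another rounding rule may be intended).

```python
import math


class Gaussian:
    """A Gaussian integer a + bi, where a and b are Python ints."""

    def __init__(self, real, imag=0):
        self.real = int(real)
        self.imag = int(imag)

    def __repr__(self):
        return f"Gaussian({self.real}, {self.imag})"

    def __str__(self):
        sign = "+" if self.imag >= 0 else "-"
        return f"{self.real}{sign}{abs(self.imag)}i"

    def __eq__(self, other):
        if not isinstance(other, Gaussian):
            return NotImplemented
        return (self.real, self.imag) == (other.real, other.imag)

    def __hash__(self):
        # Hashing the tuple keeps hash consistent with __eq__ and returns an int.
        return hash((self.real, self.imag))

    def __abs__(self):
        return math.hypot(self.real, self.imag)

    def __add__(self, other):
        return Gaussian(self.real + other.real, self.imag + other.imag)

    def __sub__(self, other):
        return Gaussian(self.real - other.real, self.imag - other.imag)

    def __mul__(self, other):
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return Gaussian(self.real * other.real - self.imag * other.imag,
                        self.real * other.imag + self.imag * other.real)

    def __rmul__(self, scalar):
        # Called for scalar * Gaussian; the scalar is assumed to be an int.
        return Gaussian(scalar * self.real, scalar * self.imag)

    def __truediv__(self, other):
        # Divide as complex numbers, then round each part back to an integer.
        norm = other.real ** 2 + other.imag ** 2
        real = (self.real * other.real + self.imag * other.imag) / norm
        imag = (self.imag * other.real - self.real * other.imag) / norm
        return Gaussian(round(real), round(imag))
```

For example, Gaussian(1, 2) * Gaussian(3, 4) gives Gaussian(-5, 10), and 3 * Gaussian(1, 2) goes through __rmul__ because int does not know how to multiply with a Gaussian.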

Module 15: Numpy 1

October 6, 2020

Module 16: Numpy 2

October 8, 2020

Module 17: Minimization and Curve Fitting with SciPy

October 13, 2020

Module 18: Pandas

October 15, 2020

Module 19: Web Scraping

October 20, 2020

Module 20: Statistics

October 22, 2020

Module 21: Visualization with Pandas

October 27, 2020


Module 22: Visualization 2

October 29, 2020

Module 23: Forecasting and Time Series

November 3, 2020

Module 24: Linear Regression

November 5, 2020

Homework Week 8

Module 25: Forecasting and Time Series 2

November 10, 2020

Module 26: Visualization again

November 12, 2020

Homework Week 9

Module 27: Naive Bayesian Inference

November 17, 2020


Module 28: Logistic Regression

November 19, 2020

Module 29: Support Vector Machines

November 24, 2020

Module 30: Principal Component Analysis

November 26, 2020

Final Homework

Use principal component analysis on the Iris data set and display it in two dimensions.
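One way to approach the projection step is sketched below with NumPy, computing the principal components from the singular value decomposition of the centered data. The function name pca_2d is an illustrative assumption, and the array X is synthetic stand-in data with the same shape as the Iris measurements (150 rows, 4 columns), not the Iris data itself.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their first two principal components.

    Returns the projected points and the fraction of the total variance
    explained by each of the two components.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing
    # singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return centered @ components.T, explained[:2]


# Synthetic stand-in data: 150 samples with 4 features of different spread.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4)) * np.array([3.0, 1.0, 0.3, 0.1])

projected, explained = pca_2d(X)
print(projected.shape)
print(explained)
```

To display the result, the two columns of projected can be passed to a scatter plot, colored by species. If scikit-learn is installed, the actual measurements and species labels are available through sklearn.datasets.load_iris().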