
Python for Data Science

Module 1: Introduction to Python

May 11, 2020

Module 2: Loops in Python

May 13, 2020

Module 3: Functions in Python

May 15, 2020

Module 4: Lists and Tuples in Python

May 18, 2020

Module 5: Strings and String Processing

May 20, 2020

Module 6: Dictionaries and Memoization

May 22, 2020

Module 7: Comprehension

May 25, 2020

Module 8: File Processing and Data Cleaning

May 27, 2020

Module 9: Practicum: Denver Car Accidents Statistics

May 29, 2020. Please download the Denver County crime data (112 MB). You will need the comma-separated file and might also find the explanations of the offense codes useful.

Module 10: Exceptions

June 1, 2020

Module 11: Object Oriented Programming 1

June 3, 2020

Module 12: Simple Classification: Decision Trees

June 5, 2020

Homework: Use the decision tree technique to develop a decision tree for either the blood donation data set or the Pima Indian diabetes data set. You can use either the Gini index or entropy as the splitting criterion. If you want to test the accuracy of your model, split the data set randomly into about 80% of the records for training and about 20% for testing.
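As a starting point for the homework, the two impurity measures and the random split could be sketched as follows. This is a minimal sketch, not a solution: the function names gini, entropy, and train_test_split are hypothetical helpers chosen for illustration, and the tree-building step itself is left to you.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class labels, measured in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(count / n) * math.log2(count / n)
               for count in Counter(labels).values())


def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and cut it at train_fraction.

    The fixed seed is an assumption made so that the split is repeatable.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

Both measures are 0 for a node that contains a single class and largest when the classes are evenly mixed, so a split is better the more it lowers the weighted impurity of the child nodes.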

Module 13: Object Oriented Programming 2

June 8, 2020

Docstrings, Address example with internationalization, k-nearest-neighbor implementation

Module 14: Object Oriented Programming 3

June 10, 2020

Homework: Write a full implementation of the class Gaussian. A Gaussian is a complex number whose real and imaginary parts are integers. You need to implement:

  1. Initializer (__init__), string (__str__), and repr (__repr__)
  2. Hash function (__hash__, which needs to return an integer)
  3. Absolute value (__abs__)
  4. Equality (__eq__)
  5. Addition, subtraction, multiplication, exponentiation (__pow__), and division
  6. Multiplication with a scalar on the left, which you implement using __rmul__

Use rounding to ensure that the result of an operation is again a Gaussian and not a general complex number. Otherwise, the operations are defined just as for complex numbers.
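To illustrate how a few of these pieces might fit together, here is a deliberately partial sketch. It assumes attribute names real and imag (not prescribed by the assignment) and shows only repr, equality, hashing, multiplication, __rmul__, and division with rounding. The remaining methods are left out.

```python
class Gaussian:
    """Partial sketch of a Gaussian integer a + bi with integer a and b."""

    def __init__(self, real=0, imag=0):
        # Attribute names are an assumption made for this sketch.
        self.real = int(real)
        self.imag = int(imag)

    def __repr__(self):
        return f"Gaussian({self.real}, {self.imag})"

    def __eq__(self, other):
        if not isinstance(other, Gaussian):
            return NotImplemented
        return (self.real, self.imag) == (other.real, other.imag)

    def __hash__(self):
        # Hashing the tuple of parts returns an integer and agrees with __eq__.
        return hash((self.real, self.imag))

    def __mul__(self, other):
        if isinstance(other, int):
            return Gaussian(self.real * other, self.imag * other)
        if isinstance(other, Gaussian):
            return Gaussian(self.real * other.real - self.imag * other.imag,
                            self.real * other.imag + self.imag * other.real)
        return NotImplemented

    def __rmul__(self, scalar):
        # Python calls this for scalar * gaussian when int.__mul__ gives up.
        return self.__mul__(scalar)

    def __truediv__(self, other):
        if not isinstance(other, Gaussian):
            return NotImplemented
        quotient = complex(self.real, self.imag) / complex(other.real, other.imag)
        # round() brings the result back to integer parts; note that Python
        # rounds exact halves to the nearest even integer.
        return Gaussian(round(quotient.real), round(quotient.imag))
```

With this sketch, 3 * Gaussian(1, 2) goes through __rmul__, and Gaussian(7, 0) / Gaussian(3, 0) is rounded back to a Gaussian instead of returning a complex number.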

Module 15: Numpy 1

Module 16: Numpy 2

Module 17: Minimization and Curve Fitting with SciPy

Homework Week 6

Module 18: Pandas

Module 19: Web Scraping

Module 20: Statistics

Module 21: Visualization with Pandas


Module 22: Visualization 2

Module 23: Forecasting and Time Series

Module 24: Linear Regression

Homework Week 8

Module 25: Forecasting and Time Series 2

Module 26: Visualization again

Module 27: Naive Bayesian Inference


Module 28: Logistic Regression

Module 29: Support Vector Machines

Module 30: Principal Component Analysis

Final Homework

Use principal component analysis on the Iris data set and display it in two dimensions.
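One possible way to approach the projection step is sketched below with NumPy only. The helper name pca_2d is hypothetical, and the six rows in sample are only a small illustrative subset in the Iris column layout (sepal length, sepal width, petal length, petal width). For the homework you would load the full data set instead.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # The rows of Vt are the principal directions, ordered by decreasing
    # variance of the data along them.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:2].T


# Illustrative rows only: two measurements per species, in cm.
sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
])
projected = pca_2d(sample)  # shape (6, 2): one 2-D point per flower
```

The two columns of projected can then be passed to a scatter plot, for example with matplotlib, using one color per species to display the data in two dimensions.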