The topics are divided into three units (modules), as outlined below. The pace is roughly 1 or 2 topics per week, with 1 graded item (a lab notebook or exam) per week. The detailed schedule will be made available during the first week of class.
Module 0: Fundamentals.
- Topic 0: Overview + intro to Jupyter
- Topic 1: Python bootcamp review
- Topic 2: Pairwise association mining
  - default dictionaries, asymptotic running time
- Topic 3: Mathematical preliminaries
  - probability, calculus, linear algebra
- Topic 4: Representing numbers
  - floating-point arithmetic, numerical analysis
Module 1: Representing, transforming, and visualizing data.
- Topic 5: Preprocessing unstructured data
  - Strings and regular expressions
- Topic 6: Mining the web
  - (Notebook only) HTML processing, web APIs
- Topic 7: Tidying data
  - Pandas, merge/join, tibbles and bits, melting and casting
- Topic 8: Visualizing data and results
  - Seaborn, Bokeh
- Topic 9: Relational data (SQL)
Module 2: The analysis of data.
- Topic 10: Intro to numerical computing
  - NumPy / SciPy
- Topic 11: Ranking relational objects
  - Graphs as (sparse) matrices, PageRank
- Topic 12: Linear regression
  - Direct (e.g., QR) and online (e.g., LMS) methods
- Topic 13: Classification
  - Logistic regression, numerical optimization
- Topic 14: Clustering
  - The k-means algorithm
- Topic 15: Compression
  - Principal components analysis (PCA), singular value decomposition (SVD)
- Topic 16: Putting it all together
  - (Notebook only) Eigenfaces