Topics and Schedule

August 2018

The topics are divided into three modules, as outlined below. The pace is roughly one or two topics per week, with one graded item (a lab notebook or exam) per week. The detailed schedule will be made available during the first week of class.

Module 0: Fundamentals.

  • Topic 0: Overview + intro to Jupyter
  • Topic 1: Python bootcamp review
  • Topic 2: Pairwise association mining
    • default dictionaries, asymptotic running time
  • Topic 3: Mathematical preliminaries
    • probability, calculus, linear algebra
  • Topic 4: Representing numbers
    • floating-point arithmetic, numerical analysis
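To give a flavor of Topic 2, here is a minimal sketch of pairwise association mining with `collections.defaultdict`: count how often each item and each pair of items appears across "baskets," then keep rules "a -> b" whose confidence (pair count divided by the count of a) meets a threshold. The function names, toy baskets, and threshold are illustrative assumptions, not the course notebook's code.

```python
from collections import defaultdict
from itertools import combinations

def count_items_and_pairs(baskets):
    """Count single items and unordered item pairs across all baskets."""
    item_counts = defaultdict(int)  # missing keys start at 0
    pair_counts = defaultdict(int)
    for basket in baskets:
        items = sorted(set(basket))  # dedupe; sorting gives each pair one canonical order
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(items, 2):
            pair_counts[(a, b)] += 1
    return item_counts, pair_counts

def association_rules(baskets, min_conf):
    """Return sorted (lhs, rhs, confidence) triples with confidence >= min_conf."""
    item_counts, pair_counts = count_items_and_pairs(baskets)
    rules = []
    for (a, b), n_ab in pair_counts.items():
        # An unordered pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            conf = n_ab / item_counts[lhs]
            if conf >= min_conf:
                rules.append((lhs, rhs, conf))
    return sorted(rules)

baskets = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["bread", "jam"],
    ["milk", "eggs"],
]
for lhs, rhs, conf in association_rules(baskets, min_conf=0.6):
    print(f"{lhs} -> {rhs}: {conf:.2f}")
```

Enumerating pairs costs O(k^2) work for a basket of k distinct items, which is why the asymptotic running time of this step matters on large inputs.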

Module 1: Representing, transforming, and visualizing data.

  • Topic 5: Preprocessing unstructured data
    • Strings and regular expressions
  • Topic 6: Mining the web
    • (Notebook only) HTML processing, web APIs
  • Topic 7: Tidying data
    • Pandas, merge/join, tibbles and bits, melting and casting
  • Topic 8: Visualizing data and results
    • Seaborn, Bokeh
  • Topic 9: Relational data (SQL)
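As a small taste of Topic 9, the sketch below uses Python's built-in `sqlite3` module to join two tables and aggregate the result. The table names, columns, and rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; the schema and data below are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE grades (student_id INTEGER, item TEXT, score REAL)")
cur.executemany("INSERT INTO students VALUES (?, ?)",
                [(1, "Ada"), (2, "Grace"), (3, "Alan")])
cur.executemany("INSERT INTO grades VALUES (?, ?, ?)",
                [(1, "NB1", 95.0), (1, "NB2", 88.0), (2, "NB1", 91.0)])

# Inner join plus aggregation: average score per student who has any grades.
# Students with no rows in `grades` (here, Alan) drop out of an inner join.
query = """
SELECT s.name, AVG(g.score) AS avg_score, COUNT(*) AS n_items
FROM students AS s
JOIN grades AS g ON g.student_id = s.id
GROUP BY s.id
ORDER BY s.name
"""
rows = cur.execute(query).fetchall()
for name, avg_score, n_items in rows:
    print(name, avg_score, n_items)
conn.close()
```

Switching `JOIN` to `LEFT JOIN` would keep students without grades, with `NULL` in the aggregated score column.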

Module 2: The analysis of data.

  • Topic 10: Intro to numerical computing
    • NumPy / SciPy
  • Topic 11: Ranking relational objects
    • Graphs as (sparse) matrices, PageRank
  • Topic 12: Linear regression
    • Direct (e.g., QR) and online (e.g., LMS) methods
  • Topic 13: Classification
    • Logistic regression, numerical optimization
  • Topic 14: Clustering
    • The k-means algorithm
  • Topic 15: Compression
    • Principal components analysis (PCA), singular value decomposition (SVD)
  • Topic 16: Putting it all together
    • (Notebook only) Eigenfaces
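To illustrate Topic 11's idea of treating a graph as a sparse matrix, here is a minimal PageRank sketch using power iteration over an adjacency dictionary (only the nonzero links are stored). The damping factor 0.85, the uniform handling of dangling nodes, and the example graph are common conventions assumed here, not details taken from the course materials.

```python
def pagerank(out_links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank.

    out_links: dict mapping each node to a list of distinct nodes it links to,
    i.e., a sparse adjacency representation. Nodes with no out-links
    ("dangling" nodes) spread their rank uniformly over all nodes.
    """
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    x = {u: 1.0 / n for u in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        dangling = sum(x[u] for u in nodes if not out_links.get(u))
        # Every node receives the teleport mass plus its share of dangling mass.
        base = (1.0 - alpha) / n + alpha * dangling / n
        x_new = {u: base for u in nodes}
        # Sparse "matrix-vector product": push rank only along existing edges.
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = alpha * x[u] / len(targets)
                for v in targets:
                    x_new[v] += share
        diff = sum(abs(x_new[u] - x[u]) for u in nodes)  # L1 change this step
        x = x_new
        if diff < tol:
            break
    return x

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```

Each iteration touches only the stored edges, so its cost grows with the number of links rather than with n^2, which is the payoff of the sparse representation.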