ML4SocialScience

Class notes for University of Pennsylvania course on machine learning for social science (CRIM6012/SOCI6012)


Project maintained by gregridgeway

These notes are best viewed at the ML4SocialScience github.io site.

These are the class notes for my course on machine learning for social science (CRIM6012/SOCI6012) that I have taught at the University of Pennsylvania since 2024.

Table of contents

  1. Probability review
  2. Naïve Bayes classifier
  3. Prediction, bias, variance, and noise
    • k-nearest neighbor regression and classification
    • Example: Predict dropout risk from the NELS88 data
    • Spam example
  4. Differential calculus review
  5. Classification and regression trees
  6. Linear algebra
    • Basic matrix operations, including matrix derivatives
    • Ordinary least squares and ridge regression
    • Multivariate Taylor series, Newton-Raphson, logistic regression, iteratively reweighted least squares (IRLS)
  7. Singular value decomposition
    • Image compression
    • Image classification with emojis
  8. Boosting and L1 regularization
    • Lasso
    • Forward stagewise selection
    • Gradient boosting
  9. Propensity score estimation
    • Simpson’s paradox and confounders
    • Neyman-Rubin causal model
    • Propensity score weighting
      • Using machine learning to estimate propensity scores
      • fastDR package
  10. Neural networks
    • Backpropagation “by hand”
    • neuralnet package
    • TensorFlow and Keras
    • Convolutional layers
    • MNIST postal digits dataset
  11. Text analysis
    • Working with text2vec
    • Document-term matrix (DTM) and TF-IDF
    • SVD for text
  12. Long short-term memory (LSTM) neural networks
    • LSTM models
    • A small language model