Structure and Interpretation of
Deep networks program themselves by finding patterns in data. But what patterns do they find, and what programs do they learn? Large-scale machine learning presents computer science researchers and practitioners with a new problem: how to understand, apply, and improve complex programs that were not designed explicitly by a human programmer.
In this IAP practicum, we introduce the problem of interpreting deep networks. We will not focus on training. Rather, our topic is the art of understanding a network after it is trained. We examine both local (single-case) and global (over-a-distribution) methods for understanding a network's behavior.
Every class will be a practicum, built around an interactive lab where you will learn to apply explanation methods using real models and data. Interpretable machine learning is a quickly evolving area, so each class will draw on recent papers and will be led by researchers who are active in the area.
The course assumes familiarity with basic machine learning, neural networks, and Python. An introductory machine learning course or a deep network course such as S.191 should be enough. You should have previous experience training models; this class is not mainly about learning to train models, but about understanding trained models.
1. Introduction: Why care how?
Does it matter whether we understand what a deep network is doing or not, if the results look good? Isn't accuracy on the holdout set the ultimate goal?
2. Explaining predictions.
What part of the input does a model look at? What can we learn from this?
Reading: Smilkov, SmoothGrad; Petsiuk, RISE; Sturmfels, Feature attribution; Ding, Saliency-driven Word Alignment Interpretation; Li, Visualizing and Understanding Neural Models in NLP; Mudrakarta, Did the Model Understand the Question?; Arras, Explaining and Understanding LSTMs
Sebastian Gehrmann (Harvard), Mirac Suzgun (Harvard), Vitali Petsiuk (BU), Julius Adebayo (CSAIL).
Materials: Slides; Labs: Pre-Lab (Vision); RISE (Solutions); NLP Saliency (Solutions).
2:30-4:00pm, Tuesday January 21, 2020.
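To make the question "what part of the input does a model look at?" concrete, here is a minimal sketch of the SmoothGrad idea from the reading: average the input gradient over several noisy copies of the input. The `model` below is a hypothetical single tanh unit standing in for a trained network, the gradient is estimated by finite differences rather than backpropagation, and the weights, noise level, and sample count are illustrative assumptions, not values from the lab.

```python
import math
import random


def model(x, w=(1.5, -2.0, 0.5)):
    """Hypothetical stand-in for a trained network: one tanh unit."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def gradient(f, x, h=1e-5):
    """Central-difference estimate of df/dx_i for each input feature."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grads.append((f(up) - f(down)) / (2 * h))
    return grads


def smoothgrad(f, x, n_samples=50, sigma=0.1, seed=0):
    """Average the gradient over Gaussian-perturbed copies of x."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        for i, g in enumerate(gradient(f, noisy)):
            total[i] += g
    return [t / n_samples for t in total]


x = [0.2, 0.1, -0.3]
saliency = smoothgrad(model, x)
# The feature with the largest absolute saliency is the one this
# toy model is most sensitive to around x.
most_influential = max(range(len(x)), key=lambda i: abs(saliency[i]))
```

With a real network, the finite-difference `gradient` would normally be replaced by the framework's automatic differentiation; the averaging loop stays the same.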
3. Explaining models.
How does a neural network decompose data internally? What can we learn from a model's representation?
Reading: Belinkov and Glass, Analysis Methods (section 2); Bau, Network Dissection.
Optional: Adi, Fine-grained Analysis of Sentence Embeddings; Hupkes, Visualization and diagnostic classifiers; Hewitt and Liang, Designing and Interpreting Probes; Olah, Lucid Feature Visualization; Kim, Concept Activation Vectors
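One way to ask what a unit inside a network represents, in the spirit of the Network Dissection reading, is to threshold the unit's activation map and measure its overlap with a labeled concept mask. The sketch below computes that intersection-over-union for a single image. The activation values, the mask, and the fixed threshold of 0.5 are made-up illustrations; choosing the threshold from the unit's activation distribution over a dataset is left out here.

```python
def iou(activation, concept_mask, threshold):
    """IoU between a thresholded activation map and a binary concept mask.

    Both arguments are 2D lists of the same shape.
    """
    intersection = 0
    union = 0
    for act_row, mask_row in zip(activation, concept_mask):
        for a, m in zip(act_row, mask_row):
            unit_on = a > threshold
            in_concept = bool(m)
            intersection += int(unit_on and in_concept)
            union += int(unit_on or in_concept)
    return intersection / union if union else 0.0


# Hypothetical 3x3 activation map for one unit on one image.
activation = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.1, 0.0, 0.0],
]
# Hypothetical binary mask marking where a concept appears in the image.
concept_mask = [
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
]

score = iou(activation, concept_mask, threshold=0.5)
```

A unit whose score is high for one concept and low for all others is a candidate detector for that concept.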
4. Adversaries and interpretability.
Why do some interpretability methods fail to uncover reasonable explanations? Could this be a consequence of how models actually make decisions? We will discuss recent findings suggesting that ML models rely on features that are imperceptible to humans. Then we will see how training models to be robust to imperceptible input changes can lead to models that rely on more human-aligned features.
Shibani Santurkar (CSAIL), Dimitris Tsipras (CSAIL).
Practicum: Explore simple gradient explanations for standard and robust models
Readings: Generating adversarial examples with FGSM; Simple gradient explanation with SmoothGrad
Optional: Training robust models with robust optimization; ML models rely on imperceptible features; Robustness vs Accuracy; Robustness as a feature prior
Materials: Slides, Exercise notebook, Solutions notebook
2:30-4:00pm, Friday January 24, 2020.
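As a small illustration of the adversarial examples discussed in this session, the sketch below applies the fast gradient sign method (FGSM) to a logistic regression model: each input feature moves by eps in the direction that increases the loss. The weights, input, and eps are made-up values chosen so the prediction flips; the practicum itself works with deep networks, which this toy model only approximates in spirit.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(g):
    return (g > 0) - (g < 0)


def fgsm(x, y, w, b, eps):
    """One FGSM step: x + eps * sign(d loss / d x).

    For cross-entropy loss on logistic regression, the input
    gradient has the closed form (p - y) * w.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model parameters and a correctly classified input.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.3, 0.1, 0.2], 1

x_adv = fgsm(x, y, w, b, eps=0.25)
p_clean = predict(x, w, b)    # above 0.5: classified as class 1
p_adv = predict(x_adv, w, b)  # below 0.5: the prediction flips
```

No feature changes by more than eps, yet the predicted class changes, which is the sense in which such perturbations are called small.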
5. Bias and fairness.
Big data and deep learning may unintentionally amplify bias present in the dataset collection or model formulation process. How can we define "fairness" in a quantitative way? How can we audit a system for bias and potential for disparate impact? How can we create equitable models?
Irene Chen (CSAIL)
Reading: Machine Bias (COMPAS data to be used in the lab); Gender Shades; Fair prediction with disparate impact; Dissecting racial bias in an algorithm used to manage the health of populations
Materials: Slides; Lab: COMPAS (solutions).
2:30-4:00pm, Monday January 27, 2020.
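One simple way to make "fairness" quantitative is to compare error rates and selection rates across groups. The sketch below computes, per group, the fraction of positive predictions and the false positive rate, then a ratio of selection rates between two groups. The records are made-up toy data, not COMPAS, and the function and field names are hypothetical; which metric is appropriate depends on the application, as the readings discuss.

```python
def group_rates(records):
    """Per-group selection rate and false positive rate.

    records: iterable of (group, y_true, y_pred) with binary labels.
    """
    stats = {}
    for group, y_true, y_pred in records:
        s = stats.setdefault(
            group, {"n": 0, "pred_pos": 0, "neg": 0, "false_pos": 0}
        )
        s["n"] += 1
        s["pred_pos"] += y_pred
        if y_true == 0:
            s["neg"] += 1
            s["false_pos"] += y_pred
    return {
        g: {
            "selection_rate": s["pred_pos"] / s["n"],
            "false_positive_rate": (
                s["false_pos"] / s["neg"] if s["neg"] else float("nan")
            ),
        }
        for g, s in stats.items()
    }


# Made-up toy data: (group, true label, predicted label).
records = [
    ("A", 0, 1), ("A", 0, 0), ("A", 1, 1), ("A", 1, 1),
    ("B", 0, 0), ("B", 0, 0), ("B", 1, 1), ("B", 1, 0),
]

rates = group_rates(records)
# Ratio of selection rates; values far from 1 indicate the two
# groups receive positive predictions at very different rates.
selection_ratio = rates["B"]["selection_rate"] / rates["A"]["selection_rate"]
```

In this toy data the groups differ in both selection rate and false positive rate, so the two metrics point the same way; on real data they can disagree.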
6. Interaction and collaboration.
How can interactive methods help us to formulate hypotheses about models and data? What can we learn about the structure of a model by posing counterfactual "what if" questions? How can humans collaborate with machine-learned models? We will describe some common methods to create interactive AI tools and will create two very small examples for interactions with a text and an image model.
Hendrik Strobelt (IBM Research), Sebastian Gehrmann (Harvard SEAS), David Bau (CSAIL)
Reading: Strobelt, Seq2Seq-Vis; Bau, GANPaint; Gehrmann, Collaborative Semantic Inference.
Before class: install miniconda on your laptop.
Materials: Slides; Labs: GANPaint (solutions); Seq2Seq-Vis.
2:30-4:00pm, Wednesday January 29, 2020.
7. Complex explanations.
Sometimes representations and decisions in models are hard to communicate with examples and visualizations. How can we use richer explanations (especially in language) to describe more complex behaviors?
Jacob Andreas (CSAIL), Jesse Mu (Stanford)
Practicum: Generating natural language explanations.
2:30-4:00pm, Thursday January 30, 2020. Note new date
Reading: Hendricks, Generating Visual Explanations; Hendricks, Grounding Visual Explanations; Andreas, Neuralese
Materials: Lab: Natural language explanations