This project investigates whether contrastive explanations provide better interpretability than traditional explanations in deep learning models. Rather than answering "Why is this classified as class A?", we explore "Why is this class A rather than class B?".
Can providing contrastive information (comparing the predicted class against the closest competing class) lead to more meaningful and interpretable explanations of neural network predictions?
Traditional explainability methods focus on explaining individual predictions. However, human explanations are often contrastive in nature—we naturally explain decisions by comparing alternatives. This project explores whether machine-generated explanations benefit from this contrastive approach.
Key Hypothesis: Comparing feature importance between the top two predicted classes reveals more interpretable patterns than analyzing a single class in isolation.
The project implements and compares multiple state-of-the-art explainability techniques:
- Integrated Gradients - Accumulates gradients along a straight line from a baseline to the input
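- Noise Tunnel - Averages attributions over several noise-perturbed copies of the input to smooth them (SmoothGrad-style)
- Gradient SHAP - Approximates SHAP values by averaging gradients at points sampled between randomly drawn baselines and the input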
- Saliency Maps - Computes input gradients to identify important features
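
As a rough illustration of the Integrated Gradients idea listed above, here is a minimal NumPy sketch on a toy softmax-linear classifier. The model, the zero baseline, and the helper names (`class_prob`, `class_prob_grad`, `integrated_gradients`) are assumptions made for this example and are not taken from the notebook, which may well rely on a library implementation instead. The path integral is approximated with a midpoint Riemann sum.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def class_prob(x, W, b, target):
    """Probability of class `target` under a toy softmax-linear model."""
    return softmax(W @ x + b)[target]


def class_prob_grad(x, W, b, target):
    """Analytic gradient of class_prob with respect to the input x."""
    p = softmax(W @ x + b)
    # d p_t / d x = p_t * (W[t] - sum_k p_k * W[k])
    return p[target] * (W[target] - p @ W)


def integrated_gradients(x, baseline, W, b, target, steps=500):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([
        class_prob_grad(baseline + a * (x - baseline), W, b, target)
        for a in alphas
    ])
    # Average gradient along the straight path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)


# Toy setup (hypothetical sizes): 3 classes, 5 input features.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x = rng.normal(size=5)
baseline = np.zeros(5)

attr = integrated_gradients(x, baseline, W, b, target=1)
gap = class_prob(x, W, b, 1) - class_prob(baseline, W, b, 1)
print("sum of attributions:", attr.sum())
print("output difference:  ", gap)
```

The two printed values should agree closely: the attributions of Integrated Gradients sum to the difference between the model output at the input and at the baseline, which makes a convenient sanity check for any implementation.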
- Classical Explainability: Generate explanations for the predicted class independently
- Contrasting Explainability: Generate explanations by computing differences between:
  - Feature importance matrix of the predicted class
  - Feature importance matrix of the closest competing class
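
The difference described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the linear model, the gradient-times-input attribution used as a stand-in for the methods listed earlier, and the helper names (`top_two_classes`, `contrastive_map`, `grad_times_input`) are assumptions for this example, not the notebook's actual code.

```python
import numpy as np


def top_two_classes(logits):
    """Return (predicted class, closest competing class) by logit value."""
    order = np.argsort(logits)[::-1]
    return int(order[0]), int(order[1])


def contrastive_map(attribute, x, logits):
    """Attribution of the predicted class minus that of the runner-up.

    `attribute(x, target)` is any function that returns a feature
    importance array with the same shape as x.
    """
    predicted, runner_up = top_two_classes(logits)
    diff = attribute(x, predicted) - attribute(x, runner_up)
    return diff, predicted, runner_up


# Toy linear model (hypothetical weights): 3 classes, 3 features.
W = np.array([[1.0, 0.0, 2.0],
              [1.0, 0.5, 0.0],
              [0.0, 1.0, 0.0]])
x = np.array([1.0, 1.0, 1.0])
logits = W @ x


def grad_times_input(x, target):
    """Stand-in attribution: for a linear model the gradient is W[target]."""
    return W[target] * x


diff, predicted, runner_up = contrastive_map(grad_times_input, x, logits)
print(predicted, runner_up, diff)
```

In this toy case the first feature supports both classes equally, so it cancels in the difference and only the discriminative features remain. Any attribution function with the same `(x, target)` signature could be passed in place of `grad_times_input`.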
- Performance on MNIST: ✅ Excellent results
- Example: Distinguishing between 4 and 9
- The contrastive approach clearly highlights the discriminative features between these confusable digits
- The difference matrices reveal structural differences effectively
- Performance on ImageNet: ⚠️ Mixed results
- Example: Duck classification
- Contrastive explanations provide less clarity with complex natural images
- Suggests that the approach may be more suitable for simpler, more structured datasets
| Aspect | Classical Approach | Contrasting Approach |
|---|---|---|
| MNIST (4 vs 9) | Good | Excellent ✓ |
| MNIST General | Good | Good |
| ImageNet (Ducks) | Moderate | Moderate |
| Complex Scenes | Moderate | Moderate |
| Interpretability | Varies | More focused |
- ✅ Strong for simple, structured data: The contrastive approach excels with datasets like MNIST where classes have clear structural differences
- ⚠️ Limited for complex real-world images: Natural images contain too much contextual information; simple feature differences don't capture semantic distinctions
- 🎯 Best use case: Binary or few-class classification with distinct visual patterns
- 📈 Future improvement: May benefit from hierarchical contrastive analysis or semantic feature grouping
pip install -r requirements.txt
jupyter notebook Contrasting_Explanation.ipynb

contrasting_explanation/
├── Contrasting_Explanation.ipynb   # Main analysis notebook
├── README.md                       # This file
├── requirements.txt                # Python dependencies
├── data/
│   ├── MNIST/                      # Handwritten digits dataset
│   │   └── raw/
│   ├── ImageNet/                   # ImageNet classes
│   │   └── imagenet_class_index.json
│   └── test_image/                 # Sample images for testing
└── weights/
    └── mnist_weights.pth           # Pre-trained MNIST model
For comprehensive background on contrastive explanations and interpretability in deep learning, see:
- Contrastive Explanations: Doshi-Velez & Kim (2017) - Towards A Rigorous Science of Interpretable Machine Learning
- Integrated Gradients: Sundararajan et al. (2017) - Axiomatic Attribution for Deep Networks
- SHAP Methods: Lundberg & Lee (2017) - A Unified Approach to Interpreting Model Predictions
- Attention & Saliency: Simonyan et al. (2013) - Deep Inside Convolutional Networks
- Captum Library: PyTorch's interpretability library
- Contrastive Learning: Chen et al. (2020) - A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)
- Model Agnostic Meta-Learning (MAML): For few-shot understanding
- Test contrastive approach on other structured datasets (medical imaging, document classification)
- Implement hierarchical contrasting (explaining against multiple competing classes)
- Add quantitative metrics for explanation quality
- Extend to NLP models for text classification
- Develop interactive visualization tools
- Compare with recent contrastive learning methods
Last Updated: May 2026