A tested PyTorch framework for computational pathology research with working benchmarks on PatchCamelyon and CAMELYON16
Repository: matthewvaishnav/computational-pathology-research (GitHub)
The CAMELYON16 training path is now functional with synthetic data. This document describes what exists, what works, and what remains to be implemented.
Training Script: experiments/train_camelyon.py
Evaluation Script: experiments/evaluate_camelyon.py
Model Architecture: SimpleSlideClassifier
Dataset Implementation: src/data/camelyon_dataset.py
Synthetic Data Generator: scripts/generate_synthetic_camelyon.py
Default Synthetic Dataset: 30 slides (20 train, 5 val, 5 test), 3,000 patches in total
Full Training Config: experiments/configs/camelyon.yaml (50 epochs)
Quick Test Config: experiments/configs/camelyon_quick_test.yaml (1 epoch)
Config Tests: tests/test_camelyon_config.py (13 tests)
Generator Tests: tests/test_generate_synthetic_camelyon.py (5 tests)
Evaluation Tests: tests/test_evaluate_camelyon.py (7 tests)
Training (1 epoch on synthetic data):
python experiments/train_camelyon.py --config experiments/configs/camelyon_quick_test.yaml
Results:
Evaluation (test split on synthetic data):
python experiments/evaluate_camelyon.py --checkpoint checkpoints/camelyon_quick_test/best_model.pth --split test
Results:
Status: ✅ Complete training → evaluation workflow works end-to-end
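The evaluation script reports slide-level metrics (commit f7f6bc2), but this page does not list which ones. As an illustration only, here is a minimal stdlib sketch of two common slide-level metrics: accuracy at an assumed 0.5 threshold, and ROC AUC computed through the Mann-Whitney statistic. The names `slide_accuracy` and `roc_auc` are hypothetical and are not taken from `evaluate_camelyon.py`.

```python
from typing import Sequence


def slide_accuracy(labels: Sequence[int], scores: Sequence[float],
                   threshold: float = 0.5) -> float:
    """Fraction of slides whose thresholded score matches the label (1 = tumor)."""
    if not labels or len(labels) != len(scores):
        raise ValueError("labels and scores must be non-empty and equal length")
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a tumor slide outscores a normal one.

    Equivalent to the Mann-Whitney U statistic; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one tumor and one normal slide")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Five hypothetical test slides with aggregated tumor scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.4, 0.3]
```

On synthetic slides with well-separated classes both metrics can reach 1.0, so these numbers say little about expected performance on real CAMELYON16 slides.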
Not yet implemented: WSI preprocessing pipeline, feature extraction, annotation processing (src/data/camelyon_annotations.py is currently a stub), attention-based aggregation, graph-based methods, interpretability, and statistical analysis.
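To show what attention-based aggregation would add over SimpleSlideClassifier, here is a stdlib sketch of attention-based multiple-instance learning pooling (Ilse et al., 2018). Each patch embedding h_k gets a score w^T tanh(V h_k); the scores are softmax-normalized over the slide's patches, and the slide embedding is the attention-weighted sum. The sketch assumes V and w are given, whereas a real PyTorch model would learn them; `attention_pool` and the example weights are hypothetical and do not exist in the repository.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax over a slide's patch scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    patches: Sequence[Sequence[float]],
    V: Sequence[Sequence[float]],
    w: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Attention-MIL pooling: a_k = softmax_k(w^T tanh(V h_k)), z = sum_k a_k h_k.

    patches: N patch embeddings of dimension D.
    V: L x D projection matrix; w: length-L scoring vector (assumed given).
    Returns the pooled slide embedding (length D) and the N attention weights.
    """
    if not patches:
        raise ValueError("slide has no patches")
    scores = []
    for h in patches:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * u for wi, u in zip(w, hidden)))
    attn = softmax(scores)
    dim = len(patches[0])
    pooled = [sum(a * h[d] for a, h in zip(attn, patches)) for d in range(dim)]
    return pooled, attn


# Three toy 2-D patch embeddings and hypothetical weights.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
pooled, attn = attention_pool(patches, V, w=[1.0, 1.0])
```

With `w` set to zeros every patch receives equal weight and the pooling reduces to mean pooling; learned weights are what let the model concentrate on the few patches that contain tumor.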
CAMELYON16:
CAMELYON17:
# Default: 20 train, 5 val, 5 test slides
python scripts/generate_synthetic_camelyon.py
# Custom configuration
python scripts/generate_synthetic_camelyon.py \
--output-dir ./data/camelyon \
--num-train 50 \
--num-val 10 \
--num-test 10 \
--num-patches 200 \
--feature-dim 2048
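This page only says the generator produces classes that are artificially separated, not how it does so. One way to produce such data is sketched below, assuming tumor slides draw patch features from a Gaussian with a shifted mean. The functions `make_synthetic_slide` and `make_synthetic_split`, the `shift` parameter, and the record keys are all hypothetical. The small `feature_dim` is only for readability; the real script exposes `--feature-dim` and writes HDF5 files instead of Python lists.

```python
import random
from typing import Dict, List


def make_synthetic_slide(label: int, num_patches: int, feature_dim: int,
                         shift: float, rng: random.Random) -> List[List[float]]:
    """Patch features ~ N(shift, 1) for tumor slides (label 1), N(0, 1) otherwise."""
    mean = shift if label == 1 else 0.0
    return [[rng.gauss(mean, 1.0) for _ in range(feature_dim)]
            for _ in range(num_patches)]


def make_synthetic_split(split: str, num_slides: int, start: int = 0,
                         num_patches: int = 100, feature_dim: int = 8,
                         shift: float = 2.0, seed: int = 0) -> List[Dict]:
    """Build one split; labels alternate 0/1 so both classes are present."""
    rng = random.Random(seed)
    slides = []
    for i in range(num_slides):
        label = i % 2
        slides.append({
            "slide_id": f"slide_{start + i:03d}",  # hypothetical global numbering
            "split": split,
            "label": label,
            "features": make_synthetic_slide(label, num_patches, feature_dim,
                                             shift, rng),
        })
    return slides


# Default layout described above: 20 train, 5 val, 5 test slides, 100 patches each.
train = make_synthetic_split("train", 20, start=0, seed=0)
val = make_synthetic_split("val", 5, start=20, seed=1)
test = make_synthetic_split("test", 5, start=25, seed=2)
```

With a mean shift this large relative to unit variance, even a linear classifier on pooled features separates the classes, which is why the synthetic results overstate what to expect on real slides.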
# Quick 1-epoch smoke test
python experiments/train_camelyon.py \
--config experiments/configs/camelyon_quick_test.yaml
# Full training (50 epochs)
python experiments/train_camelyon.py \
--config experiments/configs/camelyon.yaml
# Evaluate on test split
python experiments/evaluate_camelyon.py \
--checkpoint checkpoints/camelyon_quick_test/best_model.pth \
--split test \
--output-dir results/camelyon_quick_test
# Evaluate on validation split
python experiments/evaluate_camelyon.py \
--checkpoint checkpoints/camelyon/best_model.pth \
--split val \
--output-dir results/camelyon
# Use max pooling aggregation
python experiments/evaluate_camelyon.py \
--checkpoint checkpoints/camelyon/best_model.pth \
--split test \
--aggregation max \
--output-dir results/camelyon_max
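`--aggregation max` selects how patch-level predictions are combined into one slide score. Below is a minimal sketch of the two strategies. It assumes the alternative to `max` is mean pooling of per-patch tumor probabilities (this page does not state the script's default). `aggregate_patch_probs` and `predict_slide` are hypothetical helpers, not the script's code.

```python
from statistics import mean
from typing import Sequence


def aggregate_patch_probs(probs: Sequence[float], method: str = "mean") -> float:
    """Combine per-patch tumor probabilities into a single slide-level score."""
    if not probs:
        raise ValueError("slide has no patches")
    if method == "mean":
        return mean(probs)  # every patch contributes equally
    if method == "max":
        return max(probs)   # the most suspicious patch decides
    raise ValueError(f"unknown aggregation method: {method!r}")


def predict_slide(probs: Sequence[float], method: str = "mean",
                  threshold: float = 0.5) -> int:
    """Return 1 (tumor) if the aggregated score reaches the threshold, else 0."""
    return int(aggregate_patch_probs(probs, method) >= threshold)


# One small metastasis among mostly normal patches.
patch_probs = [0.05, 0.10, 0.08, 0.95]
```

Max pooling flags a slide as soon as one patch looks tumorous. This suits small metastases but amplifies single false-positive patches. Mean pooling is more robust to outliers but can dilute a small tumor region among many normal patches, as in the example above.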
# Config tests
pytest tests/test_camelyon_config.py -v
# Generator tests
pytest tests/test_generate_synthetic_camelyon.py -v
# Evaluation tests
pytest tests/test_evaluate_camelyon.py -v
# All CAMELYON tests
pytest tests/test_camelyon*.py tests/test_generate_synthetic_camelyon.py tests/test_evaluate_camelyon.py -v
computational-pathology-research/
├── experiments/
│   ├── train_camelyon.py                    # Training script
│   ├── evaluate_camelyon.py                 # Evaluation script
│   └── configs/
│       ├── camelyon.yaml                    # Full training config
│       └── camelyon_quick_test.yaml         # Quick test config
├── scripts/
│   └── generate_synthetic_camelyon.py       # Synthetic data generator
├── src/
│   └── data/
│       ├── camelyon_dataset.py              # Dataset classes
│       └── camelyon_annotations.py          # Annotation processing (stub)
├── tests/
│   ├── test_camelyon_config.py              # Config tests
│   ├── test_generate_synthetic_camelyon.py  # Generator tests
│   └── test_evaluate_camelyon.py            # Evaluation tests
└── data/
    └── camelyon/                            # Data directory (gitignored)
        ├── slide_index.json                 # Slide metadata
        └── features/                        # HDF5 feature files
            ├── slide_000.h5
            ├── slide_001.h5
            └── ...
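`slide_index.json` holds the slide metadata, and each slide's features live in `features/<slide_id>.h5`. The index schema isn't documented here, so the sketch below assumes a JSON list of records with `slide_id`, `label`, and `split` keys. `load_slide_index`, `group_by_split`, and `feature_path` are hypothetical helpers. Reading the HDF5 files themselves needs a library such as h5py and is left out.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Union

PathLike = Union[str, Path]


def load_slide_index(root: PathLike) -> List[dict]:
    """Read <root>/slide_index.json, assumed to be a JSON list of slide records."""
    with open(Path(root) / "slide_index.json", encoding="utf-8") as f:
        return json.load(f)


def group_by_split(records: List[dict]) -> Dict[str, List[dict]]:
    """Group slide records by their assumed 'split' field (train/val/test)."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        groups[rec["split"]].append(rec)
    return dict(groups)


def feature_path(root: PathLike, slide_id: str) -> Path:
    """Map a slide ID to its HDF5 feature file, e.g. features/slide_000.h5."""
    return Path(root) / "features" / f"{slide_id}.h5"
```

Typical use would be `group_by_split(load_slide_index("data/camelyon"))["train"]`, followed by opening `feature_path("data/camelyon", rec["slide_id"])` for each record.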
| Feature | PCam | CAMELYON |
|---|---|---|
| Training Script | ✅ Complete | ✅ Complete |
| Evaluation Script | ✅ Complete | ✅ Complete |
| Synthetic Data | ✅ 700 samples | ✅ 30 slides (3000 patches) |
| Real Data Support | ✅ H5 format | ❌ Requires WSI preprocessing |
| Model Architecture | ✅ ResNet + Transformer | ✅ SimpleSlideClassifier |
| Benchmark Results | ✅ 94% accuracy | ✅ 100% accuracy (synthetic only) |
| Interpretability | ✅ Full suite | ❌ Not implemented |
| Comparison Runner | ✅ Complete | ❌ Not implemented |
Comparison runner (not yet implemented): experiments/compare_camelyon_baselines.py
⚠️ Synthetic Data Only: Current results come from synthetic data with artificially separated classes. Real CAMELYON16 data is expected to be significantly more challenging.
⚠️ Simple Baseline: SimpleSlideClassifier is a minimal baseline. State-of-the-art methods use attention mechanisms, graph neural networks, or transformer architectures.
⚠️ No Clinical Validation: This is a research framework for testing architectural ideas, not a clinical tool.
⚠️ Patch-Level Workaround: The current implementation processes patches independently instead of batching at the true slide level. This works, but it is not optimal for memory efficiency.
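True slide-level batching would place the patches of several slides into one batch. Slides have different patch counts, so the usual approach is to pad each slide to the longest one in the batch and carry a boolean mask that keeps padded rows out of pooling. A stdlib sketch of that idea follows. `pad_slide_batch` and `masked_mean` are hypothetical. In PyTorch the padding step is commonly done with `torch.nn.utils.rnn.pad_sequence(..., batch_first=True)`.

```python
from typing import List, Sequence, Tuple

Patch = Sequence[float]


def pad_slide_batch(
    slides: Sequence[Sequence[Patch]], pad_value: float = 0.0
) -> Tuple[List[List[List[float]]], List[List[bool]]]:
    """Pad every slide to the longest slide's patch count and build a validity mask."""
    if not slides or any(len(s) == 0 for s in slides):
        raise ValueError("batch must contain slides with at least one patch each")
    max_len = max(len(s) for s in slides)
    dim = len(slides[0][0])
    batch, mask = [], []
    for s in slides:
        n_pad = max_len - len(s)
        batch.append([list(p) for p in s] + [[pad_value] * dim for _ in range(n_pad)])
        mask.append([True] * len(s) + [False] * n_pad)
    return batch, mask


def masked_mean(batch: List[List[List[float]]],
                mask: List[List[bool]]) -> List[List[float]]:
    """Mean-pool each slide over its real patches only, ignoring padding."""
    pooled = []
    for patches, valid in zip(batch, mask):
        kept = [p for p, ok in zip(patches, valid) if ok]
        pooled.append([sum(col) / len(kept) for col in zip(*kept)])
    return pooled


# Two slides with 2 and 1 patches of 2-D features.
slides = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
batch, mask = pad_slide_batch(slides)
```

Padding trades some wasted memory on short slides for the ability to process whole slides together. Grouping slides of similar size into the same batch keeps that overhead small.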
cbcc317: Initial CAMELYON config scaffold
2cd907e: Replace placeholder with real training script
a0496af: Add synthetic data generator and complete training path
440e868: Add CAMELYON training status documentation
f7f6bc2: Add CAMELYON evaluation script with slide-level metrics