A tested PyTorch framework for computational pathology research with working benchmarks on PatchCamelyon and CAMELYON16
View on GitHub matthewvaishnav/computational-pathology-research
Date: 2026-04-07
Status: ✅ OPERATIONAL
This guide documents the reproducible PCam baseline comparison pipeline that enables systematic evaluation of different model variants on the PatchCamelyon dataset.
The comparison pipeline lets you train, evaluate, and compare multiple model variants from a single command.

Run a quick smoke test across all variants:

```bash
python experiments/compare_pcam_baselines.py \
    --configs experiments/configs/pcam_comparison/*.yaml \
    --quick-test
```

Run the full comparison:

```bash
python experiments/compare_pcam_baselines.py \
    --configs experiments/configs/pcam_comparison/*.yaml
```

Compare specific variants only:

```bash
python experiments/compare_pcam_baselines.py \
    --configs \
        experiments/configs/pcam_comparison/baseline_resnet18.yaml \
        experiments/configs/pcam_comparison/resnet50.yaml
```
Each run writes artifacts to the following layout:

```
results/pcam_comparison/
├── comparison_results.json          # Aggregated comparison metrics
├── baseline_resnet18/
│   ├── metrics.json                 # Detailed metrics
│   ├── confusion_matrix.png         # Confusion matrix plot
│   └── roc_curve.png                # ROC curve plot
├── resnet50/
│   ├── metrics.json
│   ├── confusion_matrix.png
│   └── roc_curve.png
└── simple_head/
    ├── metrics.json
    ├── confusion_matrix.png
    └── roc_curve.png

checkpoints/pcam_comparison/
├── baseline_resnet18/best_model.pth
├── resnet50/best_model.pth
└── simple_head/best_model.pth

logs/pcam_comparison/
├── baseline_resnet18/               # TensorBoard logs
├── resnet50/
└── simple_head/
```
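Because the per-variant layout above is fixed, completeness is easy to check programmatically. A small sketch (`variant_artifacts` is a hypothetical helper for illustration, not part of the repo):

```python
from pathlib import Path

# Hypothetical helper: verify that a variant produced the three
# artifacts shown in the results layout above.
EXPECTED = ("metrics.json", "confusion_matrix.png", "roc_curve.png")

def variant_artifacts(results_root, name):
    variant_dir = Path(results_root) / name
    # Map each expected artifact to whether it exists on disk.
    return {f: (variant_dir / f).is_file() for f in EXPECTED}

# Usage: variant_artifacts("results/pcam_comparison", "baseline_resnet18")
```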
The comparison_results.json file contains aggregated metrics for all variants:
```json
{
  "timestamp": "2026-04-07 15:42:52",
  "variants": [
    {
      "name": "baseline_resnet18",
      "config_path": "experiments/configs/pcam_comparison/baseline_resnet18.yaml",
      "training_status": "success",
      "evaluation_status": "success",
      "training_time_seconds": 31.85,
      "test_accuracy": 1.0,
      "test_auc": 1.0,
      "test_f1": 1.0,
      "test_precision": 1.0,
      "test_recall": 1.0,
      "model_parameters": {
        "feature_extractor": 11176512,
        "encoder": 987904,
        "head": 33281,
        "total": 12197697
      },
      "inference_time_seconds": 0.68,
      "samples_per_second": 146.5,
      "checkpoint_path": "checkpoints/pcam_comparison/baseline_resnet18/best_model.pth",
      "results_dir": "results/pcam_comparison/baseline_resnet18"
    }
  ]
}
```
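Since comparison_results.json follows the schema above, it can be post-processed directly. A minimal sketch (the `summarize` helper is hypothetical; the field names are taken from the documented example):

```python
import json

# Hypothetical helper: condense comparison_results.json (schema shown
# above) into one human-readable line per variant.
def summarize(results):
    lines = []
    for v in results["variants"]:
        params_m = v["model_parameters"]["total"] / 1e6
        lines.append(
            f"{v['name']}: acc={v['test_accuracy']:.4f} "
            f"auc={v['test_auc']:.4f} params={params_m:.1f}M "
            f"time={v['training_time_seconds']:.1f}s"
        )
    return lines

# Usage:
#   with open("results/pcam_comparison/comparison_results.json") as f:
#       print("\n".join(summarize(json.load(f))))
```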
| Variant | Accuracy | AUC | F1 | Parameters | Training Time |
|---|---|---|---|---|---|
| baseline_resnet18 | 1.0000 | 1.0000 | 1.0000 | 12.2M | 31.8s |
| simple_head | 0.5500 | 1.0000 | 0.3548 | 12.2M | 25.1s |
Observations (3-epoch quick test): baseline_resnet18 saturates every metric on the synthetic subset, while simple_head matches its AUC (1.0000) but drops to 0.5500 accuracy and 0.3548 F1 — a perfect ranking paired with a poorly calibrated decision threshold. Treat these numbers as a pipeline smoke test, not a benchmark.
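The AUC/F1 split in simple_head's row is worth a closer look: AUC measures ranking quality, while F1 depends on the decision threshold, so the two can disagree sharply. A pure-Python toy sketch of this effect (the scores below are invented for illustration, not pipeline outputs):

```python
# Scores that rank every positive above every negative, but all of
# which fall below the default 0.5 decision threshold.
neg = [0.20 * i / 49 for i in range(50)]         # negatives: 0.00 .. 0.20
pos = [0.25 + 0.20 * i / 49 for i in range(50)]  # positives: 0.25 .. 0.45

# AUC = fraction of (positive, negative) pairs ranked correctly.
auc = sum(p > n for p in pos for n in neg) / (len(pos) * len(neg))

# F1 at a fixed 0.5 threshold: no score clears it, so recall is zero.
tp = sum(p >= 0.5 for p in pos)
fp = sum(n >= 0.5 for n in neg)
fn = len(pos) - tp
f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0

print(auc, f1)  # → 1.0 0.0
```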
To add a new baseline variant, create a config file in experiments/configs/pcam_comparison/. Example:
```yaml
experiment:
  name: my_new_variant
  description: Description of what's different
  tags: [pcam, my-tag]

# ... architecture config ...

checkpoint:
  checkpoint_dir: ./checkpoints/pcam_comparison/my_new_variant

logging:
  log_dir: ./logs/pcam_comparison/my_new_variant

evaluation:
  output_dir: ./results/pcam_comparison/my_new_variant
```
What you CAN say: the comparison pipeline runs end-to-end — training, evaluation, and aggregation — and produces consistent artifacts for every variant.

What you CANNOT say: that these metrics reflect performance on the real PatchCamelyon benchmark; the quick test uses a synthetic 500/100/100 subset, so saturated scores are expected and uninformative.

To make stronger claims, you would need to train on the full PatchCamelyon dataset, for more than the 3-epoch quick test, and repeat runs across multiple seeds.

Command reference:
Quick test:

```bash
python experiments/compare_pcam_baselines.py \
    --configs experiments/configs/pcam_comparison/*.yaml \
    --quick-test
```

Full comparison:

```bash
python experiments/compare_pcam_baselines.py \
    --configs experiments/configs/pcam_comparison/*.yaml
```

Re-run evaluation from existing checkpoints, skipping training:

```bash
python experiments/compare_pcam_baselines.py \
    --configs experiments/configs/pcam_comparison/*.yaml \
    --skip-training
```

Write the aggregated results to a custom location:

```bash
python experiments/compare_pcam_baselines.py \
    --configs experiments/configs/pcam_comparison/*.yaml \
    --output results/my_comparison/results.json
```
If evaluation fails with "checkpoint not found", ensure training completed successfully. Check:

- `checkpoints/pcam_comparison/<variant_name>/best_model.pth` exists
- `logs/pcam_comparison/<variant_name>/` for the training logs

If training fails with OOM:

- Reduce `batch_size` in the config files
- Use `--quick-test` for faster iteration

If results vary between runs:

- Verify `seed: 42` is set in all configs

Status: Comparison pipeline operational ✅
Dataset: Synthetic PCam subset (500/100/100) ⚠️
Clinical validation: Not applicable ❌
Scientific benchmark: Requires full dataset ⚠️