Computational Pathology Research Framework

A tested PyTorch framework for computational pathology research with working benchmarks on PatchCamelyon and CAMELYON16

View on GitHub matthewvaishnav/computational-pathology-research

PatchCamelyon Baseline Comparison Guide

Date: 2026-04-07
Status: ✅ OPERATIONAL

Overview

This guide documents the reproducible PCam baseline comparison pipeline that enables systematic evaluation of different model variants on the PatchCamelyon dataset.

What This Enables

The comparison pipeline lets you train and evaluate multiple model variants on the same PCam subset under identical settings, then aggregate their metrics into a single comparison report.

Available Baselines

1. baseline_resnet18 (Default)

2. resnet50

3. simple_head

Quick Start

Run All Comparisons (Quick Test - 3 epochs)

python experiments/compare_pcam_baselines.py \
  --configs experiments/configs/pcam_comparison/*.yaml \
  --quick-test

Run All Comparisons (Full Training - 20 epochs)

python experiments/compare_pcam_baselines.py \
  --configs experiments/configs/pcam_comparison/*.yaml

Run Specific Variants

python experiments/compare_pcam_baselines.py \
  --configs \
    experiments/configs/pcam_comparison/baseline_resnet18.yaml \
    experiments/configs/pcam_comparison/resnet50.yaml

Output Structure

results/pcam_comparison/
├── comparison_results.json          # Aggregated comparison metrics
├── baseline_resnet18/
│   ├── metrics.json                 # Detailed metrics
│   ├── confusion_matrix.png         # Confusion matrix plot
│   └── roc_curve.png                # ROC curve plot
├── resnet50/
│   ├── metrics.json
│   ├── confusion_matrix.png
│   └── roc_curve.png
└── simple_head/
    ├── metrics.json
    ├── confusion_matrix.png
    └── roc_curve.png

checkpoints/pcam_comparison/
├── baseline_resnet18/best_model.pth
├── resnet50/best_model.pth
└── simple_head/best_model.pth

logs/pcam_comparison/
├── baseline_resnet18/               # TensorBoard logs
├── resnet50/
└── simple_head/
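After a run, it can be useful to confirm that every variant actually produced the artifacts shown in the tree above. A minimal sketch (the helper name `missing_artifacts` is illustrative, not part of the pipeline):

```python
from pathlib import Path

EXPECTED_FILES = ["metrics.json", "confusion_matrix.png", "roc_curve.png"]

def missing_artifacts(results_root, variants):
    """Return {variant: [missing files]} for variants lacking expected outputs."""
    missing = {}
    for variant in variants:
        variant_dir = Path(results_root) / variant
        absent = [f for f in EXPECTED_FILES if not (variant_dir / f).exists()]
        if absent:
            missing[variant] = absent
    return missing

# Example:
# missing_artifacts("results/pcam_comparison",
#                   ["baseline_resnet18", "resnet50", "simple_head"])
```

An empty dict means all variants completed evaluation; a non-empty entry usually points at a failed training or evaluation step for that variant.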

Comparison Results Format

The comparison_results.json file contains aggregated metrics for all variants:

{
  "timestamp": "2026-04-07 15:42:52",
  "variants": [
    {
      "name": "baseline_resnet18",
      "config_path": "experiments/configs/pcam_comparison/baseline_resnet18.yaml",
      "training_status": "success",
      "evaluation_status": "success",
      "training_time_seconds": 31.85,
      "test_accuracy": 1.0,
      "test_auc": 1.0,
      "test_f1": 1.0,
      "test_precision": 1.0,
      "test_recall": 1.0,
      "model_parameters": {
        "feature_extractor": 11176512,
        "encoder": 987904,
        "head": 33281,
        "total": 12197697
      },
      "inference_time_seconds": 0.68,
      "samples_per_second": 146.5,
      "checkpoint_path": "checkpoints/pcam_comparison/baseline_resnet18/best_model.pth",
      "results_dir": "results/pcam_comparison/baseline_resnet18"
    }
  ]
}
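Since the file is plain JSON, ranking variants programmatically is straightforward. A sketch using only the fields shown above (the function name `rank_variants` is illustrative):

```python
import json

def rank_variants(results_path, metric="test_auc"):
    """Load comparison_results.json and sort variants by a metric, best first."""
    with open(results_path) as f:
        results = json.load(f)
    ranked = sorted(results["variants"], key=lambda v: v.get(metric, 0.0),
                    reverse=True)
    for v in ranked:
        total_m = v["model_parameters"]["total"] / 1e6
        print(f"{v['name']:20s} {metric}={v.get(metric, 0.0):.4f} "
              f"params={total_m:.1f}M time={v['training_time_seconds']:.1f}s")
    return ranked
```

Swapping `metric` for `test_f1` or `test_accuracy` re-ranks without re-running any training.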

Example Comparison Results (Quick Test - 3 epochs)

Variant            Accuracy  AUC     F1      Parameters  Training Time
baseline_resnet18  1.0000    1.0000  1.0000  12.2M       31.8s
simple_head        0.5500    1.0000  0.3548  12.2M       25.1s

Observations (3-epoch quick test): baseline_resnet18 fits the small synthetic subset perfectly, while simple_head reaches only 0.55 accuracy despite a perfect AUC, which suggests its ranking of samples is correct but its decision threshold is poorly calibrated after so few epochs.

Reproducibility

Fixed Settings Across All Variants

What Varies Between Baselines

Adding New Baselines

To add a new baseline variant:

  1. Create a new config file in experiments/configs/pcam_comparison/
  2. Set unique paths for checkpoints, logs, and results
  3. Modify architecture as needed
  4. Run comparison with the new config included

Example:

experiment:
  name: my_new_variant
  description: Description of what's different
  tags: [pcam, my-tag]

# ... architecture config ...

checkpoint:
  checkpoint_dir: ./checkpoints/pcam_comparison/my_new_variant

logging:
  log_dir: ./logs/pcam_comparison/my_new_variant

evaluation:
  output_dir: ./results/pcam_comparison/my_new_variant
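Because each variant must use unique checkpoint, log, and results paths (step 2 above), a path collision between two configs will silently overwrite outputs. A small sketch that catches this after configs are parsed, e.g. with yaml.safe_load (the helper name `find_path_collisions` is illustrative):

```python
def find_path_collisions(configs):
    """Given parsed variant configs (dicts shaped like the YAML example above),
    return (path, first_variant, second_variant) tuples for any output path
    claimed by more than one variant."""
    seen = {}  # path -> first variant name that claimed it
    collisions = []
    for cfg in configs:
        name = cfg["experiment"]["name"]
        paths = [
            cfg["checkpoint"]["checkpoint_dir"],
            cfg["logging"]["log_dir"],
            cfg["evaluation"]["output_dir"],
        ]
        for p in paths:
            if p in seen and seen[p] != name:
                collisions.append((p, seen[p], name))
            seen.setdefault(p, name)
    return collisions
```

Running this over all configs before launching a comparison turns a silent overwrite into an explicit error.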

Important Caveats

Dataset Scale

Validation Scope

Honest Claims Enabled

What you CAN say:

What you CANNOT say:

Next Steps for Stronger Benchmarks

To make stronger claims, you would need to:

  1. Download full PCam dataset (~7GB)
  2. Train on full 262K training set
  3. Evaluate on full 32K test set
  4. Implement published baselines (e.g., standard ResNet, DenseNet from PCam paper)
  5. Run fair comparisons with same preprocessing and evaluation protocol
  6. Compute confidence intervals with bootstrap or cross-validation
  7. Test on other datasets (CAMELYON16, TCGA) for generalization
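For step 6, a percentile bootstrap over per-sample correctness is a simple starting point. A stdlib-only sketch, assuming you have a 0/1 correctness flag per test sample (the function name `bootstrap_ci` is illustrative):

```python
import random

def bootstrap_ci(correct_flags, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy from per-sample 0/1 correctness."""
    rng = random.Random(seed)
    n = len(correct_flags)
    # Resample the test set with replacement and recompute accuracy each time.
    stats = sorted(
        sum(rng.choices(correct_flags, k=n)) / n for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

On the full 32K-sample PCam test set the interval will be far tighter than on a 100-sample subset, which is one concrete reason the quick-test numbers above cannot support strong claims.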

Commands Reference

Quick Test (3 epochs, ~2 minutes per variant)

python experiments/compare_pcam_baselines.py \
  --configs experiments/configs/pcam_comparison/*.yaml \
  --quick-test

Full Training (20 epochs, roughly 3-4 minutes per variant on CPU, extrapolating from the ~32s quick-test training time)

python experiments/compare_pcam_baselines.py \
  --configs experiments/configs/pcam_comparison/*.yaml

Skip Training (Evaluate Existing Checkpoints)

python experiments/compare_pcam_baselines.py \
  --configs experiments/configs/pcam_comparison/*.yaml \
  --skip-training

Custom Output Path

python experiments/compare_pcam_baselines.py \
  --configs experiments/configs/pcam_comparison/*.yaml \
  --output results/my_comparison/results.json

Troubleshooting

Checkpoint Not Found

If evaluation fails with “checkpoint not found”, ensure training completed successfully. Check:

Out of Memory

If training fails with OOM:

Inconsistent Results

If results vary between runs:
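One common cause is unpinned random seeds. A minimal seeding helper (the name `set_global_seed` is illustrative; whether the pipeline already seeds this way is an assumption):

```python
import random

def set_global_seed(seed: int = 42) -> None:
    """Pin all RNGs so repeated runs see identical batches and init weights."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade cuDNN speed for deterministic convolution algorithms.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Call this once at the top of the comparison script, before any data loaders or models are constructed, so every variant starts from the same RNG state.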


Status: Comparison pipeline operational ✅
Dataset: Synthetic PCam subset (500/100/100) ⚠️
Clinical validation: Not applicable ❌
Scientific benchmark: Requires full dataset ⚠️