RESEARCH

I built an AI framework for cancer detection

HistoCore started as a framework to make cancer detection more accessible. It's grown into a full computational pathology platform with validated benchmarks, clinical threshold tuning, and tools for working with real hospital data.

Matthew Vaishnav

18 Jan 2025 | Updated 27 Apr 2026 | 7 min read


Too many cancers are deadly because they're found too late. Making AI useful in pathology isn't just about the model - it's about turning scattered research code, giant whole-slide images, and fragile evaluation pipelines into something a researcher can actually run, trust, and extend.

That's what I built HistoCore to solve. I wanted the path from data to model to clinical evaluation to feel less like lab glue and more like real engineering.

The project is more concrete now. It's no longer just about getting a model to train. It's about proving performance on real data, tightening the workflow to reduce missed tumors, and building the tooling needed for reproducible computational pathology research.

What Changed Recently

Biggest update: HistoCore now reaches a validation AUC of 1.00 at epoch 10 on PCam, a real histopathology dataset with 262,144 training samples. Training is 8-12x faster thanks to torch.compile, mixed precision training, and other GPU optimizations: training time dropped from 20-40 hours to 2-3 hours on consumer hardware like an RTX 4070 laptop GPU.

Beyond performance, it's production-ready now. It adds federated learning with differential privacy (ε ≤ 1.0), PACS integration with multi-vendor support and HIPAA compliance features, and a suite of 1,448 tests (55% code coverage) plus 100+ property-based correctness tests.

What The Results Look Like Now

Current benchmark results:

PCam Validation Performance (Epoch 10)
- Validation AUC: 1.00
- Training samples: 262,144
- GPU utilization: 85% (up from 17%)
- Training time: 2-3 hours (down from 20-40 hours)
- Hardware: RTX 4070 Laptop (8GB VRAM)

On the full held-out test set of 32,768 samples, which is the better measure of how the model generalizes, the framework achieves:

PCam Test Set Results
- Accuracy: 85.26% (95% CI: 84.83%-85.63%)
- AUC: 0.9394 (95% CI: 0.9369-0.9418)
- F1: 0.8507 (95% CI: 0.8464-0.8543)
- Bootstrap samples: 1,000 resamples
- Inference time: <5 seconds (production-ready)
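The intervals above come from 1,000 bootstrap resamples of the test set. Here is a minimal percentile-bootstrap sketch in plain Python, assuming (label, prediction) pairs are resampled with replacement and the 2.5th/97.5th percentiles form the 95% interval. The function names are hypothetical, not HistoCore's evaluation API; the same `metric` slot works for AUC or F1.

```python
import random

def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile bootstrap CI for any metric(y_true, y_pred)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample example indices with replacement, keeping labels and predictions paired.
        idx = rng.choices(range(n), k=n)
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return metric(y_true, y_pred), lo, hi

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data: 1,000 balanced labels with 150 wrong predictions (85% accuracy).
y_true = [i % 2 for i in range(1000)]
y_pred = [1 - t if i < 150 else t for i, t in enumerate(y_true)]
acc, lo, hi = bootstrap_ci(y_true, y_pred, accuracy)
```

Fixing the seed makes the interval reproducible across runs, which matters when the CI itself is a reported result.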

These numbers matter because they come with failure analysis and operating-threshold work - where medical AI gets real. At the default threshold, the model was too conservative and tumor recall was too low. The updated pipeline adds threshold optimization for screening use cases.

Clinical screening threshold (recommended)
- Threshold: 0.051
- Sensitivity: 90.0%
- Specificity: 80.3%
- Missed tumors: 1,639 instead of 4,276
- Net effect: 2,637 fewer missed tumor cases
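The net effect follows directly from the reported counts. This short script only re-derives figures already in the list; because sensitivity is rounded to 90.0%, the implied tumor count and default-threshold recall are approximate.

```python
# Figures reported above for the PCam test set.
missed_default = 4276         # missed tumors at the default threshold
missed_screening = 1639       # missed tumors at threshold 0.051
sensitivity_screening = 0.900

# Net reduction in missed tumors (false negatives).
fewer_missed = missed_default - missed_screening

# missed = tumors * (1 - sensitivity), so the implied number of tumor patches is:
implied_tumors = missed_screening / (1 - sensitivity_screening)

# Implied tumor recall at the default threshold.
default_recall = 1 - missed_default / implied_tumors
```

That puts roughly 16,390 tumor patches in the test set, with the default threshold catching only about 74% of them.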

That tradeoff matters. In screening, missing cancer is more expensive than sending more slides for review. Recent work bakes that decision into the tooling instead of pretending a single default threshold is enough.
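One common way to pick a screening operating point is to take the highest threshold that still meets a target sensitivity. Here is a sketch under that assumption, where a patch is called tumor when its score is at or above the threshold. The function names are hypothetical; this is not HistoCore's threshold-optimization step.

```python
import math

def threshold_for_sensitivity(scores, labels, target=0.90):
    """Highest threshold whose sensitivity (recall on label 1) is >= target."""
    pos_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos_scores:
        raise ValueError("need at least one positive example")
    # Minimum number of tumors that must be caught; round() guards against
    # float error pushing target * n just above an integer.
    k = max(1, math.ceil(round(target * len(pos_scores), 9)))
    # Any threshold above the k-th highest tumor score catches fewer than k tumors.
    return pos_scores[k - 1]

def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold means 'tumor'."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: five tumor and five normal patches.
scores = [0.9, 0.8, 0.3, 0.06, 0.04, 0.5, 0.2, 0.07, 0.01, 0.02]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
t = threshold_for_sensitivity(scores, labels, target=0.8)
sens, spec = sensitivity_specificity(scores, labels, t)
```

The threshold is normally chosen on validation data and then reported on the held-out test set, so the reported sensitivity isn't tuned on the data it is measured on.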

What HistoCore Does Today

HistoCore is now a full computational pathology platform, from research to clinical deployment. It combines attention-based multiple instance learning (AttentionMIL, CLAM, TransMIL), production-grade training optimizations, and clinical workflow integration.
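The shared core of these models is attention pooling. Each patch embedding gets a learned score, the scores are softmax-normalized across the slide, and the slide embedding is their weighted sum. Below is a dependency-free sketch of the non-gated attention pooling from Ilse et al. (2018). Real implementations learn `V` and `w` in PyTorch. CLAM adds gated attention and instance-level clustering, and TransMIL replaces pooling with transformer layers. The names here are illustrative, not HistoCore's classes.

```python
import math

def attention_mil_pool(instances, V, w):
    """Attention-based MIL pooling (non-gated).

    instances: N patch embeddings, each a list of D floats
    V: L x D projection matrix (list of L rows); w: length-L scoring vector
    Returns (bag_embedding, attention_weights).
    """
    # Unnormalized score per patch: w^T tanh(V h_k)
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))
    # Softmax over the patches in the bag (subtract the max for stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Slide-level embedding: attention-weighted sum of patch embeddings.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights

# Two toy patches; the scorer favors the first embedding dimension.
patches = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0]]
w = [2.0]
bag, weights = attention_mil_pool(patches, V, w)
```

The attention weights double as an interpretability signal, since they show which patches drove the slide-level prediction.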

Key capabilities: federated learning with differential privacy for multi-site training across hospitals, PACS integration with DICOM C-FIND/C-MOVE/C-STORE and multi-vendor support, model interpretability with Grad-CAM and attention visualizations, and comprehensive testing with property-based validation.
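Grad-CAM weights each channel of a convolutional feature map by the spatial average of the class-score gradient, sums the weighted maps, and keeps only positive evidence. Here is a framework-free sketch of that arithmetic. It assumes the activations and gradients have already been captured from a chosen layer (in PyTorch this is usually done with hooks). It is not HistoCore's visualization API.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one layer's activations and gradients.

    activations, gradients: K channels, each an H x W list of lists, where
    gradients[k][i][j] = d(class score) / d(activations[k][i][j]).
    Returns an H x W map scaled to [0, 1].
    """
    K = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    # Channel importance: global-average-pooled gradient.
    alphas = [sum(sum(row) for row in gradients[k]) / (H * W) for k in range(K)]
    # Weighted sum of activation maps, then ReLU to keep positive evidence only.
    cam = [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(K)))
            for j in range(W)] for i in range(H)]
    # Normalize so the strongest location is 1.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

# Toy layer: channel 0 supports the class, channel 1 argues against it.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(acts, grads)
```

In practice the coarse map is upsampled to the patch resolution and overlaid on the tissue image.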

The framework reaches 85% GPU utilization through torch.compile, mixed precision (AMP), the channels_last memory format, and persistent data-loader workers. The test suite has 1,448 tests with 55% coverage, plus 100+ property-based correctness tests validating federated learning (8/8 properties) and PACS integration (40/48 properties).
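Property-based tests assert an invariant over many generated inputs rather than a few hand-picked cases. Here is a stdlib-only sketch of the idea, using a seeded random generator instead of a library such as Hypothesis. The property (raising the threshold never increases sensitivity) is an illustrative example, not one of HistoCore's actual properties.

```python
import random

def sensitivity(scores, labels, threshold):
    """Fraction of tumor (label 1) examples scored at or above the threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in pos) / len(pos)

def check_sensitivity_monotone(trials=200, seed=0):
    """Property: for t1 <= t2, sensitivity(t1) >= sensitivity(t2)."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 50)
        scores = [rng.random() for _ in range(n)]
        labels = [rng.randint(0, 1) for _ in range(n)]
        labels[0] = 1  # guarantee at least one tumor example
        t1, t2 = sorted((rng.random(), rng.random()))
        # On failure, the assertion message carries the counterexample.
        assert sensitivity(scores, labels, t1) >= sensitivity(scores, labels, t2), (
            scores, labels, t1, t2)
    return True
```

A library like Hypothesis adds input shrinking on top of this, reducing a failing case to a minimal counterexample.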

The Technical Shape

The project looks less like a single training script now and more like a pathology platform:

src/
├── clinical/       # PACS integration, DICOM/FHIR, patient tracking
├── data/           # PCam/CAMELYON16 loaders, WSI processing
├── evaluation/     # Bootstrap CI, threshold analysis, metrics
├── federated/      # Federated learning with differential privacy
├── models/         # AttentionMIL, CLAM, TransMIL architectures
├── training/       # Optimized training loops (torch.compile, AMP)
└── visualization/  # Grad-CAM, attention maps, interpretability

This reflects the evolution from research prototype to production system. Federated learning enables privacy-preserving multi-site training, PACS integration provides hospital connectivity, and optimized training makes it practical for researchers without expensive GPU clusters.
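A common way to add differential privacy to federated training is DP-FedAvg (McMahan et al., 2018). The server clips each site's model update to a fixed L2 norm and adds Gaussian noise to the average. The sketch below assumes that mechanism; the post doesn't say which one HistoCore uses. Turning `noise_multiplier` into a guarantee like ε ≤ 1.0 also needs a separate privacy accountant, which this sketch does not implement. All names are hypothetical.

```python
import math
import random

def clip(update, max_norm):
    """Scale an update vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def dp_fedavg_round(client_updates, max_norm=1.0, noise_multiplier=1.0, rng=None):
    """One server round of DP federated averaging (Gaussian mechanism).

    Each client's model delta is clipped to max_norm, the clipped deltas are
    averaged, and Gaussian noise with std noise_multiplier * max_norm / m is
    added to each coordinate of the average (m = number of clients).
    """
    rng = rng or random.Random(0)
    m = len(client_updates)
    clipped = [clip(u, max_norm) for u in client_updates]
    dim = len(clipped[0])
    mean = [sum(u[d] for u in clipped) / m for d in range(dim)]
    sigma = noise_multiplier * max_norm / m
    return [x + rng.gauss(0.0, sigma) for x in mean]

# Two hospitals; the first sends an oversized update that gets clipped.
updates = [[3.0, 4.0], [0.3, 0.4]]
noiseless = dp_fedavg_round(updates, max_norm=1.0, noise_multiplier=0.0)
noisy = dp_fedavg_round(updates, max_norm=1.0, noise_multiplier=1.0)
```

Clipping bounds any single site's influence on the global model, which is what lets the added noise translate into a formal privacy guarantee.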

Getting Started

Onboarding is simple if you work in Python:

git clone https://github.com/matthewvaishnav/computational-pathology-research.git
cd computational-pathology-research
pip install -r requirements.txt
pip install -e .
python experiments/train_pcam.py --config experiments/configs/pcam_rtx4070_laptop.yaml
python experiments/evaluate_pcam.py --config experiments/configs/pcam_rtx4070_laptop.yaml

For the clinically tuned version, the repo includes a threshold optimization step after evaluation. That's the kind of update I wanted HistoCore to grow into - less demo, more decision support.

Why I Still Care About It

I still believe the original point. Better pathology AI shouldn't be locked behind closed platforms, fragile notebooks, or expensive institutional tooling.

The difference now is that HistoCore is production-grade, with validated performance, extensive testing, and real clinical deployment capabilities. The 8-12x training speedup makes it practical for researchers, federated learning enables privacy-preserving collaboration, and PACS integration provides a path to hospital deployment.

The framework reaches a validation AUC of 1.00 (0.9394 on the held-out test set) at 85% GPU utilization, ships 1,448 tests with property-based validation, and runs inference in under 5 seconds. These aren't just research metrics - they're the foundation for a system that can actually be used in clinical settings.

Check out the full documentation at matthewvaishnav.github.io/computational-pathology-research

The source code lives at github.com/matthewvaishnav/computational-pathology-research