Computational Pathology AI Research Framework 2025–
A research and engineering framework for whole-slide histopathology modeling, multiple-instance learning, federated oncology validation, benchmark automation, failure-mode analysis, and mathematical validation tooling.
The goal is not just to train a model, but to build the surrounding infrastructure needed to make computational pathology experiments reproducible, inspectable, and extensible.
Current work includes PANDA prostate cancer grading with Phikon features, PCam benchmark validation, TransnnMIL model development, PathologyFL federated learning experiments, dominance-aware aggregation research, threshold analysis, and reproducible documentation.
Key Results
- PCam: 95.37% validation AUC; 85.26% test accuracy and 0.9394 test AUC on the full 32,768-sample test set
- PANDA data: 10,611 readable slide-level Phikon feature vectors after HDF5 read verification; 768-dimensional mean-pooled embeddings
- PANDA baseline: Mean-pooled Phikon + MLP, QWK 0.7274
- AttentionMIL: Gated AttentionMIL, QWK 0.8100
- TransnnMIL: Tuned repeated-seed QWK 0.8155 / 0.8225 / 0.8086
- FedAvg failure: 15-seed full-PANDA stress studies show FedAvg becomes vulnerable when the dominant simulated site is unreliable
- Label-noise stress: Cross-site blending significantly improves global QWK at 25% and 35% dominant-site label corruption and worst-site QWK at 45%
- Threshold-shift stress: Under systematic conservative dominant-site grading bias, cross-site blending improves global QWK, worst-site QWK, mean-site QWK, accuracy, and macro-F1 across 25%, 35%, and 45% shift levels
- Detector switch: Clean-calibrated FedAvg validation diagnostics can trigger dominance-aware switching away from sample-size weighting in unsafe regimes
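Most PANDA results above are reported as quadratic weighted kappa (QWK). The sketch below shows how that metric is conventionally computed; it is a minimal pure-Python illustration, not the project's evaluation code. The function name and the default of six classes (ISUP grades 0 to 5, as in PANDA) are assumptions made for this example.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes=6):
    """Cohen's kappa with quadratic disagreement weights for ordinal labels.

    Illustrative only. Labels are assumed to be integers in [0, num_classes).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)

    # Observed confusion matrix: rows are reference grades, columns are predictions.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(num_classes))
                  for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Penalty grows with the squared distance between grades.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            # Expected count under chance agreement with the same marginals.
            expected = row_totals[i] * col_totals[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Every label and prediction falls in one class; kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


# Perfect agreement gives 1.0; fully reversed grades on three classes give -1.0.
print(quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]))
print(quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], num_classes=3))
```

Because the penalty is quadratic in grade distance, a prediction two grades off costs four times as much as one grade off, which is why QWK suits ordinal grading tasks better than plain accuracy.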
Research Components
- TransnnMIL: Custom multiple-instance learning architecture direction for WSI modeling
- PathologyFL: Federated learning infrastructure for pathology experiments
- Dominance-aware FL: FedAvg failure-mode analysis, cross-site blending, oracle switches, observable detector switches, and threshold-shift transfer tests
- FAIR-WEIGHTS-H: Institutional weighting research with contribution, ordinal harm, entropy, effective-institution diagnostics, and null-result analysis
- Validation: PCam patch validation → PANDA slide-level validation → simulated federated robustness → planned Camelyon16/17 multi-center validation
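To make the weighting ideas in the components above concrete, here is a hedged sketch of sample-size (FedAvg) weights, one possible form of cross-site blending, and an entropy-based effective-institution count. The exact definitions used in PathologyFL and FAIR-WEIGHTS-H are not given here, so everything below is an assumption: blending is modeled as linear interpolation toward uniform weights, the effective number of institutions is taken as the exponential of the Shannon entropy of the weights, and all function names and site sizes are hypothetical.

```python
import math


def fedavg_weights(site_sizes):
    """Standard FedAvg weights: each site is weighted by its share of samples."""
    total = sum(site_sizes)
    return [n / total for n in site_sizes]


def blended_weights(site_sizes, blend):
    """Assumed blending rule: interpolate between sample-size weights
    (blend=0) and uniform per-site weights (blend=1)."""
    if not 0.0 <= blend <= 1.0:
        raise ValueError("blend must be in [0, 1]")
    k = len(site_sizes)
    return [(1.0 - blend) * w + blend / k for w in fedavg_weights(site_sizes)]


def effective_institutions(weights):
    """Assumed diagnostic: exp(Shannon entropy) of the aggregation weights.
    Equals the site count for uniform weights and tends to 1 when one site dominates."""
    entropy = -sum(w * math.log(w) for w in weights if w > 0)
    return math.exp(entropy)


def aggregate(site_params, weights):
    """Weighted average of per-site parameter vectors (plain lists of floats)."""
    dim = len(site_params[0])
    return [sum(w * params[d] for w, params in zip(weights, site_params))
            for d in range(dim)]


# Hypothetical federation with one dominant site.
sizes = [6000, 2000, 1500, 500]
for blend in (0.0, 0.5, 1.0):
    w = blended_weights(sizes, blend)
    print(blend, [round(x, 3) for x in w], round(effective_institutions(w), 2))
```

Under these assumptions, raising `blend` reduces the dominant site's influence and raises the effective-institution count, which is the trade-off the dominance-aware experiments probe: sample-size weighting is efficient when all sites are clean, but it concentrates risk in the largest site.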
Current Research Claim
On full PANDA-derived Phikon slide features, FedAvg is strongest when sites are clean but becomes unsafe when the largest simulated client becomes less reliable. Cross-site blending and dominance-aware switching reduce dependence on raw sample count and improve robustness under dominant-site label noise and systematic conservative ordinal grading bias. This is a research result from simulated federations, not clinical validation.
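The two stress conditions named in this claim can be illustrated with a small simulation sketch. It is not the project's corruption code: the function names are hypothetical, label noise is assumed to mean replacing a label with a different grade chosen uniformly at random, and "conservative grading bias" is assumed to mean downgrading a label by one grade with a floor at grade 0.

```python
import random

NUM_GRADES = 6  # ISUP grades 0-5, as used in PANDA


def corrupt_label_noise(labels, rate, rng):
    """Replace a fraction `rate` of labels with a different grade chosen
    uniformly at random (assumed noise model)."""
    corrupted = list(labels)
    n_corrupt = round(rate * len(labels))
    for idx in rng.sample(range(len(labels)), n_corrupt):
        choices = [g for g in range(NUM_GRADES) if g != labels[idx]]
        corrupted[idx] = rng.choice(choices)
    return corrupted


def corrupt_conservative_shift(labels, rate, rng):
    """Downgrade a fraction `rate` of labels by one grade, flooring at 0
    (assumed model of systematic conservative grading)."""
    corrupted = list(labels)
    n_corrupt = round(rate * len(labels))
    for idx in rng.sample(range(len(labels)), n_corrupt):
        corrupted[idx] = max(0, labels[idx] - 1)
    return corrupted


# Apply each corruption only to the dominant (largest) simulated site.
rng = random.Random(0)
dominant_site_labels = [rng.randrange(NUM_GRADES) for _ in range(1000)]
noisy = corrupt_label_noise(dominant_site_labels, 0.25, rng)
shifted = corrupt_conservative_shift(dominant_site_labels, 0.35, rng)
print(sum(a != b for a, b in zip(dominant_site_labels, noisy)))
print(sum(a > b for a, b in zip(dominant_site_labels, shifted)))
```

The two corruptions differ in structure: random noise spreads errors across all grades, while the shift biases every affected label in the same direction, so an aggregator that trusts the dominant site inherits a systematic under-grading tendency rather than a diffuse one.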
Claim Boundary
Research-only at this stage. This work is not clinically validated, is not diagnostic software, and is not currently used for patient care. The PANDA studies are simulated-federation experiments over real pathology-derived feature vectors, not real hospital federated deployments. The long-term goal is responsible clinical translation after proper validation, regulatory review, security review, usability testing, and deployment testing.
Links
- Docs: Research documentation
- GitHub: Repository
- Dominance switch: Full PANDA dominance-aware switch results
- PANDA: Slide-level baseline results