RETROSPECTIVE

Research infrastructure lessons from an early medical-AI prototype

Earlier prototype work exploring computational pathology infrastructure taught me about reliability, security, data flow, and ML engineering. It was not production medical AI, not HIPAA-compliant, not diagnostic software, and not hospital-ready.

Matthew Vaishnav

6 May 2026 | Revised July 2026 | 5 min read

What This Was

Between late 2024 and early 2026, I built an early computational pathology research prototype called HistoCore. The goal was to create infrastructure for whole-slide histopathology experiments: data loading, multiple-instance learning model training, evaluation pipelines, and federated learning simulations.

It was a research learning project. I was teaching myself PyTorch, WSI processing, MIL architectures, and the engineering challenges of computational pathology. The prototype achieved some solid research results on public benchmarks like PCam and PANDA, and the federated learning simulations produced interesting findings about dominant-site failure modes.

What I Learned

The most valuable outcome was not any single benchmark number. It was learning how to build reliable ML infrastructure: reproducible data pipelines, systematic evaluation, property-based testing, and the discipline of separating research claims from deployment claims.

I learned that training optimization matters for iteration speed. Moving GPU utilization from 17% to 85% through torch.compile, mixed precision, and data-loader tuning meant I could run experiments in hours instead of days. That faster feedback loop made everything else possible.

I learned that testing at the infrastructure level is different from model evaluation. Property-based testing with Hypothesis caught edge cases I would never have found manually. Federated learning has subtle failure modes that only appear under specific client-dropout or Byzantine conditions, and systematic testing surfaced them.

What It Was Not

This is the part that matters most now. The prototype was not:

Not production medical AI. Not HIPAA-compliant. Not diagnostic software. Not hospital-ready. Not clinically validated. Not a medical device. Not FDA/CE-ready. Not deployed in any clinical setting. Not used for patient care.

The PACS integration components were research prototypes exploring DICOM data flow patterns, not production-grade clinical integrations. The federated learning system was a simulation framework for studying algorithmic behavior, not a deployed multi-hospital training network. The “clinical threshold” experiments were exploratory sensitivity/specificity tradeoff analyses on public benchmark data, not clinical decision thresholds.

In earlier versions of this portfolio, I used language like “production-ready,” “clinical PACS integration,” and “HIPAA-compliant audit logging.” That language was inaccurate and I have removed it. The work was research infrastructure built by a student learning the field — and that is valuable on its own terms without overclaiming.

Research Results That Survive

Some results from this period remain valid as research findings:

PCam Benchmark (public dataset, research evaluation)
- Validation AUC: 95.37%
- Test accuracy: 85.26% (95% CI: 84.83%-85.63%)
- Test AUC: 0.9394 (95% CI: 0.9369-0.9418)
- 1,000 bootstrap resamples

PANDA Slide-Level Experiments (public dataset)
- 10,611 readable Phikon slide feature vectors
- Gated AttentionMIL QWK: 0.8100
- Tuned TransnnMIL QWK: 0.8155 / 0.8225 / 0.8086

Federated Stress Tests (simulated federations)
- FedAvg dominant-site vulnerability identified
- Cross-site blending improved robustness
- Research-only; not real hospital federated deployments

Where I Am Now

My current research has moved to Paired-Acquisition Neural Factorization: testing whether paired acquisitions of the same tissue can reduce scanner/acquisition signal in pathology embeddings while preserving tissue identity. That work, plus external canine SCC validation, pair-repeat allocation, and mechanism audits, is my strongest current research line.

The early infrastructure work was necessary — it built the engineering instincts and experimental discipline I use now. But the framing matters. It was research infrastructure and learning, not a production medical AI platform.

Current research: github.com/matthewvaishnav/computational-pathology-research