FAIR-WEIGHTS-H: Hybrid Institutional Weighting Protocol
Status: Proposed protocol and implementation specification
Scope: Federated computational pathology research system
Validation status: Empirically tested for execution stability and aggregation behavior; performance/fairness advantage over simpler baselines not yet demonstrated
1. Motivation
The current prestige-style institutional weighting pattern assigns higher influence to cancer centers and lower influence to rural or community hospitals based on institutional category. That is not scientifically defensible for federated pathology because institutional reputation is not the same as measured contribution, clinical reliability, subgroup coverage, or domain-shift value.
FAIR-WEIGHTS-H replaces prestige weighting with an auditable hybrid protocol that combines:
- counterfactual contribution estimation,
- diagnostic and process quality,
- distributional uniqueness,
- representation and subgroup-safety constraints,
- uncertainty penalties,
- anomaly monitoring and fallback modes.
The framework is intended as a research protocol. It does not claim clinical validation or regulatory clearance.
2. Key Design Principle
A single scalar institution weight cannot by itself guarantee fairness or safety. Therefore FAIR-WEIGHTS-H separates three per-institution quantities:
- training weight: the aggregation weight used during federated model updates,
- validation priority: the priority given to the site in validation representation,
- monitoring priority: the priority given to the site in post-market or research monitoring.
A site may have a low training weight because of uncertain or noisy updates while still receiving high validation and monitoring priority if it represents an underserved or clinically important population.
3. Institutional Signals
For each institution, FAIR-WEIGHTS-H uses the following signals:
| Signal | Status |
|---|---|
| Difficulty-adjusted reference-case diagnostic quality | Proposed |
| Process and pathology quality composite | Proposed |
| Group-aware counterfactual contribution estimate | Proposed |
| Jensen-Shannon distributional uniqueness | Proposed |
| Underserved-population representation score | Proposed |
| Bounded/sublinear volume factor | Proposed |
| Uncertainty, instability, or anomaly penalty | Proposed |
Gradient alignment with the current global model is not used as a primary contribution factor because it can encode status quo bias. It may be used only for anomaly detection and drift monitoring.
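Two of the signals in the table, Jensen-Shannon uniqueness and the bounded volume factor, can be sketched as follows. This is a minimal sketch, assuming each site shares only a normalized aggregate histogram (for example over hypothetical tissue or stain bins) and that uniqueness is measured against the mean profile of the other sites. The function names, the leave-one-out reference profile, and the logarithmic volume factor with its `cap` parameter are illustrative assumptions, not the protocol's specified forms.

```python
import math
from typing import Dict, List


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in bits; bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: List[float], y: List[float]) -> float:
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0 since y is the mixture.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_uniqueness(profiles: Dict[str, List[float]], site: str) -> float:
    """Hypothetical uniqueness score: JS divergence between a site's profile
    and the average profile of all other sites (leave-one-out reference)."""
    others = [prof for name, prof in profiles.items() if name != site]
    n_bins = len(profiles[site])
    reference = [sum(prof[k] for prof in others) / len(others) for k in range(n_bins)]
    return js_divergence(profiles[site], reference)


def volume_factor(n_cases: int, cap: int = 10_000) -> float:
    """Hypothetical bounded, sublinear volume factor in [0, 1]:
    grows like log(1 + n) and saturates once n reaches `cap`."""
    return min(1.0, math.log1p(n_cases) / math.log1p(cap))
```

The leave-one-out reference keeps a large site from dominating the profile it is compared against; whether to use that or a pooled profile is a design choice that would need to be pre-specified.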
4. Integrity Gate
FAIR-WEIGHTS-H uses a hard gate only for integrity and safety failures, not for prestige or raw quality.
An institution passes the gate only when all three of the following hold:
- data integrity checks pass,
- no severe label-corruption signal is detected,
- there is no active safety violation.
Low resource level, rural status, or case difficulty must not directly trigger exclusion. Quality should be modeled with difficulty adjustment and uncertainty, not with a crude hard threshold.
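The gate can be sketched as a conjunction of three binary checks. The `IntegrityChecks` field names are hypothetical; resource level, rural status, and case difficulty are intentionally absent from the inputs, matching the rule above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrityChecks:
    # Hypothetical field names; each flag would come from the site's audit pipeline.
    data_integrity_ok: bool
    severe_label_corruption: bool
    active_safety_violation: bool


def integrity_gate(checks: IntegrityChecks) -> int:
    """Return 1 if the site's updates may be aggregated, else 0.

    Only integrity and safety failures close the gate; quality, volume,
    and institutional category are deliberately not inputs.
    """
    passed = (
        checks.data_integrity_ok
        and not checks.severe_label_corruption
        and not checks.active_safety_violation
    )
    return int(passed)
```

One way to apply the gate, assumed here rather than specified by the protocol, is to multiply a site's training weight by the gate value before renormalizing.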
5. Difficulty-Adjusted Quality
Raw accuracy on reference cases can be misleading when institutions serve different case mixes. FAIR-WEIGHTS-H therefore requires a pre-specified difficulty-adjustment model before using adjusted quality in weight computation.
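One defensible model is a mixed-effects logistic regression:
$$\operatorname{logit} \Pr(y_{ci} = 1) = \beta^\top x_{ci} + u_i$$
where:
$y_{ci}$: correct (1) or incorrect (0) diagnosis for case $c$ at site $i$, $x_{ci}$: tumor type, stage, slide quality, stain quality, scanner metadata, referral status, and case complexity, $u_i$: institution-level effect after adjustment.
The adjusted quality is evaluated on a standardized reference distribution $P_{\text{ref}}$ of case characteristics:
$$Q_i^{\text{adj}} = \mathbb{E}_{x \sim P_{\text{ref}}}\left[\sigma\left(\beta^\top x + u_i\right)\right]$$
where $\sigma$ is the logistic function.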
This adjustment must be calibrated and audited. It cannot simply be asserted.
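Assuming a mixed-effects logistic model has already been fitted, standardization amounts to averaging each site's predicted probability of a correct diagnosis over the same reference case mix. The sketch below uses hypothetical names and takes covariate vectors (for example an intercept term plus a difficulty score) as plain lists; fitting and calibrating the model are outside its scope.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def adjusted_quality(
    beta: Sequence[float],
    site_effect: float,
    reference_cases: List[Sequence[float]],
) -> float:
    """Standardized quality: mean predicted probability of a correct diagnosis
    for one site, averaged over a shared reference case mix.

    `beta` and `site_effect` are assumed to come from an already fitted
    mixed-effects logistic model; fitting is not shown.
    """
    if not reference_cases:
        raise ValueError("reference distribution is empty")
    total = 0.0
    for x in reference_cases:
        z = sum(b * xi for b, xi in zip(beta, x)) + site_effect
        total += logistic(z)
    return total / len(reference_cases)
```

Because every site is scored on the same reference cases, a site that happens to receive harder referrals is not penalized for its case mix; under this model only its estimated site effect differs.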
6. Useful Uniqueness
Distributional uniqueness alone is not always beneficial. A site can be unique because it serves rare populations, but it can also be unique because of scanner artifacts, poor fixation, or systematic labeling errors.
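Therefore uniqueness is treated as a weak signal unless it is paired with adequate difficulty-adjusted quality and demonstrated subgroup utility, so that uniqueness caused by artifacts or labeling errors does not raise a site's weight.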
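The exact combination rule is not stated here, so the following is only one hypothetical way to make uniqueness a weak signal on its own: a small `weak_share` of raw uniqueness always counts, and the remainder is scaled by the product of adjusted quality and subgroup utility. All inputs are assumed to be normalized to [0, 1], and `weak_share` is an illustrative parameter.

```python
def useful_uniqueness(
    uniqueness: float,
    adjusted_quality: float,
    subgroup_utility: float,
    weak_share: float = 0.2,
) -> float:
    """Hypothetical useful-uniqueness score in [0, 1]."""
    for name, value in (
        ("uniqueness", uniqueness),
        ("adjusted_quality", adjusted_quality),
        ("subgroup_utility", subgroup_utility),
        ("weak_share", weak_share),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Full credit for uniqueness requires BOTH quality and subgroup utility.
    paired = adjusted_quality * subgroup_utility
    return uniqueness * (weak_share + (1.0 - weak_share) * paired)
```

Under this rule a site with zero adjusted quality or zero subgroup utility keeps only `weak_share` of its raw uniqueness, which limits the reward for distinctiveness that stems from scanner artifacts or labeling errors.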
7. Counterfactual Contribution
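The preferred contribution signal is not local gradient alignment. It is the counterfactual marginal contribution of each institution, averaged over coalitions as in the Shapley value:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr]$$
where $N$ is the set of institutions and $v(S)$ is the validation utility of a model trained on coalition $S$.
For production feasibility, FAIR-WEIGHTS-H estimates this with grouped or multi-membership Owen-style sampling: institutions are organized into groups, sampled orderings respect the group structure, and an institution may belong to more than one group. This avoids forcing ambiguous hospitals into a single administrative category.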
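A minimal Monte Carlo sketch of the standard Owen value under a strict partition of institutions into groups: each sample draws a random order of groups and a random order within each group, then credits every institution with its marginal gain in utility. The multi-membership extension is not shown, and the toy `utility` functions in the usage stand in for the expensive step of training and validating a model on each coalition (for example from warm-started checkpoints). `owen_monte_carlo` is a hypothetical name.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Set

# A coalition utility, e.g. validation AUC of a model trained on that coalition.
Utility = Callable[[FrozenSet[str]], float]


def owen_monte_carlo(
    groups: List[List[str]],
    utility: Utility,
    n_samples: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Monte Carlo estimate of Owen values for a strict partition of institutions."""
    players = [p for g in groups for p in g]
    if len(players) != len(set(players)):
        raise ValueError("this sketch requires each institution in exactly one group")
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(n_samples):
        # Orderings consistent with the coalition structure:
        # shuffle the groups, then shuffle members inside each group.
        group_order = list(groups)
        rng.shuffle(group_order)
        order: List[str] = []
        for g in group_order:
            members = list(g)
            rng.shuffle(members)
            order.extend(members)
        # Credit each institution with its marginal gain when it joins.
        coalition: Set[str] = set()
        prev = utility(frozenset())
        for p in order:
            coalition.add(p)
            cur = utility(frozenset(coalition))
            totals[p] += cur - prev
            prev = cur
    return {p: t / n_samples for p, t in totals.items()}
```

Efficiency holds in every sample: the estimated contributions sum to v(N) - v(∅), which is a cheap sanity check on any production estimator.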
8. Training Weight Objective
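The quarterly training weights are produced by a constrained optimization problem. The objective scores institutions using the signals in Section 3, and the feasible set is restricted by per-institution weight caps together with representation and subgroup-safety constraints.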
The fairness and subgroup safety requirements are constraints, not optional score boosts.
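Assuming the simplest convex version of this problem, the weights can be found as the point closest (in squared Euclidean distance) to a normalized score vector, subject to summing to one, a common per-site cap, and hypothetical per-site representation floors. The sketch solves it by bisection on the KKT shift parameter. The function names and the quadratic objective are assumptions; subgroup-performance constraints cannot be written as simple bounds and would need a general constrained solver.

```python
from typing import List, Sequence


def project_to_bounded_simplex(
    target: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    tol: float = 1e-12,
    max_iter: int = 200,
) -> List[float]:
    """Solve min ||w - target||^2 s.t. sum(w) = 1 and lower <= w <= upper.

    The optimum has the form w_i = clip(target_i - theta, lower_i, upper_i);
    theta is found by bisection because the clipped sum is non-increasing in theta.
    """
    if sum(lower) > 1.0 + tol or sum(upper) < 1.0 - tol:
        raise ValueError("bounds admit no weight vector that sums to 1")

    def clipped(theta: float) -> List[float]:
        return [min(max(t - theta, lo), hi) for t, lo, hi in zip(target, lower, upper)]

    # At theta_lo every weight sits at its upper bound; at theta_hi at its lower bound.
    theta_lo = min(t - hi for t, hi in zip(target, upper))
    theta_hi = max(t - lo for t, lo in zip(target, lower))
    for _ in range(max_iter):
        mid = 0.5 * (theta_lo + theta_hi)
        if sum(clipped(mid)) > 1.0:
            theta_lo = mid
        else:
            theta_hi = mid
        if theta_hi - theta_lo < tol:
            break
    return clipped(0.5 * (theta_lo + theta_hi))


def training_weights(scores: Sequence[float], floors: Sequence[float], cap: float) -> List[float]:
    """Normalize positive scores, then project onto the capped simplex with floors."""
    if any(f > cap for f in floors):
        raise ValueError("a representation floor exceeds the cap")
    total = sum(scores)
    target = [s / total for s in scores]
    return project_to_bounded_simplex(target, floors, [cap] * len(scores))
```

For scores `[0.7, 0.2, 0.1]`, floors `[0.0, 0.0, 0.15]`, and a cap of 0.5, the cap binds on the first site and the result is approximately `[0.5, 0.3, 0.2]`.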
9. Empirical Status
FAIR-WEIGHTS-H has been empirically tested for execution stability and aggregation behavior in this repository.
Completed checks include:
- synthetic Camelyon-like smoke validation,
- PCam federated smoke validation,
- PCam all-strategy smoke validation,
- PCam balanced federated benchmark,
- PCam heterogeneous federated benchmark.
These tests show that FAIR-WEIGHTS-H runs without numerical failure, produces distinct weight trajectories under heterogeneous simulated sites, and does not degrade performance in the current patch-level PCam setup. They do not yet show a consistent performance or fairness advantage over simpler aggregation baselines. That stronger claim requires the planned ablation and slide-level multi-center validation.
10. Quarterly Algorithm
- Collect privacy-preserving aggregate signals from institutions.
- Run integrity checks and anomaly detection.
- Estimate difficulty-adjusted quality and uncertainty.
- Estimate distributional uniqueness using aggregate profiles.
- Estimate approximate Owen contribution using sampled warm-started coalitions.
- Compute useful uniqueness.
- Solve the constrained weight optimization problem.
- Run subgroup safety and representation audits.
- Produce an institution-level report card.
- Version the weights, coefficients, inputs, and audit results.
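The versioning step can be made tamper-evident by hashing a canonical serialization of everything that produced a quarter's weights. The record layout and field names below are hypothetical; `json.dumps(..., sort_keys=True)` makes the digest independent of dictionary insertion order.

```python
import hashlib
import json
from typing import Any, Dict


def version_record(
    quarter: str,
    weights: Dict[str, float],
    coefficients: Dict[str, float],
    input_digests: Dict[str, str],
    audit_results: Dict[str, Any],
) -> Dict[str, Any]:
    """Bundle one quarter's outputs and attach a SHA-256 content digest."""
    record: Dict[str, Any] = {
        "quarter": quarter,
        "weights": weights,
        "coefficients": coefficients,
        "inputs": input_digests,  # digests of the aggregate inputs, not raw data
        "audits": audit_results,
    }
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record
```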
11. Anomaly Monitoring
Continuous or between-quarter monitoring should detect:
- scanner drift,
- stain distribution shift,
- abrupt gradient drift,
- local validation collapse,
- unusual update variance,
- suspicious simultaneous score changes among affiliated institutions,
- distributional specialization that reduces coverage of community or underserved cases.
If severe anomalies are detected, weights should be throttled or frozen pending review.
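For the update-variance and abrupt-drift signals, a robust z-score against the cross-site median is one simple detector that is not itself dragged around by the outlier it is looking for. The sketch assumes each site reports one scalar per round (here, hypothetically, the norm of its model update); the 3.5 and 6.0 thresholds and the throttle/freeze labels are illustrative assumptions.

```python
import math
import statistics
from typing import Dict


def robust_z_scores(values: Dict[str, float]) -> Dict[str, float]:
    """(value - median) / (1.4826 * MAD); 1.4826 makes the MAD consistent
    with the standard deviation for normally distributed data."""
    med = statistics.median(values.values())
    mad = statistics.median(abs(v - med) for v in values.values())
    scale = 1.4826 * mad
    if scale == 0.0:
        # Most sites report the same value: flag any site that differs at all.
        return {k: 0.0 if v == med else math.copysign(math.inf, v - med) for k, v in values.items()}
    return {k: (v - med) / scale for k, v in values.items()}


def update_actions(
    update_norms: Dict[str, float],
    throttle_at: float = 3.5,
    freeze_at: float = 6.0,
) -> Dict[str, str]:
    """Map each site's robust z-score to a hypothetical action label."""
    actions: Dict[str, str] = {}
    for site, z in robust_z_scores(update_norms).items():
        if abs(z) >= freeze_at:
            actions[site] = "freeze"
        elif abs(z) >= throttle_at:
            actions[site] = "throttle"
        else:
            actions[site] = "normal"
    return actions
```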
12. Fallback Modes
FAIR-WEIGHTS-H should degrade safely rather than continue normal operation during instability.
| Mode | Trigger | Action |
|---|---|---|
| Normal | No major anomaly | Full hybrid weighting |
| Conservative | Moderate anomaly or high uncertainty | Freeze coefficients, reduce caps, rely more on verified quality |
| Safety | Systemic manipulation or validation failure | Suspend uniqueness/contribution bonuses; use externally verified metrics only |
| Emergency | Severe corruption or safety violation | Temporarily exclude affected institution updates pending remediation |
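The table implies a severity ordering, so mode selection can be sketched as returning the most severe mode whose trigger fires. The boolean trigger names are hypothetical stand-ins for whatever the monitoring layer reports; how "moderate" and "severe" are thresholded is left open, as it is in the table.

```python
from enum import IntEnum


class Mode(IntEnum):
    NORMAL = 0
    CONSERVATIVE = 1
    SAFETY = 2
    EMERGENCY = 3


def select_mode(
    *,
    moderate_anomaly: bool = False,
    high_uncertainty: bool = False,
    systemic_manipulation: bool = False,
    validation_failure: bool = False,
    severe_corruption: bool = False,
    safety_violation: bool = False,
) -> Mode:
    """Return the most severe mode whose trigger fires (checked from the top down)."""
    if severe_corruption or safety_violation:
        return Mode.EMERGENCY
    if systemic_manipulation or validation_failure:
        return Mode.SAFETY
    if moderate_anomaly or high_uncertainty:
        return Mode.CONSERVATIVE
    return Mode.NORMAL


def bonuses_enabled(mode: Mode) -> bool:
    """Uniqueness/contribution bonuses are suspended in Safety mode and,
    by assumption, in the more severe Emergency mode."""
    return mode < Mode.SAFETY
```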
13. Validation Plan
FAIR-WEIGHTS-H should be compared against:
- equal weighting,
- volume weighting,
- prestige weighting,
- original multiplicative FAIR-WEIGHTS,
- Shapley/Owen-only attribution,
- FAIR-WEIGHTS-H.
Primary metrics should include:
- global AUC,
- balanced accuracy,
- calibration error,
- worst-group sensitivity,
- false-negative-rate disparity,
- subgroup non-inferiority,
- convergence stability,
- weight stability,
- robustness under missingness and simulated gaming.
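Two of the subgroup metrics can be sketched directly from per-case labels, binary predictions, and a group attribute. The function names are hypothetical; groups with no positive cases are skipped because sensitivity is undefined for them, and disparity is taken here as the max-minus-min gap, which is one of several possible definitions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def group_sensitivity(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Per-group sensitivity (true-positive rate) over groups with at least one positive."""
    positives: Dict[Hashable, int] = defaultdict(int)
    true_pos: Dict[Hashable, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            positives[g] += 1
            if p == 1:
                true_pos[g] += 1
    return {g: true_pos[g] / n for g, n in positives.items()}


def worst_group_sensitivity(y_true, y_pred, groups) -> float:
    return min(group_sensitivity(y_true, y_pred, groups).values())


def fnr_disparity(y_true, y_pred, groups) -> float:
    """Largest gap in false-negative rate (1 - sensitivity) between any two groups."""
    fnr = [1.0 - s for s in group_sensitivity(y_true, y_pred, groups).values()]
    return max(fnr) - min(fnr)
```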
14. Regulatory-Safe Claim Language
Use:
FAIR-WEIGHTS-H enforces pre-specified representation and subgroup-performance constraints and requires prospective validation before clinical deployment.
Do not use:
FAIR-WEIGHTS-H proves fairness.
Use:
Shapley/Owen values provide an axiomatic counterfactual attribution benchmark whose conclusions depend on the chosen validation utility and reference distribution.