FAIR-WEIGHTS-H: Hybrid Institutional Weighting Protocol

Status: Proposed protocol and implementation specification
Scope: Federated computational pathology research system
Validation status: Empirically tested for execution stability and aggregation behavior; performance/fairness advantage over simpler baselines not yet demonstrated


1. Motivation

The current prestige-style institutional weighting pattern assigns higher influence to cancer centers and lower influence to rural or community hospitals based on institutional category. That is not scientifically defensible for federated pathology because institutional reputation is not the same as measured contribution, clinical reliability, subgroup coverage, or domain-shift value.

FAIR-WEIGHTS-H replaces prestige weighting with an auditable hybrid protocol that combines:

  1. counterfactual contribution estimation,
  2. diagnostic and process quality,
  3. distributional uniqueness,
  4. representation and subgroup-safety constraints,
  5. uncertainty penalties,
  6. anomaly monitoring and fallback modes.

The framework is intended as a research protocol. It does not claim clinical validation or regulatory clearance.


2. Key Design Principle

A single scalar institution weight cannot by itself guarantee fairness or safety. Therefore FAIR-WEIGHTS-H separates three concepts:

$$w_i^{\text{train}}, \qquad w_i^{\text{val}}, \qquad w_i^{\text{monitor}}$$

where:

  • $w_i^{\text{train}}$: aggregation weight used during federated model updates,
  • $w_i^{\text{val}}$: validation representation priority,
  • $w_i^{\text{monitor}}$: post-market or research monitoring priority.

A site may have a low training weight because of uncertain or noisy updates while still receiving high validation and monitoring priority if it represents an underserved or clinically important population.


3. Institutional Signals

For institution i, define the feature vector:

$$z_i = \left[ A_i^{\text{adj}},\; Q_i,\; \phi_i^{\text{Owen}},\; JS_i,\; F_i,\; V_i,\; S_i \right]$$

where:

| Symbol | Meaning | Status |
|---|---|---|
| $A_i^{\text{adj}}$ | Difficulty-adjusted reference-case diagnostic quality | Proposed |
| $Q_i$ | Process and pathology quality composite | Proposed |
| $\phi_i^{\text{Owen}}$ | Group-aware counterfactual contribution estimate | Proposed |
| $JS_i$ | Jensen-Shannon distributional uniqueness | Proposed |
| $F_i$ | Underserved-population representation score | Proposed |
| $V_i$ | Bounded/sublinear volume factor | Proposed |
| $S_i$ | Uncertainty, instability, or anomaly penalty | Proposed |

Gradient alignment with the current global model is not used as a primary contribution factor because it can encode status quo bias. It may be used only for anomaly detection and drift monitoring.


4. Integrity Gate

FAIR-WEIGHTS-H uses a hard gate only for integrity and safety failures, not for prestige or raw quality.

$$I_i = G_i^{\text{data}} \cdot G_i^{\text{label}} \cdot G_i^{\text{safety}}$$

where:

  • $G_i^{\text{data}} = 1$ only when data integrity checks pass,
  • $G_i^{\text{label}} = 1$ only when no severe label-corruption signal is detected,
  • $G_i^{\text{safety}} = 1$ only when there is no active safety violation.

Low resource level, rural status, or case difficulty must not directly trigger exclusion. Quality should be modeled with difficulty adjustment and uncertainty, not with a crude hard threshold.
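
As a minimal sketch of the gate, assuming it is the product of three binary indicators, the check can be written so that resource level, rural status, and case difficulty are simply not inputs. The `IntegrityChecks` container and `integrity_gate` function are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrityChecks:
    """Outcome of the three hard checks for one institution (hypothetical container)."""
    data_ok: bool    # G_i^data: data integrity checks pass
    label_ok: bool   # G_i^label: no severe label-corruption signal detected
    safety_ok: bool  # G_i^safety: no active safety violation


def integrity_gate(checks: IntegrityChecks) -> int:
    """Return I_i, which is 1 only if every check passes and 0 otherwise."""
    return int(checks.data_ok) * int(checks.label_ok) * int(checks.safety_ok)


# Resource level and case mix are deliberately absent from the inputs,
# so they cannot trigger exclusion through this gate.
clean_site = IntegrityChecks(data_ok=True, label_ok=True, safety_ok=True)
corrupted_site = IntegrityChecks(data_ok=True, label_ok=False, safety_ok=True)
print(integrity_gate(clean_site), integrity_gate(corrupted_site))  # prints: 1 0
```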


5. Difficulty-Adjusted Quality

Raw accuracy on reference cases can be misleading when institutions serve different case mixes. FAIR-WEIGHTS-H therefore requires a pre-specified difficulty-adjustment model before using adjusted quality in weight computation.

One defensible model is:

$$Y_{ij} \sim \mathrm{Bernoulli}(p_{ij}), \qquad \operatorname{logit}(p_{ij}) = \alpha + \beta^{\top} X_{ij} + b_i$$

where:

  • $Y_{ij}$: correct or incorrect diagnosis for case j at site i,
  • $X_{ij}$: tumor type, stage, slide quality, stain quality, scanner metadata, referral status, and case complexity,
  • $b_i$: institution-level effect after adjustment.

The adjusted quality is evaluated on a standardized reference distribution:

$$A_i^{\text{adj}} = \mathbb{E}_{X \sim P_{\text{ref}}}\left[ \Pr(Y = 1 \mid X, i) \right]$$

This adjustment must be calibrated and audited. It cannot simply be asserted.
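
The standardization step can be sketched as follows, assuming the logistic model has already been fitted elsewhere and that the reference distribution is represented by a finite sample of covariate vectors. The function names, coefficients, and reference cases are illustrative assumptions, not values from this protocol.

```python
import math


def sigmoid(x: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def adjusted_quality(alpha, beta, b_i, reference_cases):
    """Average predicted probability of a correct diagnosis for site i
    over a shared sample of reference cases.

    alpha: fitted intercept; beta: fitted covariate coefficients;
    b_i: fitted institution-level effect;
    reference_cases: covariate vectors sampled from the reference distribution.
    """
    probs = []
    for x in reference_cases:
        linear = alpha + sum(bk * xk for bk, xk in zip(beta, x)) + b_i
        probs.append(sigmoid(linear))
    return sum(probs) / len(probs)


# Hypothetical fitted values: two covariates (case complexity, poor stain quality).
alpha, beta = 1.0, [-0.8, -0.5]
reference = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]

# Both sites are scored on the same reference cases, so a harder local
# case mix does not lower the adjusted score by itself.
site_a = adjusted_quality(alpha, beta, b_i=0.3, reference_cases=reference)
site_b = adjusted_quality(alpha, beta, b_i=-0.3, reference_cases=reference)
print(round(site_a, 3), round(site_b, 3))
```

Because every site is evaluated against the same reference sample, differences in the adjusted score come only from the institution-level effect, which is the point of the adjustment.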


6. Useful Uniqueness

Distributional uniqueness alone is not always beneficial. A site can be unique because it serves rare populations, but it can also be unique because of scanner artifacts, poor fixation, or systematic labeling errors.

Therefore uniqueness is treated as a weak signal unless paired with quality and subgroup utility:

$$D_i^{\text{useful}} = JS_i \cdot A_i^{\text{adj}} \cdot U_i^{\text{subgroup}}$$

where $U_i^{\text{subgroup}}$ measures whether the institution improves performance on the subgroup, morphology, or cancer subtype it uniquely represents.
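
A minimal sketch of this signal, assuming each site is summarized by a discrete aggregate profile (for example, a histogram over case categories) compared against the pooled profile of the other sites, and assuming subgroup utility is scaled to [0, 1]. The function names and the example profiles are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_k = 0 contribute zero."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def useful_uniqueness(js, adjusted_quality, subgroup_utility):
    """Product form: uniqueness counts only when quality and subgroup utility are present."""
    return js * adjusted_quality * subgroup_utility


site_profile = [0.10, 0.20, 0.70]    # hypothetical case-mix histogram for one site
pooled_profile = [0.40, 0.40, 0.20]  # hypothetical pooled histogram of the other sites
js = js_divergence(site_profile, pooled_profile)

# Same uniqueness, different outcomes: artifact-driven uniqueness with no
# subgroup benefit scores zero, while uniqueness that helps a subgroup is retained.
print(useful_uniqueness(js, adjusted_quality=0.9, subgroup_utility=0.0))
print(useful_uniqueness(js, adjusted_quality=0.9, subgroup_utility=0.8))
```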


7. Counterfactual Contribution

The preferred contribution signal is not local gradient alignment. It is counterfactual marginal contribution:

$$\phi_i = \mathbb{E}_{S \subseteq N \setminus \{i\}}\left[ U(S \cup \{i\}) - U(S) \right]$$

For production feasibility, FAIR-WEIGHTS-H estimates this with grouped or multi-membership Owen-style sampling:

$$\hat{\phi}_i^{\text{Owen}} = \sum_g m_{ig}\, \hat{\phi}_{ig}$$

where $m_{ig} \in [0, 1]$ allows institutions to belong partly to multiple groups, such as academic, rural-serving, specialty center, or network-affiliated.

This avoids forcing ambiguous hospitals into a single administrative category.
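
The two pieces can be sketched separately. The first function below is a plain permutation-sampling (Shapley-style) estimate of marginal contribution, not the grouped Owen estimator itself. The second combines per-group estimates using fractional memberships. The utility function, site names, group names, and numeric values are all hypothetical, and the coalition distribution is an assumption because the protocol does not fix it.

```python
import random


def sampled_marginal_contribution(site, sites, utility, n_samples=200, seed=0):
    """Estimate the expected gain U(S + site) - U(S) over random coalitions S.

    Coalitions are drawn as the predecessors of `site` in a uniformly
    random ordering of all sites (the Shapley sampling distribution).
    """
    rng = random.Random(seed)
    others = [s for s in sites if s != site]
    total = 0.0
    for _ in range(n_samples):
        rng.shuffle(others)
        cut = rng.randint(0, len(others))  # number of sites preceding `site`
        coalition = frozenset(others[:cut])
        total += utility(coalition | {site}) - utility(coalition)
    return total / n_samples


def multi_membership_contribution(memberships, group_estimates):
    """Membership-weighted sum of per-group contribution estimates."""
    for group, m in memberships.items():
        if not 0.0 <= m <= 1.0:
            raise ValueError(f"membership for {group!r} must lie in [0, 1]")
    return sum(m * group_estimates[group] for group, m in memberships.items())


# Toy additive utility: each site adds a fixed amount of validation utility.
gains = {"site_a": 3.0, "site_b": 1.0, "site_c": 2.0}


def toy_utility(coalition):
    return sum(gains[s] for s in coalition)


print(sampled_marginal_contribution("site_a", list(gains), toy_utility))

# A hospital that is partly academic and mostly rural-serving.
print(multi_membership_contribution(
    {"academic": 0.25, "rural_serving": 0.75},
    {"academic": 0.04, "rural_serving": 0.08},
))
```

In a real system each utility evaluation means training or warm-starting a model on a coalition, so the number of samples is the main cost driver.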


8. Training Weight Objective

The quarterly training weights are produced by a constrained optimization problem:

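$$w_t = \arg\max_{w \in W} \sum_{i=1}^{K} w_i \left( \hat{\phi}_{i,t}^{\text{Owen}} + \lambda_D D_{i,t}^{\text{useful}} + \lambda_F F_{i,t} + \lambda_Q Q_{i,t} - \lambda_S S_{i,t} \right)$$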

subject to:

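$$
\begin{aligned}
& \sum_i w_i = 1 \\
& w_i^{\min} \le w_i \le w_i^{\max} \\
& C_g(w) \ge C_g^{\min} \quad \forall g \in G_{\text{underserved}} \\
& \mathrm{Perf}_g(w) \ge \mathrm{Perf}_g^{\min} \quad \forall g \in G_{\text{clinical}} \\
& \left| w_{i,t} - w_{i,t-1} \right| \le \Delta_i
\end{aligned}
$$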

The fairness and subgroup safety requirements are constraints, not optional score boosts.
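
To illustrate the structure of the problem, the sketch below solves a simplified version that keeps only the normalization, box, and per-quarter change constraints. Under those constraints alone the objective is linear and a greedy allocation is optimal. The group coverage and subgroup-performance constraints are omitted here; once they are included the greedy shortcut no longer applies and a general LP or convex solver is needed. The names `composite_score` and `solve_weights` and all numeric values are hypothetical.

```python
def composite_score(phi, d_useful, f, q, s, lam_d, lam_f, lam_q, lam_s):
    """Per-site objective coefficient; the penalty term s is subtracted."""
    return phi + lam_d * d_useful + lam_f * f + lam_q * q - lam_s * s


def solve_weights(scores, w_min, w_max, w_prev, delta):
    """Maximize sum(w_i * score_i) subject to sum(w) = 1, box bounds,
    and a cap on the change from the previous quarter.

    Simplification: group coverage and subgroup-performance constraints
    are NOT enforced by this function.
    """
    n = len(scores)
    # Fold the change limit into the box bounds.
    lo = [max(w_min[i], w_prev[i] - delta[i]) for i in range(n)]
    hi = [min(w_max[i], w_prev[i] + delta[i]) for i in range(n)]
    tol = 1e-12
    if any(l > h for l, h in zip(lo, hi)) or sum(lo) > 1 + tol or sum(hi) < 1 - tol:
        raise ValueError("constraints are infeasible")
    weights = lo[:]
    remaining = max(0.0, 1.0 - sum(lo))
    # Hand the remaining mass to the highest-scoring sites first.
    for i in sorted(range(n), key=lambda k: scores[k], reverse=True):
        add = min(hi[i] - lo[i], remaining)
        weights[i] += add
        remaining -= add
    return weights


scores = [0.9, 0.5, 0.1]
weights = solve_weights(
    scores,
    w_min=[0.1, 0.1, 0.1],
    w_max=[0.6, 0.6, 0.6],
    w_prev=[0.3, 0.3, 0.4],
    delta=[0.1, 0.1, 0.1],
)
print([round(w, 3) for w in weights])  # prints: [0.4, 0.3, 0.3]
```

Note how the change limit keeps the top-scoring site from jumping straight to its cap, and the floor keeps the lowest-scoring site from being driven to zero.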


9. Empirical Status

FAIR-WEIGHTS-H has been empirically tested for execution stability and aggregation behavior in this repository.

Completed checks include:

  • synthetic Camelyon-like smoke validation,
  • PCam federated smoke validation,
  • PCam all-strategy smoke validation,
  • PCam balanced federated benchmark,
  • PCam heterogeneous federated benchmark.

These tests show that FAIR-WEIGHTS-H runs without numerical failure, produces distinct weight trajectories under heterogeneous simulated sites, and does not degrade performance in the current patch-level PCam setup. They do not yet show a consistent performance or fairness advantage over simpler aggregation baselines. That stronger claim requires the planned ablation and slide-level multi-center validation.


10. Quarterly Algorithm

  1. Collect privacy-preserving aggregate signals from institutions.
  2. Run integrity checks and anomaly detection.
  3. Estimate difficulty-adjusted quality and uncertainty.
  4. Estimate distributional uniqueness using aggregate profiles.
  5. Estimate approximate Owen contribution using sampled warm-started coalitions.
  6. Compute useful uniqueness.
  7. Solve the constrained weight optimization problem.
  8. Run subgroup safety and representation audits.
  9. Produce an institution-level report card.
  10. Version the weights, coefficients, inputs, and audit results.

11. Anomaly Monitoring

Continuous or between-quarter monitoring should detect:

  • scanner drift,
  • stain distribution shift,
  • abrupt gradient drift,
  • local validation collapse,
  • unusual update variance,
  • suspicious simultaneous score changes among affiliated institutions,
  • distributional specialization that reduces coverage of community or underserved cases.

If severe anomalies are detected, weights should be throttled or frozen pending review.
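
One way to turn a monitored signal into a throttle or freeze decision is a robust z-score against the site's own history, sketched below for a single scalar such as update norm. This is an illustrative detector, not the protocol's specified method. The 0.6745 scaling and 3.5 cutoff are conventional modified z-score values; the 7.0 freeze threshold and the function names are assumptions.

```python
import statistics


def robust_z(value, history):
    """Modified z-score of `value` using the median and MAD of `history`."""
    med = statistics.median(history)
    mad = statistics.median([abs(h - med) for h in history])
    if mad == 0:
        # No spread in the history: any deviation at all is extreme.
        return 0.0 if value == med else float("inf")
    return 0.6745 * (value - med) / mad


def classify_update(value, history, throttle_at=3.5, freeze_at=7.0):
    """Map the robust z-score to 'normal', 'throttle', or 'freeze'."""
    z = abs(robust_z(value, history))
    if z >= freeze_at:
        return "freeze"
    if z >= throttle_at:
        return "throttle"
    return "normal"


update_norm_history = [1.0, 1.1, 0.9, 1.0, 1.05]  # hypothetical past update norms
for observed in (1.0, 1.3, 5.0):
    print(observed, classify_update(observed, update_norm_history))
```

Using the median and MAD rather than the mean and standard deviation keeps a single earlier outlier from masking a later one.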


12. Fallback Modes

FAIR-WEIGHTS-H should degrade safely rather than continue normal operation during instability.

| Mode | Trigger | Action |
|---|---|---|
| Normal | No major anomaly | Full hybrid weighting |
| Conservative | Moderate anomaly or high uncertainty | Freeze coefficients, reduce caps, rely more on verified quality |
| Safety | Systemic manipulation or validation failure | Suspend uniqueness/contribution bonuses; use externally verified metrics only |
| Emergency | Severe corruption or safety violation | Temporarily exclude affected institution updates pending remediation |
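
The mode table can be encoded as a small selection function. The sketch below assumes that when several triggers fire at once the most severe mode wins, which the table does not state explicitly. The enum and function names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    CONSERVATIVE = "conservative"
    SAFETY = "safety"
    EMERGENCY = "emergency"


def select_mode(severe_corruption_or_safety_violation: bool,
                systemic_manipulation_or_validation_failure: bool,
                moderate_anomaly_or_high_uncertainty: bool) -> Mode:
    """Pick the operating mode; the most severe active trigger takes precedence (assumption)."""
    if severe_corruption_or_safety_violation:
        return Mode.EMERGENCY
    if systemic_manipulation_or_validation_failure:
        return Mode.SAFETY
    if moderate_anomaly_or_high_uncertainty:
        return Mode.CONSERVATIVE
    return Mode.NORMAL


print(select_mode(False, False, True).value)  # prints: conservative
```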

13. Validation Plan

The validation study should evaluate the following weighting strategies side by side:

  1. equal weighting,
  2. volume weighting,
  3. prestige weighting,
  4. original multiplicative FAIR-WEIGHTS,
  5. Shapley/Owen-only attribution,
  6. FAIR-WEIGHTS-H.

Primary metrics should include:

  • global AUC,
  • balanced accuracy,
  • calibration error,
  • worst-group sensitivity,
  • false-negative-rate disparity,
  • subgroup non-inferiority,
  • convergence stability,
  • weight stability,
  • robustness under missingness and simulated gaming.
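
Two of the subgroup metrics can be sketched directly from binary labels and predictions. The disparity definition used here (largest minus smallest group false-negative rate) is one common choice and is an assumption, since the protocol does not fix a formula. The function names and the toy data are hypothetical.

```python
def group_sensitivities(y_true, y_pred, groups):
    """Sensitivity (true-positive rate) per group, from binary labels and predictions."""
    counts = {}  # group -> [true positives, false negatives]
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            tp_fn = counts.setdefault(g, [0, 0])
            tp_fn[0 if p == 1 else 1] += 1
    if not counts:
        raise ValueError("no positive cases in any group")
    return {g: tp / (tp + fn) for g, (tp, fn) in counts.items()}


def worst_group_sensitivity(y_true, y_pred, groups):
    """Lowest sensitivity across groups that have at least one positive case."""
    return min(group_sensitivities(y_true, y_pred, groups).values())


def fnr_disparity(y_true, y_pred, groups):
    """Gap between the largest and smallest group false-negative rate (FNR = 1 - sensitivity)."""
    fnr = [1.0 - s for s in group_sensitivities(y_true, y_pred, groups).values()]
    return max(fnr) - min(fnr)


y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
groups = ["a", "a", "b", "b", "a", "b"]
print(worst_group_sensitivity(y_true, y_pred, groups))  # prints: 0.5
print(fnr_disparity(y_true, y_pred, groups))            # prints: 0.5
```

Groups with no positive cases are skipped rather than scored, so small subgroups should be checked for adequate sample size before these metrics are reported.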

14. Regulatory-Safe Claim Language

Use:

FAIR-WEIGHTS-H enforces pre-specified representation and subgroup-performance constraints and requires prospective validation before clinical deployment.

Do not use:

FAIR-WEIGHTS-H proves fairness.

Use:

Shapley/Owen values provide an axiomatic counterfactual attribution benchmark whose conclusions depend on the chosen validation utility and reference distribution.

Research documentation. Not clinical validation or regulatory clearance.