FAIR-WEIGHTS-H: Hybrid Institutional Weighting Protocol

Status: Proposed protocol and implementation specification
Scope: Federated computational pathology research system
Validation status: Empirically tested for execution stability and aggregation behavior; performance/fairness advantage over simpler baselines not yet demonstrated

1. Motivation

The current prestige-style institutional weighting pattern assigns higher influence to cancer centers and lower influence to rural or community hospitals based on institutional category. That is not scientifically defensible for federated pathology because institutional reputation is not the same as measured contribution, clinical reliability, subgroup coverage, or domain-shift value.

FAIR-WEIGHTS-H replaces prestige weighting with an auditable hybrid protocol that combines:

counterfactual contribution estimation,
diagnostic and process quality,
distributional uniqueness,
representation and subgroup-safety constraints,
uncertainty penalties,
anomaly monitoring and fallback modes.

The framework is intended as a research protocol. It does not claim clinical validation or regulatory clearance.

2. Key Design Principle

A single scalar institution weight cannot by itself guarantee fairness or safety. Therefore FAIR-WEIGHTS-H separates three concepts:

w_{i}^{train} \neq w_{i}^{val} \neq w_{i}^{monitor}

where:

$w_{i}^{train}$: aggregation weight used during federated model updates,
$w_{i}^{val}$: validation representation priority,
$w_{i}^{monitor}$: post-market or research monitoring priority.

A site may have a low training weight because of uncertain or noisy updates while still receiving high validation and monitoring priority if it represents an underserved or clinically important population.

3. Institutional Signals

For institution $i$, define the feature vector:

z_{i} = [A_{i}^{adj}, Q_{i}, \phi_{i}^{Owen}, \mathrm{JS}_{i}, F_{i}, V_{i}, -S_{i}]

where:

Symbol	Meaning	Status
$A_{i}^{adj}$	Difficulty-adjusted reference-case diagnostic quality	Proposed
$Q_{i}$	Process and pathology quality composite	Proposed
$\phi_{i}^{Owen}$	Group-aware counterfactual contribution estimate	Proposed
$\mathrm{JS}_{i}$	Jensen-Shannon distributional uniqueness	Proposed
$F_{i}$	Underserved-population representation score	Proposed
$V_{i}$	Bounded/sublinear volume factor	Proposed
$S_{i}$	Uncertainty, instability, or anomaly penalty	Proposed

Gradient alignment with the current global model is not used as a primary contribution factor. It rewards sites whose updates already agree with the current model, so it can encode status quo bias. It may be used only for anomaly detection and drift monitoring.
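Below is a minimal sketch of one way to package the per-institution signals into $z_{i}$. It assumes every signal has already been computed and scaled upstream. The class name `InstitutionSignals`, its field names, and the log-based `sublinear_volume` helper are illustrative assumptions and are not part of the protocol. The helper is only one possible bounded, sublinear choice for $V_{i}$, and its cap of 10,000 cases is hypothetical. Consistent with the paragraph above, the record deliberately has no gradient-alignment field.

```python
import math
from dataclasses import dataclass


def sublinear_volume(n_cases: int, n_cap: int = 10_000) -> float:
    """Hypothetical V_i: grows like log(n), reaches 1.0 at n_cap, and is capped there."""
    return min(1.0, math.log1p(n_cases) / math.log1p(n_cap))


@dataclass(frozen=True)
class InstitutionSignals:
    a_adj: float     # A_i^adj: difficulty-adjusted reference-case quality
    q: float         # Q_i: process and pathology quality composite
    phi_owen: float  # phi_i^Owen: group-aware counterfactual contribution
    js: float        # JS_i: Jensen-Shannon distributional uniqueness
    f: float         # F_i: underserved-population representation score
    v: float         # V_i: bounded/sublinear volume factor
    s: float         # S_i: uncertainty, instability, or anomaly penalty

    def feature_vector(self) -> list:
        # z_i = [A_adj, Q, phi_Owen, JS, F, V, -S]: the penalty enters with a negative sign.
        return [self.a_adj, self.q, self.phi_owen, self.js, self.f, self.v, -self.s]
```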

4. Integrity Gate

FAIR-WEIGHTS-H uses a hard gate only for integrity and safety failures, not for prestige or raw quality.

I_{i} = G_{i}^{data} G_{i}^{label} G_{i}^{safety}

where:

$G_{i}^{data} = 1$ only when data integrity checks pass,
$G_{i}^{label} = 1$ only when no severe label-corruption signal is detected,
$G_{i}^{safety} = 1$ only when there is no active safety violation.

Low resource level, rural status, or case difficulty must not directly trigger exclusion. Quality should be modeled with difficulty adjustment and uncertainty, not with a crude hard threshold.
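The gate can be sketched as the product of three pass/fail checks. The protocol does not say how a closed gate affects aggregation. This sketch assumes, hypothetically, that a gated institution's training weight is set to zero and the remaining weights are renormalized. The gate function accepts only integrity inputs, so site attributes such as rural status cannot influence it.

```python
def integrity_gate(data_ok: bool, label_ok: bool, safety_ok: bool) -> int:
    # I_i = G_data * G_label * G_safety: any single failure closes the gate.
    return int(data_ok) * int(label_ok) * int(safety_ok)


def apply_gate(weights: dict, gates: dict) -> dict:
    """Zero out gated institutions and renormalize the rest (assumed policy)."""
    kept = {site: w * gates[site] for site, w in weights.items()}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("all institutions failed the integrity gate")
    return {site: w / total for site, w in kept.items()}
```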

5. Difficulty-Adjusted Quality

Raw accuracy on reference cases can be misleading when institutions serve different case mixes. FAIR-WEIGHTS-H therefore requires a pre-specified difficulty-adjustment model before using adjusted quality in weight computation.

One defensible model is:

Y_{ij} \sim \mathrm{Bernoulli}(p_{ij})

\operatorname{logit}(p_{ij}) = \alpha + \beta^{\top} X_{ij} + b_{i}

where:

$Y_{ij}$: indicator that the diagnosis for case $j$ at site $i$ is correct ($Y_{ij} = 1$) or incorrect ($Y_{ij} = 0$),
$X_{ij}$: case-level covariates such as tumor type, stage, slide quality, stain quality, scanner metadata, referral status, and case complexity,
$b_{i}$: institution-level effect remaining after covariate adjustment.

The adjusted quality is evaluated on a standardized reference distribution:

A_{i}^{adj} = \mathbb{E}_{X \sim P_{ref}}\left[\Pr(Y = 1 \mid X, i)\right]

This adjustment must be calibrated and audited. It cannot simply be asserted.
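Once the model is fitted, the standardization step reduces to an average over the reference distribution. The sketch below assumes that $\alpha$, $\beta$, and $b_{i}$ have already been estimated; fitting and calibration are not shown. It also assumes that `x_ref` is a sample of covariate vectors drawn from $P_{ref}$. The function names are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def adjusted_quality(alpha: float, beta: np.ndarray, b_i: float, x_ref: np.ndarray) -> float:
    """A_i^adj = E_{X ~ P_ref}[Pr(Y=1 | X, i)], estimated as the mean over a reference sample.

    x_ref: (n_ref, p) covariates drawn from the standardized reference distribution P_ref.
    """
    logits = alpha + x_ref @ beta + b_i  # logit(p) = alpha + beta^T x + b_i
    return float(np.mean(sigmoid(logits)))
```

Every site is scored on the same `x_ref`. As a result, differences in $A_{i}^{adj}$ reflect $b_{i}$ and not each site's local case mix.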

6. Useful Uniqueness

Distributional uniqueness alone is not always beneficial. A site can be unique because it serves rare populations, but it can also be unique because of scanner artifacts, poor fixation, or systematic labeling errors.

Therefore uniqueness is treated as a weak signal unless paired with quality and subgroup utility:

D_{i}^{useful} = \mathrm{JS}_{i} \cdot A_{i}^{adj} \cdot U_{i}^{subgroup}

where $U_{i}^{subgroup}$ measures whether the institution improves performance on the subgroup, morphology, or cancer subtype it uniquely represents.
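Here is a sketch of both quantities. It assumes each site shares an aggregate histogram, for example over stain, tissue, or subtype bins. It also assumes $\mathrm{JS}_{i}$ is measured against the pooled histogram of all other sites. The protocol does not fix the reference profile, so that choice and the helper names are assumptions. The divergence uses base-2 logarithms, which keeps it in $[0, 1]$.

```python
import numpy as np


def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence in bits between two (unnormalized) histograms."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        mask = a > 0  # 0 * log(0) = 0; b > 0 wherever a > 0 because b is the mixture
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_uniqueness(histograms: np.ndarray, i: int) -> float:
    """JS_i: divergence of site i from the pooled profile of all other sites (assumed reference)."""
    others = np.delete(histograms, i, axis=0).sum(axis=0)
    return js_divergence(histograms[i].astype(float), others.astype(float))


def useful_uniqueness(js: float, a_adj: float, u_subgroup: float) -> float:
    # D_i^useful = JS_i * A_i^adj * U_i^subgroup: uniqueness counts only with quality and subgroup benefit.
    return js * a_adj * u_subgroup
```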

7. Counterfactual Contribution

The preferred contribution signal is not local gradient alignment. It is counterfactual marginal contribution:

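\phi_{i} = \mathbb{E}_{S \subseteq N \setminus \{i\}}\left[U(S \cup \{i\}) - U(S)\right]

where $N$ is the set of participating institutions and $U(S)$ is the validation utility of a model trained on coalition $S$. The coalition $S$ is drawn with Shapley weights $\frac{|S|!\,(|N| - |S| - 1)!}{|N|!}$. Equivalently, $S$ is the set of institutions that precede $i$ in a uniformly random ordering.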

For production feasibility, FAIR-WEIGHTS-H estimates this with grouped or multi-membership Owen-style sampling:

\hat{\phi}_{i}^{Owen} = \sum_{g} m_{ig} \, \hat{\phi}_{i \mid g}

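where $m_{ig} \in [0, 1]$ is institution $i$'s degree of membership in group $g$, and $\hat{\phi}_{i \mid g}$ is the contribution estimate computed with $i$ treated as a member of group $g$. Fractional memberships allow an institution to belong partly to multiple groups, such as academic, rural-serving, specialty center, or network-affiliated.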

This avoids forcing ambiguous hospitals into a single administrative category.
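The Owen estimate can be sketched with Monte Carlo sampling over orderings that keep each group contiguous. Each sample shuffles the group order, shuffles the members within each group, and records each institution's marginal gain in $U$; the gains are then averaged. The multi-membership wrapper computes $\hat{\phi}_{i \mid g}$ by placing institution $i$ in group $g$ while every other institution stays in a single primary group. That is a simplifying assumption. The names `owen_sample` and `multi_membership_owen` are hypothetical. In practice $U(S)$ would be a validation metric of a model trained on coalition $S$. The warm-starting of coalitions mentioned in the quarterly algorithm is not shown.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence

Utility = Callable[[FrozenSet[int]], float]


def owen_sample(groups: Sequence[Sequence[int]], utility: Utility,
                n_samples: int, seed: int = 0) -> Dict[int, float]:
    """Monte Carlo Owen values: average marginal gains over orderings in which each
    group's members appear contiguously (random group order, random order within groups)."""
    rng = random.Random(seed)
    totals = {i: 0.0 for g in groups for i in g}
    for _ in range(n_samples):
        group_order = list(groups)
        rng.shuffle(group_order)
        order: List[int] = []
        for g in group_order:
            members = list(g)
            rng.shuffle(members)
            order.extend(members)
        coalition = set()
        prev = utility(frozenset(coalition))
        for i in order:
            coalition.add(i)
            cur = utility(frozenset(coalition))
            totals[i] += cur - prev  # marginal gain of i given its predecessors
            prev = cur
    return {i: t / n_samples for i, t in totals.items()}


def multi_membership_owen(i: int, primary_group: Dict[int, str], membership: Dict[str, float],
                          utility: Utility, n_samples: int, seed: int = 0) -> float:
    """phi_i^Owen = sum_g m_ig * phi_{i|g}; phi_{i|g} places i in group g and keeps every
    other institution in its primary group (simplifying assumption)."""
    estimate = 0.0
    for g, m_ig in membership.items():
        if m_ig == 0:
            continue
        assignment = dict(primary_group)
        assignment[i] = g
        groups: Dict[str, List[int]] = {}
        for k, gk in assignment.items():
            groups.setdefault(gk, []).append(k)
        estimate += m_ig * owen_sample(list(groups.values()), utility, n_samples, seed)[i]
    return estimate
```

Two properties make the sketch easy to check. With an additive utility, every ordering gives the same marginal gains, so each estimate equals that site's stand-alone value. With any utility, the gains in each sampled ordering telescope to $U(N) - U(\emptyset)$, so the estimates always sum to that total.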

8. Training Weight Objective

The quarterly training weights are produced by a constrained optimization problem:

w_{t} = \arg\max_{w \in W} \sum_{i=1}^{K} w_{i} \left( \hat{\phi}_{i,t}^{Owen} + \lambda_{D} D_{i,t}^{useful} + \lambda_{F} F_{i,t} + \lambda_{Q} Q_{i,t} - \lambda_{S} S_{i,t} \right)

subject to:

\sum_{i} w_{i} = 1

w_{i}^{min} \leq w_{i} \leq w_{i}^{max}

C_{g}(w) \geq C_{g}^{min} \quad \forall g \in G_{underserved}

\mathrm{Perf}_{g}(w) \geq \mathrm{Perf}_{g}^{min} \quad \forall g \in G_{clinical}

\lvert w_{i,t} - w_{i,t-1} \rvert \leq \Delta_{i}

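Here $W$ is the feasible set defined by these constraints, and $C_{g}(w)$ is the representation (coverage) of underserved group $g$ under weights $w$. $\mathrm{Perf}_{g}(w)$ is validation performance on clinical subgroup $g$, and $\Delta_{i}$ limits how far institution $i$'s weight may move from one quarter to the next. The fairness and subgroup safety requirements are constraints, not optional score boosts.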
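The objective is linear in $w$, so the problem becomes a linear program whenever every constraint is linear. The sketch below makes three assumptions. The scores (the bracketed term in the objective) are precomputed. Coverage is linear in $w$, i.e. $C_{g}(w) = c_{g}^{\top} w$. The solver is SciPy's `linprog` with the HiGHS method. The subgroup-performance constraint is omitted because $\mathrm{Perf}_{g}(w)$ generally requires evaluating the aggregated model and is not linear. `solve_training_weights` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linprog


def solve_training_weights(scores, w_min, w_max, w_prev, delta, coverage, c_min) -> np.ndarray:
    """Maximize sum_i w_i * score_i subject to sum(w) = 1, box bounds, a rate limit,
    and linear coverage constraints C_g(w) = coverage[g] @ w >= c_min[g]."""
    scores = np.asarray(scores, dtype=float)
    k = scores.size
    w_prev = np.asarray(w_prev, dtype=float)
    # Intersect the box [w_min, w_max] with the rate limit |w_t - w_{t-1}| <= delta.
    lo = np.maximum(w_min, w_prev - delta)
    hi = np.minimum(w_max, w_prev + delta)
    res = linprog(
        c=-scores,                               # linprog minimizes, so negate to maximize
        A_ub=-np.asarray(coverage, dtype=float),  # C_g(w) >= c_min  <=>  -C_g(w) <= -c_min
        b_ub=-np.asarray(c_min, dtype=float),
        A_eq=np.ones((1, k)),
        b_eq=[1.0],
        bounds=list(zip(lo, hi)),
        method="highs",
    )
    if not res.success:
        # Infeasible constraints should go to review or a fallback mode, not be silently relaxed.
        raise RuntimeError(f"weight optimization failed: {res.message}")
    return res.x
```

In the test case, the coverage row is what moves weight toward the two underserved-serving sites. Without it, the same scores would concentrate weight on the two highest-scoring sites.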

9. Empirical Status

FAIR-WEIGHTS-H has been empirically tested for execution stability and aggregation behavior in this repository.

Completed checks include:

synthetic Camelyon-like smoke validation,
PCam federated smoke validation,
PCam all-strategy smoke validation,
PCam balanced federated benchmark,
PCam heterogeneous federated benchmark.

These tests show that FAIR-WEIGHTS-H runs without numerical failure, produces distinct weight trajectories under heterogeneous simulated sites, and does not degrade performance in the current patch-level PCam setup. They do not yet show a consistent performance or fairness advantage over simpler aggregation baselines. That stronger claim requires the planned ablation and slide-level multi-center validation.

10. Quarterly Algorithm

Collect privacy-preserving aggregate signals from institutions.
Run integrity checks and anomaly detection.
Estimate difficulty-adjusted quality and uncertainty.
Estimate distributional uniqueness using aggregate profiles.
Estimate approximate Owen contribution using sampled warm-started coalitions.
Compute useful uniqueness.
Solve the constrained weight optimization problem.
Run subgroup safety and representation audits.
Produce an institution-level report card.
Version the weights, coefficients, inputs, and audit results.
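For the final versioning step, one option is to serialize each quarter's record canonically and store a SHA-256 digest alongside it. Any later change to the weights, coefficients, inputs, or audit results then becomes detectable. The record layout and the function names below are assumptions.

```python
import hashlib
import json


def sha256_of(obj) -> str:
    """Digest of a JSON-serializable object using a canonical encoding (sorted keys, no spaces)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def version_record(quarter: str, weights: dict, coefficients: dict, inputs, audit: dict) -> dict:
    record = {
        "quarter": quarter,
        "weights": weights,
        "coefficients": coefficients,
        "inputs_sha256": sha256_of(inputs),  # assumed policy: store only a digest of the inputs
        "audit": audit,
    }
    # Hash the record before the digest field is added, so the digest covers everything above.
    record["record_sha256"] = sha256_of(record)
    return record
```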

11. Anomaly Monitoring

Continuous or between-quarter monitoring should detect:

scanner drift,
stain distribution shift,
abrupt gradient drift,
local validation collapse,
unusual update variance,
suspicious simultaneous score changes among affiliated institutions,
distributional specialization that reduces coverage of community or underserved cases.

If severe anomalies are detected, weights should be throttled or frozen pending review.
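The sketch below illustrates one check for unusual update variance. It flags sites whose update norm is far from the cohort using the median/MAD modified z-score, with the conventional 3.5 cutoff. Flagged sites are then throttled by a fixed factor and the weights renormalized. The throttle factor, the choice of update norms as the monitored statistic, and the function names are assumptions.

```python
import numpy as np


def flag_update_anomalies(update_norms, threshold: float = 3.5) -> np.ndarray:
    """Boolean mask of sites whose update norm is an outlier by modified z-score."""
    x = np.asarray(update_norms, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad > 0:
        z = 0.6745 * (x - med) / mad
    else:
        # MAD is zero when most sites are identical; fall back to the mean absolute deviation.
        mean_ad = np.mean(np.abs(x - med))
        if mean_ad == 0:
            return np.zeros(x.shape, dtype=bool)  # all sites identical: nothing to flag
        z = (x - med) / (1.253314 * mean_ad)
    return np.abs(z) > threshold


def throttle(weights, flags, factor: float = 0.5) -> np.ndarray:
    """Scale flagged sites' weights by a hypothetical factor, then renormalize to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = np.where(flags, w * factor, w)
    return w / w.sum()
```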

12. Fallback Modes

FAIR-WEIGHTS-H should degrade safely rather than continue normal operation during instability.

Mode	Trigger	Action
Normal	No major anomaly	Full hybrid weighting
Conservative	Moderate anomaly or high uncertainty	Freeze coefficients, reduce caps, rely more on verified quality
Safety	Systemic manipulation or validation failure	Suspend uniqueness/contribution bonuses; use externally verified metrics only
Emergency	Severe corruption or safety violation	Temporarily exclude affected institution updates pending remediation
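The mode table can be sketched as an ordered severity check in which the most severe trigger wins. The boolean trigger fields and the two helper predicates are hypothetical. Real triggers would come from the monitoring checks in the previous section.

```python
from dataclasses import dataclass
from enum import IntEnum


class Mode(IntEnum):
    # Ordered by severity so comparisons such as mode >= Mode.SAFETY read naturally.
    NORMAL = 0
    CONSERVATIVE = 1
    SAFETY = 2
    EMERGENCY = 3


@dataclass
class MonitorStatus:
    moderate_anomaly: bool = False
    high_uncertainty: bool = False
    systemic_manipulation: bool = False
    validation_failure: bool = False
    severe_corruption: bool = False
    safety_violation: bool = False


def select_mode(s: MonitorStatus) -> Mode:
    # Check the most severe triggers first, mirroring the table from the bottom row up.
    if s.severe_corruption or s.safety_violation:
        return Mode.EMERGENCY
    if s.systemic_manipulation or s.validation_failure:
        return Mode.SAFETY
    if s.moderate_anomaly or s.high_uncertainty:
        return Mode.CONSERVATIVE
    return Mode.NORMAL


def coefficients_frozen(mode: Mode) -> bool:
    return mode >= Mode.CONSERVATIVE


def bonuses_enabled(mode: Mode) -> bool:
    # Uniqueness/contribution bonuses are suspended from SAFETY upward.
    return mode < Mode.SAFETY
```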

13. Validation Plan

FAIR-WEIGHTS-H should be compared against:

equal weighting,
volume weighting,
prestige weighting,
original multiplicative FAIR-WEIGHTS,
Shapley/Owen-only attribution,
FAIR-WEIGHTS-H.

Primary metrics should include:

global AUC,
balanced accuracy,
calibration error,
worst-group sensitivity,
false-negative-rate disparity,
subgroup non-inferiority,
convergence stability,
weight stability,
robustness under missingness and simulated gaming.
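Two of the subgroup metrics can be computed directly from binary predictions. The sketch assumes binary labels and a subgroup label for each case, and it skips groups with no positive cases. It reports false-negative-rate disparity as the max-min gap across groups. That is one common definition; the protocol does not pin one down. The function names are hypothetical.

```python
import numpy as np


def group_sensitivity(y_true, y_pred, groups) -> dict:
    """Sensitivity (true-positive rate) per subgroup; groups with no positive cases are skipped."""
    y_true, y_pred, groups = (np.asarray(a) for a in (y_true, y_pred, groups))
    out = {}
    for g in np.unique(groups):
        positives = (groups == g) & (y_true == 1)
        if positives.any():
            out[g.item()] = float(np.mean(y_pred[positives] == 1))
    return out


def worst_group_sensitivity(sens: dict) -> float:
    return min(sens.values())


def fnr_disparity(sens: dict) -> float:
    # FNR_g = 1 - sensitivity_g; disparity reported as the largest gap between any two groups.
    fnrs = [1.0 - s for s in sens.values()]
    return max(fnrs) - min(fnrs)
```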

14. Regulatory-Safe Claim Language

Use:

FAIR-WEIGHTS-H enforces pre-specified representation and subgroup-performance constraints and requires prospective validation before clinical deployment.

Do not use:

FAIR-WEIGHTS-H proves fairness.

Use:

Shapley/Owen values provide an axiomatic counterfactual attribution benchmark whose conclusions depend on the chosen validation utility and reference distribution.
