FAIR-WEIGHTS-H PCam Federated Smoke Test Report

Date: 2026-05-22
Test Type: Smoke validation (plumbing check, NOT performance comparison)
Data: Real PCam pathology patches (Camelyon16-derived)
Sites: 5 simulated federated sites (1000 patches each)
Rounds: 5 federated rounds per strategy
Model: Simple CNN for patch-level classification

Executive Summary

✅ All 4 weighting strategies passed smoke validation on real PCam pathology data

This smoke test validates that the federated learning pipeline executes end-to-end on real pathology image patches with all four weighting strategies. This is an intermediate validation step between synthetic data and full multi-center Camelyon17 WSI validation.

What This Test Validates

✅ Real PCam pathology patches load correctly
✅ Simulated federated sites are created
✅ Local training completes on real medical images
✅ Aggregation completes for all strategies
✅ Weights are computed and logged
✅ Validation metrics are emitted
✅ Checkpoints are saved
✅ No NaN/Inf values detected
✅ FAIR-WEIGHTS-H backward compatibility with default settings
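As an illustration of the NaN/Inf item in the checklist above, the following is a minimal sketch of how such a check could look. The function name `find_non_finite` and the shape of `round_record` are hypothetical and are not taken from the smoke test script.

```python
import math


def find_non_finite(named_values):
    """Return the names of entries that contain NaN or +/-Inf."""
    bad = []
    for name, values in named_values.items():
        if any(not math.isfinite(v) for v in values):
            bad.append(name)
    return bad


# Hypothetical per-round record: aggregation weights and per-site losses.
round_record = {
    "weights": [0.2, 0.2, 0.2, 0.2, 0.2],
    "site_losses": [0.69, 0.70, 0.68, 0.69, 0.71],
}
print(find_non_finite(round_record))  # an empty list means the round is clean
```

Returning the offending names rather than a bare boolean makes a failed check easier to trace to a specific tensor or metric.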

What This Test Does NOT Validate

❌ Real hospital-level heterogeneity (sites are simulated)
❌ True multi-center site shift (PCam is single-source Camelyon16)
❌ Slide-level clinical aggregation (patch-level only)
❌ Camelyon17-style domain generalization
❌ Performance superiority of FAIR-WEIGHTS-H (smoke test, not benchmark)

Test Configuration

Data Details

Dataset: PCam (PatchCamelyon)
Source: Camelyon16-derived pathology patches
Image Size: 96×96×3 RGB patches
Total Samples: 5000 patches (subset for smoke test)
Samples per Site: 1000 patches
Positive Rate: ~50% (balanced metastasis detection)
Data Type: Real histopathology images (NOT synthetic)

Federated Setup

Number of Sites: 5 simulated institutions
Site Split: Random stratified split (equal size)
Rounds: 5 federated rounds
Local Epochs: 1 epoch per round
Batch Size: 32
Learning Rate: 0.01
Optimizer: SGD
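The random stratified, equal-size site split described above could be sketched as follows. This is an assumption about how such a split might be done, not the script's actual code; `stratified_site_split` is a hypothetical name and the label list is a synthetic stand-in for the PCam labels.

```python
import random
from collections import defaultdict


def stratified_site_split(labels, num_sites, seed=0):
    """Assign sample indices to sites so each site gets a similar label mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    sites = [[] for _ in range(num_sites)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so sizes and class ratios stay balanced.
        for position, idx in enumerate(indices):
            sites[position % num_sites].append(idx)
    return sites


labels = [0, 1] * 2500  # stand-in for 5000 balanced labels
sites = stratified_site_split(labels, num_sites=5)
print([len(s) for s in sites])  # [1000, 1000, 1000, 1000, 1000]
```

Because every site receives a random, class-balanced share of the same pool, a split like this produces no site-level distribution shift, which is why the report treats the sites as simulated.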

Model Architecture

SimplePCamCNN:
  - Conv2d(3, 32, kernel_size=3, padding=1) + ReLU + MaxPool2d(2,2)
  - Conv2d(32, 64, kernel_size=3, padding=1) + ReLU + MaxPool2d(2,2)
  - Linear(64*24*24, 128) + ReLU + Dropout(0.5)
  - Linear(128, 2)
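The `64*24*24` input size of the first linear layer follows from the 96×96 patch size and the two 2×2 pooling steps. The arithmetic can be checked with a short sketch; `conv_out` is a hypothetical helper that applies the standard output-size formula for convolution and pooling layers.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 96                                   # PCam patch height/width
size = conv_out(size, kernel=3, padding=1)  # Conv2d(3, 32): stays 96
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2, 2): 48
size = conv_out(size, kernel=3, padding=1)  # Conv2d(32, 64): stays 48
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2, 2): 24
flat_features = 64 * size * size
print(size, flat_features)  # 24 36864
```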

Results Summary

Strategy	PCam Loaded	Sites Created	Training	Aggregation	Weights	Metrics	Checkpoints	NaNs	Global Acc	Weight Entropy	N_eff
equal	✓	✓	✓	✓	✓	✓	✓	✓ (none)	0.528	1.000	5.00
volume	✓	✓	✓	✓	✓	✓	✓	✓ (none)	0.528	1.000	5.00
prestige	✓	✓	✓	✓	✓	✓	✓	✓ (none)	0.528	0.998	4.98
fair_weights_h	✓	✓	✓	✓	✓	✓	✓	✓ (none)	0.528	1.000	5.00

Key Observations

All strategies executed successfully - No crashes, no NaNs, all validation checks passed
Consistent global accuracy - All strategies reached 0.528 accuracy, close to the 50% chance level for this balanced binary task, which is expected after only 5 rounds of 1 local epoch each
Weight entropy near 1.0 - All strategies produced near-uniform weights (expected for balanced sites)
N_eff ≈ 5 - Effective number of sites close to actual number (healthy weight distribution)
FAIR-WEIGHTS-H backward compatibility confirmed - Default settings work as expected
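The report does not state how weight entropy and N_eff are computed. A common choice, assumed here, is Shannon entropy normalized by log K and the inverse Simpson index 1 / Σ w_i². The sketch below uses hypothetical function names and shows the two limiting cases.

```python
import math


def normalized_entropy(weights):
    """Shannon entropy divided by log(K), so uniform weights give about 1.0."""
    k = len(weights)
    h = -sum(w * math.log(w) for w in weights if w > 0)
    return h / math.log(k)


def effective_sites(weights):
    """Inverse Simpson index 1 / sum(w_i^2) (an assumed definition of N_eff)."""
    return 1.0 / sum(w * w for w in weights)


uniform = [0.2] * 5
one_hot = [1.0, 0.0, 0.0, 0.0, 0.0]
print(normalized_entropy(uniform), effective_sites(uniform))  # about 1.0 and 5.0
print(normalized_entropy(one_hot), effective_sites(one_hot))  # 0 and 1.0
```

Under these definitions, uniform weights over five sites give the maximum values reported for the equal and volume strategies, and any concentration of weight lowers both numbers.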

Detailed Results by Strategy

1. Equal Weighting

Strategy: All sites weighted equally (1/K)

Final Weights (Round 5):

Site 0: 0.200
Site 1: 0.200
Site 2: 0.200
Site 3: 0.200
Site 4: 0.200

Metrics:

Global Accuracy: 0.528
Weight Entropy: 1.000 (maximum)
N_eff: 5.00 (all sites equally weighted)

Site-wise Accuracy:

Site 0: 0.550
Site 1: 0.530
Site 2: 0.534
Site 3: 0.514
Site 4: 0.513

Status: ✅ PASSED
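For reference, aggregation with 1/K weights reduces to a plain average of the site models. The sketch below treats each site's parameters as a flat list of floats; the `aggregate` function and the toy parameter values are illustrative assumptions, not the pipeline's code.

```python
def aggregate(site_params, weights):
    """Weighted average of per-site parameter vectors (lists of floats)."""
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights must sum to 1")
    length = len(site_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, site_params))
        for i in range(length)
    ]


# Toy two-parameter "models" for 5 sites.
site_params = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0], [5.0, 4.0]]
equal_weights = [1 / len(site_params)] * len(site_params)
global_params = aggregate(site_params, equal_weights)
print(global_params)  # approximately [3.0, 2.0], the per-parameter mean
```

The same function covers the other strategies; only the `weights` argument changes.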

2. Volume Weighting

Strategy: Sites weighted by dataset size (proportional to sample count)

Final Weights (Round 5):

Site 0: 0.200
Site 1: 0.200
Site 2: 0.200
Site 3: 0.200
Site 4: 0.200

Metrics:

Global Accuracy: 0.528
Weight Entropy: 1.000 (uniform due to equal site sizes)
N_eff: 5.00

Site-wise Accuracy:

Site 0: 0.550
Site 1: 0.530
Site 2: 0.534
Site 3: 0.514
Site 4: 0.513

Status: ✅ PASSED

Note: Weights are uniform because all sites have equal sample counts (1000 each). This is expected behavior.
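The reduction to uniform weights can be seen directly from the proportional rule w_i = n_i / Σ n_j. A minimal sketch, with `volume_weights` as a hypothetical name and the imbalanced counts invented purely for contrast:

```python
def volume_weights(sample_counts):
    """Weights proportional to each site's sample count."""
    total = sum(sample_counts)
    return [n / total for n in sample_counts]


print(volume_weights([1000] * 5))                  # [0.2, 0.2, 0.2, 0.2, 0.2]
print(volume_weights([4000, 250, 250, 250, 250]))  # hypothetical imbalance
```

With the equal site sizes used in this smoke test, volume weighting cannot be distinguished from equal weighting, so this run only confirms that the code path executes.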

3. Prestige Weighting

Strategy: Sites weighted by inverse error (higher accuracy → higher weight)

Final Weights (Round 5):

Site 0: 0.226
Site 1: 0.189
Site 2: 0.198
Site 3: 0.202
Site 4: 0.186

Metrics:

Global Accuracy: 0.528
Weight Entropy: 0.998 (slightly non-uniform)
N_eff: 4.98

Site-wise Accuracy:

Site 0: 0.550 (highest accuracy → highest weight)
Site 1: 0.530
Site 2: 0.534
Site 3: 0.514
Site 4: 0.513

Status: ✅ PASSED

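Note: Prestige assigns the highest weight to Site 0 (best accuracy, 0.550) and the lowest to Site 4 (lowest accuracy, 0.513). The ordering in between is not monotonic in the site-wise accuracies listed above: Site 3 (0.514) receives 0.202 while Site 1 (0.530) receives 0.189, so the weights appear to depend on something other than these final-round accuracies alone.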
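The report does not give the exact prestige formula. One literal reading of "inverse error" is w_i ∝ 1 / (1 − acc_i), sketched below under that assumption. The function name is hypothetical, and because the real implementation's inputs are unknown, this sketch is not expected to reproduce the weights listed above.

```python
def inverse_error_weights(accuracies, eps=1e-8):
    """Weights proportional to 1 / (1 - accuracy), normalized to sum to 1."""
    inverse_errors = [1.0 / max(1.0 - a, eps) for a in accuracies]
    total = sum(inverse_errors)
    return [v / total for v in inverse_errors]


site_acc = [0.550, 0.530, 0.534, 0.514, 0.513]
weights = inverse_error_weights(site_acc)
print([round(w, 3) for w in weights])
```

Under this reading the weight ordering always follows the accuracy ordering, which gives a simple property to assert when validating a prestige implementation.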

4. FAIR-WEIGHTS-H

Strategy: Fairness-aware weighting combining quality, volume, and fairness

Final Weights (Round 5):

Site 0: 0.196
Site 1: 0.202
Site 2: 0.200
Site 3: 0.199
Site 4: 0.203

Metrics:

Global Accuracy: 0.528
Weight Entropy: 1.000 (near-uniform)
N_eff: 5.00

Site-wise Accuracy:

Site 0: 0.550
Site 1: 0.530
Site 2: 0.534
Site 3: 0.514
Site 4: 0.513

Status: ✅ PASSED

Note: FAIR-WEIGHTS-H produces near-uniform weights for this balanced scenario, demonstrating backward compatibility with default settings (score_transform="linear", update_rule="softmax", beta=1.0, eta=1.0).

FAIR-WEIGHTS-H Mathematical Modes Validation

Unit Tests Status

✅ All 8 FAIR-WEIGHTS-H unit tests passed:

test_weights_sum_to_one ✓
test_entropy_and_effective_count_are_reported ✓
test_higher_uncertainty_lowers_weight_all_else_equal ✓
test_integrity_gate_excludes_institution_nearly_entirely ✓
test_duplicate_institution_ids_raise ✓
test_empty_input_raises ✓
test_conservative_mode_changes_weights ✓
test_invalid_config_raises ✓

Default Settings Validated

The PCam smoke test validates FAIR-WEIGHTS-H with default backward-compatible settings:

score_transform="linear" (default)
update_rule="softmax" (default)
beta=1.0 (default)
eta=1.0 (default)

New Mathematical Modes (Not Yet Validated in Smoke Test)

The following new modes were added AFTER the initial PCam smoke test and require separate validation:

Beta Parameter (beta):
- Controls entropy of weight distribution
- beta=0 → near-uniform weights
- Higher beta → lower entropy (more concentrated weights)
Log-Linear Scoring (score_transform="log_linear"):
- Recovers normalized product-style weighting
- Alternative to linear scoring
Mirror Descent Update (update_rule="mirror_descent"):
- Respects previous weights when previous_weights are nonuniform
- Matches softmax when previous_weights are uniform and eta=1
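The properties listed above can be illustrated with a small sketch. It assumes the usual forms: softmax weights w_i ∝ exp(beta · s_i), a multiplicative mirror-descent update w_i ∝ prev_i · exp(eta · s_i), and a log-linear score s_i = Σ_k a_k · log x_ik. All function names and the example scores are hypothetical, and the actual FAIR-WEIGHTS-H implementation may differ.

```python
import math


def softmax_weights(scores, beta=1.0):
    """w_i proportional to exp(beta * score_i), computed stably."""
    m = max(beta * s for s in scores)
    exps = [math.exp(beta * s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mirror_descent_weights(previous_weights, scores, eta=1.0):
    """Multiplicative update: w_i proportional to prev_i * exp(eta * score_i)."""
    m = max(scores)
    raw = [p * math.exp(eta * (s - m)) for p, s in zip(previous_weights, scores)]
    total = sum(raw)
    return [r / total for r in raw]


def log_linear_score(factors, exponents):
    """Score = sum_k a_k * log(x_k); its softmax is a normalized product."""
    return sum(a * math.log(x) for x, a in zip(factors, exponents))


scores = [0.9, 0.5, 0.7]
print(softmax_weights(scores, beta=0.0))  # uniform in this sketch
print(softmax_weights(scores, beta=5.0))  # more concentrated on the top score

uniform_prev = [1 / 3] * 3
md = mirror_descent_weights(uniform_prev, scores, eta=1.0)
sm = softmax_weights(scores, beta=1.0)
print(md, sm)  # the two agree when previous weights are uniform and eta=1

# Hypothetical (quality, volume) factors per site, each with exponent 1.
site_factors = [(0.9, 0.5), (0.6, 1.0), (0.8, 0.75)]
ll_scores = [log_linear_score(f, (1.0, 1.0)) for f in site_factors]
product_weights = softmax_weights(ll_scores, beta=1.0)
print(product_weights)  # proportional to 0.45, 0.6, 0.6
```

In this form the mirror-descent update only differs from softmax when `previous_weights` carries information, so a smoke test of that mode would need nonuniform previous weights to exercise it.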

Next Steps for New Modes:

Add command-line flags to smoke test script (--score-transform, --update-rule, --beta, --eta)
Run smoke tests with new mathematical modes
Document behavior differences
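The first step could be sketched with `argparse`. The flag names, allowed values, and defaults come from this report; whether the smoke test script actually uses `argparse`, and the `build_parser` name, are assumptions.

```python
import argparse


def build_parser():
    """Parser exposing the four FAIR-WEIGHTS-H mode options as CLI flags."""
    parser = argparse.ArgumentParser(description="PCam federated smoke test (sketch)")
    parser.add_argument("--score-transform", choices=["linear", "log_linear"],
                        default="linear")
    parser.add_argument("--update-rule", choices=["softmax", "mirror_descent"],
                        default="softmax")
    parser.add_argument("--beta", type=float, default=1.0)
    parser.add_argument("--eta", type=float, default=1.0)
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--update-rule", "mirror_descent", "--eta", "0.5"])
print(args.score_transform, args.update_rule, args.beta, args.eta)
# linear mirror_descent 1.0 0.5
```

Keeping the defaults equal to the current settings means existing invocations of the script would behave as before.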

Validation Ladder Progress

Stage	Status	Description
1. Synthetic Camelyon17-like smoke	✅ COMPLETE	FL plumbing works on synthetic data
2. PCam federated smoke (equal)	✅ COMPLETE	FL works on real pathology patches
3. PCam federated smoke (volume, prestige, fair_weights_h)	✅ COMPLETE	All strategies work on real patches
4. PCam federated benchmark	⏭️ NEXT	Full performance comparison (20-50 rounds, multiple seeds)
5. Real Camelyon17 subset smoke	⏭️ FUTURE	True multi-center WSI pipeline validation
6. Real Camelyon17 full validation	⏭️ FUTURE	Actual multi-center clinical validation

Technical Notes

PCam vs Camelyon17 Distinction

PCam (This Test):

Real pathology patches from Camelyon16
Patch-level classification (96×96 pixels)
Single-source data (no true multi-center heterogeneity)
Simulated federated sites (random split)
Validates: data loading, FL execution, real image tensors

Camelyon17 (Future Test):

Real multi-center WSI data from 5 hospitals
Slide-level aggregation required
True hospital-level domain shift
Real institutional heterogeneity
Validates: clinical generalization, site robustness

Smoke Test Philosophy

This is a plumbing validation, not a performance benchmark:

Goal: Does the pipeline run end-to-end without errors?
NOT Goal: Does FAIR-WEIGHTS-H outperform other strategies?

Five rounds are enough to verify execution but not to support performance claims.

Memory Optimization

The smoke test uses several optimizations for efficiency:

Memory-mapped loading (mmap_mode='r')
5000-sample subset (1000 per site)
CPU-only training
Single epoch per round
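A minimal sketch of the memory-mapped loading pattern, assuming the patches are stored as a `.npy` array (the report does not state the on-disk format). The file here is a small temporary stand-in, not the real dataset.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in file: 50 blank 96x96 RGB patches.
    path = os.path.join(tmp, "patches.npy")
    np.save(path, np.zeros((50, 96, 96, 3), dtype=np.uint8))

    # mmap_mode="r" maps the file read-only instead of loading it all into RAM.
    patches = np.load(path, mmap_mode="r")
    # Only the requested slice is copied into memory.
    subset = np.array(patches[:10])
    del patches  # release the mapping before the temporary file is removed

print(subset.shape)  # (10, 96, 96, 3)
```

Copying only the smoke test subset out of the mapped array keeps peak memory roughly proportional to the subset size rather than to the full file.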

Conclusions

✅ Smoke Test Success

All four weighting strategies successfully executed on real PCam pathology patches:

Equal weighting - Baseline uniform weighting works
Volume weighting - Dataset-size-based weighting works
Prestige weighting - Accuracy-based weighting works
FAIR-WEIGHTS-H - Fairness-aware weighting works with default settings

✅ FAIR-WEIGHTS-H Backward Compatibility

The default FAIR-WEIGHTS-H settings (linear scoring, softmax update, beta=1.0, eta=1.0) are backward-compatible and produce expected behavior on balanced data.

✅ Real Pathology Data Validation

The federated pipeline loads and trains on real histopathology image patches end to end, which clears the way for the PCam benchmark and Camelyon17 stages of the validation ladder.

⏭️ Next Steps

PCam Federated Benchmark (20-50 rounds, multiple seeds):
- Compare global AUC across strategies
- Measure site-wise AUC and worst-site sensitivity
- Compute weight entropy and N_eff over time
- Evaluate calibration (ECE)
New Mathematical Modes Validation:
- Add CLI flags for --score-transform, --update-rule, --beta, --eta
- Run smoke tests with log_linear and mirror_descent modes
- Document behavior differences
Real Camelyon17 Validation:
- Smoke test on real multi-center WSI data
- Full validation with slide-level aggregation
- True hospital-level heterogeneity evaluation
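For the calibration item in the benchmark plan above, expected calibration error (ECE) is commonly computed by binning predictions by confidence and averaging the gap between confidence and accuracy in each bin. The sketch below uses equal-width bins; the function name and the bin count are assumptions, not choices documented for this project.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """ECE with equal-width confidence bins over [0, 1]."""
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * num_bins), num_bins - 1)
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes its |confidence - accuracy| gap, weighted by size.
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# 75% confidence with 3 of 4 correct is perfectly calibrated in this bin.
print(expected_calibration_error([0.75] * 4, [True, True, True, False]))  # 0.0
# Full confidence with half the predictions wrong gives a gap of 0.5.
print(expected_calibration_error([1.0, 1.0], [True, False]))  # 0.5
```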

Disclaimer

This smoke test uses real PCam pathology patches with simulated federated sites. It validates data-loading and federated execution on real pathology images, but it is not real multi-institutional Camelyon17 validation and should not be interpreted as clinical evidence or proof of FAIR-WEIGHTS-H superiority.

The 5-round smoke test is a plumbing check, not a performance benchmark. Full performance comparison requires 20-50 rounds with multiple seeds and comprehensive metrics.

References

PCam Dataset: PatchCamelyon on GitHub
Camelyon16: Original source for PCam patches
Camelyon17: Target for future multi-center validation
FAIR-WEIGHTS-H: Fairness-aware federated weighting algorithm

Report Generated: 2026-05-22
Test Duration: ~3 minutes per strategy (~12 minutes total)
Test Environment: Windows, Python 3.14.3, PyTorch, CPU-only
Test Script: scripts/federated/run_pcam_federated_smoke.py
Results Location: results/pcam_federated_smoke/
