Analysis Overview

This document summarizes what each analysis in this folder does and what it produces.

Runtime Defaults

Analysis defaults are centralized in analysis/defaults.py.
The orchestrator CLI in analysis/evaluate_models.py is intentionally minimal and focuses on operational controls (--backend, --run-name, --skip-noise).
Standalone operational modes include --longitudinal-breakdown-only, --noise-correlation-only, and --dataset-summary-only.
Threshold grids, uncertainty percentile grids, noise-factor grids, calibration bins, class index, decision threshold, and the number of Monte Carlo (MC) passes for the bayesian backend are sourced from analysis/defaults.py.

Shared Utilities

Shared loader/split logic is centralized in analysis/data_pipeline.py.
All plotting code is centralized in analysis/plotting.py for easier inspection and maintenance.

1. Performance Threshold Sweep

Purpose: Measure classification performance as decision threshold changes.
Inputs: Ground-truth labels and predicted probabilities.
Method: Evaluate metrics across an evenly spaced threshold grid.
Main outputs:
- performance_threshold_sweep.csv
- plots/performance_threshold_accuracy.png
- plots/performance_threshold_f1.png
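
The sweep can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, and it assumes a sample is predicted positive when its probability is at or above the threshold.

```python
def accuracy_and_f1(labels, probs, threshold):
    """Binarize probabilities at `threshold`, then compute accuracy and F1."""
    # Assumption: ">=" marks a positive prediction; the real pipeline may differ.
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    accuracy = sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 is defined as 0 when there are no positives
    return accuracy, f1


def threshold_sweep(labels, probs, thresholds):
    """Return one row of metrics per threshold, in the order given."""
    rows = []
    for t in thresholds:
        accuracy, f1 = accuracy_and_f1(labels, probs, t)
        rows.append({"threshold": t, "accuracy": accuracy, "f1": f1})
    return rows


# Toy usage with made-up labels and probabilities.
for row in threshold_sweep([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.0, 0.5, 1.0]):
    print(row)
```

Each row corresponds to one line of a sweep table; plotting accuracy and F1 against the threshold column gives the two curves.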

2. Uncertainty Cutoff (Raw-Value)

Purpose: Evaluate performance on subsets whose uncertainty is at or below percentile-derived cutoff values.
Inputs: Uncertainty arrays (confidence-derived uncertainty and backend-specific uncertainty).
Method:
- Build evenly spaced percentile points.
- Convert each percentile to a raw uncertainty cutoff value.
- Keep samples where uncertainty is less than or equal to that cutoff.
- Compute accuracy and F1 for each retained subset.
Main outputs:
- performance_uncertainty_cutoff.csv
- plots/performance_uncertainty_cutoff_accuracy.png
- plots/performance_uncertainty_cutoff_f1.png
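
The method steps above could look something like the sketch below. It is illustrative only: the function names are hypothetical, the percentile uses linear interpolation (an assumption about how cutoffs are derived), and it takes already-binarized predictions as input.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def uncertainty_cutoff_rows(labels, preds, uncertainty, percentiles):
    """For each percentile, keep samples with uncertainty <= cutoff and score them."""
    rows = []
    for q in percentiles:
        cutoff = percentile(uncertainty, q)  # percentile -> raw uncertainty value
        keep = [i for i, u in enumerate(uncertainty) if u <= cutoff]
        tp = sum(1 for i in keep if labels[i] == 1 and preds[i] == 1)
        fp = sum(1 for i in keep if labels[i] == 0 and preds[i] == 1)
        fn = sum(1 for i in keep if labels[i] == 1 and preds[i] == 0)
        correct = sum(1 for i in keep if labels[i] == preds[i])
        denom = 2 * tp + fp + fn
        rows.append({
            "percentile": q,
            "cutoff": cutoff,
            "n_retained": len(keep),
            "accuracy": correct / len(keep),
            "f1": 2 * tp / denom if denom else 0.0,
        })
    return rows


# Toy usage with made-up values.
toy = uncertainty_cutoff_rows(
    labels=[1, 0, 1, 0, 1],
    preds=[1, 0, 0, 0, 1],
    uncertainty=[0.05, 0.10, 0.40, 0.20, 0.30],
    percentiles=[50, 100],
)
for row in toy:
    print(row)
```

Because the cutoff is never below the smallest uncertainty value, every retained subset contains at least one sample.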

3. Uncertainty Cutoff (Percentile-Ranked)

Purpose: Evaluate how performance changes as the retained set narrows from all samples to only the lowest-uncertainty samples.
Inputs: Same uncertainty arrays as above.
Method:
- Use an evenly spaced percentile grid.
- Keep samples where uncertainty is less than or equal to the selected percentile cutoff.
- Plot from least restricted on the left (all samples) to most restricted on the right (lowest-uncertainty subset only).
Main outputs:
- performance_uncertainty_percentile_cutoff.csv
- plots/performance_uncertainty_percentile_cutoff_accuracy.png
- plots/performance_uncertainty_percentile_cutoff_f1.png

4. Calibration Analysis

Purpose: Quantify probability calibration quality.
Inputs: Ground-truth labels and predicted probabilities.
Method:
- Reliability binning with configurable bin count.
- Compute maximum calibration error (MCE) and Brier score.
Main outputs:
- calibration_bins.csv
- plots/calibration_reliability.png
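
As a rough illustration of the two steps above, the sketch below bins predictions into equal-width probability bins and derives MCE and the Brier score. The function name, the equal-width binning, and the use of the positive-class probability are assumptions, not details confirmed by the project code.

```python
def calibration_summary(labels, probs, n_bins=10):
    """Reliability bins, maximum calibration error, and Brier score."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        # Assumption: equal-width bins on [0, 1]; p == 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))

    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for _, p in members) / len(members)
        frac_pos = sum(y for y, _ in members) / len(members)
        rows.append({
            "bin": idx,
            "count": len(members),
            "mean_prob": mean_prob,
            "frac_positive": frac_pos,
            "gap": abs(frac_pos - mean_prob),
        })

    mce = max(row["gap"] for row in rows)  # worst-case bin gap
    brier = sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)
    return rows, mce, brier


# Toy usage with made-up values.
bin_rows, mce, brier = calibration_summary([0, 0, 1, 1], [0.1, 0.3, 0.7, 0.9], n_bins=2)
print(bin_rows, mce, brier)
```

The per-bin rows are the kind of data a reliability diagram plots: mean predicted probability on one axis and observed positive fraction on the other.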

5. Physician Confidence Comparison

Purpose: Compare model uncertainty with physician confidence ratings.
Inputs: Evaluation outputs plus clinical table (Image Data ID + physician confidence column).
Method:
- Merge by image ID.
- Group metrics by physician confidence level.
- Plot distributions per rating group.
Main outputs include grouped summary CSV files and boxplots for confidence and standard deviation (ensemble) or predictive uncertainty (bayesian).
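
The merge-and-group step might be sketched as below, using plain dictionaries in place of whatever table library the project uses. Only the "Image Data ID" column comes from this document; the names physician_confidence, confidence, and uncertainty are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import mean


def group_by_physician_confidence(eval_rows, clinical_rows,
                                  id_key="Image Data ID",
                                  rating_key="physician_confidence"):
    """Join evaluation rows to clinical ratings by image ID, then summarize per rating."""
    rating_by_id = {row[id_key]: row[rating_key] for row in clinical_rows}

    grouped = defaultdict(list)
    for row in eval_rows:
        rating = rating_by_id.get(row[id_key])
        if rating is None:
            continue  # assumption: images without a rating are dropped (inner join)
        grouped[rating].append(row)

    return {
        rating: {
            "n": len(rows),
            "mean_confidence": mean(r["confidence"] for r in rows),
            "mean_uncertainty": mean(r["uncertainty"] for r in rows),
        }
        for rating, rows in sorted(grouped.items())
    }


# Toy usage with made-up rows.
evaluation = [
    {"Image Data ID": "I1", "confidence": 0.9, "uncertainty": 0.1},
    {"Image Data ID": "I2", "confidence": 0.7, "uncertainty": 0.3},
    {"Image Data ID": "I3", "confidence": 0.6, "uncertainty": 0.4},
    {"Image Data ID": "I4", "confidence": 0.5, "uncertainty": 0.5},
]
clinical = [
    {"Image Data ID": "I1", "physician_confidence": 3},
    {"Image Data ID": "I2", "physician_confidence": 3},
    {"Image Data ID": "I3", "physician_confidence": 1},
]
print(group_by_physician_confidence(evaluation, clinical))
```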

6. Longitudinal Stability Analysis

Purpose: Examine uncertainty patterns across stable and transitioning patient trajectories.
Inputs: Evaluation outputs and clinical timeline information.
Method:
- Build patient-level trajectories.
- Group by clinical cohort dynamics.
- Compare uncertainty summaries across cohorts.
Main outputs include patient/cohort summary CSV files and cohort uncertainty plots.
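
One possible shape for the trajectory grouping is sketched below. The cohort rule is an assumption made for illustration: a patient counts as "stable" if every visit has the same diagnosis and "transitioning" otherwise. The actual cohort definitions and field names in the project may differ.

```python
from collections import defaultdict
from statistics import mean


def cohort_uncertainty(visits):
    """visits: iterable of (patient_id, visit_index, diagnosis, uncertainty) tuples."""
    # Build patient-level trajectories.
    by_patient = defaultdict(list)
    for patient_id, visit_index, diagnosis, uncertainty in visits:
        by_patient[patient_id].append((visit_index, diagnosis, uncertainty))

    # Assign each patient to a cohort and record their mean uncertainty.
    per_cohort = defaultdict(list)
    for trajectory in by_patient.values():
        trajectory.sort()  # order visits by visit_index
        diagnoses = {dx for _, dx, _ in trajectory}
        cohort = "stable" if len(diagnoses) == 1 else "transitioning"  # hypothetical rule
        per_cohort[cohort].append(mean(u for _, _, u in trajectory))

    return {
        cohort: {"n_patients": len(values), "mean_uncertainty": mean(values)}
        for cohort, values in per_cohort.items()
    }


# Toy usage with made-up visits; "A" and "B" stand in for diagnosis labels.
toy_visits = [
    ("p1", 0, "A", 0.1), ("p1", 1, "A", 0.3),
    ("p2", 0, "A", 0.4), ("p2", 1, "B", 0.6),
]
print(cohort_uncertainty(toy_visits))
```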

7. Noise Sensitivity Analysis

Purpose: Test robustness and uncertainty behavior under synthetic Gaussian noise.
Inputs: Holdout data loader, model backend, noise factor schedule, threshold, calibration bins.
Method:
- Use an evenly spaced noise factor schedule.
- Add Gaussian noise scaled by a fixed intensity-range factor.
- Recompute performance and calibration at each noise level (sigma).
- Save visual examples of noised images.
Main outputs:
- noise_sensitivity.csv
- plots/noise_sensitivity_accuracy.png
- plots/noise_sensitivity_f1.png
- plots/noise_confidence.png
- plots/noise_standard_deviation.png (ensemble) / plots/noise_predictive_uncertainty.png (bayesian)
- plots/noise_examples/*_noise_examples.png
- plots/noise_examples/*_clean_scan_example.png
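
The noise-injection step might look like the sketch below, which works on a flat list of pixel intensities to stay dependency-free. It assumes sigma is the noise factor multiplied by a fixed intensity range; the function names, the default range of 1.0, and the seeding scheme are hypothetical.

```python
import random


def add_gaussian_noise(pixels, noise_factor, rng, intensity_range=1.0):
    """Return a noised copy of `pixels` and the sigma that was applied."""
    # Assumption: sigma scales with a fixed intensity range, not the per-image range.
    sigma = noise_factor * intensity_range
    noisy = [p + rng.gauss(0.0, sigma) for p in pixels]
    return noisy, sigma


def noise_sweep(pixels, noise_factors, seed=0, intensity_range=1.0):
    """Apply each noise factor in the schedule to the same clean input."""
    results = []
    for factor in noise_factors:
        rng = random.Random(seed)  # same seed per level keeps levels comparable
        noisy, sigma = add_gaussian_noise(pixels, factor, rng, intensity_range)
        results.append({"noise_factor": factor, "sigma": sigma, "pixels": noisy})
    return results


# Toy usage: three pixels, three noise levels.
for result in noise_sweep([0.0, 0.5, 1.0], [0.0, 0.1, 0.2]):
    print(result["noise_factor"], result["sigma"], result["pixels"])
```

In a full run, the model would be re-evaluated on each noised input so that metrics can be tabulated per sigma.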

8. Dataset Composition Summary

Purpose: Report dataset composition without rerunning full evaluation analyses.
Inputs: Raw dataset files and configured train/validation/test split ratios.
Method:
- Rebuild dataset and split assignment using the configured split seed.
- Count total images and positive/negative labels overall.
- Count train/validation/test images and per-split class balance.
- Compute both overall and split-level percentages.
Main outputs:
- dataset_summary.md
CLI:
- python -m analysis.evaluate_models --dataset-summary-only
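
For orientation, the counting logic could resemble the sketch below. It is not the project's splitter: the function name, the 70/15/15 ratios, and the shuffle-then-slice strategy are assumptions chosen only to show how a seeded split yields reproducible per-split class counts and percentages.

```python
import random
from collections import Counter


def split_summary(labels, ratios=(0.7, 0.15, 0.15), seed=0):
    """Seeded shuffle split, then per-split counts and percentages."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)  # same seed -> same split assignment

    n_train = round(ratios[0] * len(indices))
    n_val = round(ratios[1] * len(indices))
    splits = {
        "train": indices[:n_train],
        "validation": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],  # remainder goes to test
    }

    summary = {}
    for name, members in splits.items():
        counts = Counter(labels[i] for i in members)
        n = len(members)
        summary[name] = {
            "n": n,
            "positive": counts[1],
            "negative": counts[0],
            "pct_of_total": 100.0 * n / len(labels),
            "pct_positive": 100.0 * counts[1] / n if n else 0.0,
        }
    return summary


# Toy usage: 8 positive and 12 negative labels.
for split_name, stats in split_summary([1] * 8 + [0] * 12).items():
    print(split_name, stats)
```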

Plot Report

Each backend's output includes plots_report.md.
The report embeds all generated plot images and includes a short description per plot.

Notes on Even Spacing

The pipeline uses evenly spaced grids for sampled x-axis points in threshold and uncertainty-cutoff performance plots, and for the sigma schedule used by noise sensitivity plots.
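
An evenly spaced grid of this kind can be built as in the sketch below. The helper name and the example endpoints and counts are illustrative; the actual grid settings live in analysis/defaults.py and are not reproduced here.

```python
def evenly_spaced(start, stop, count):
    """Return `count` evenly spaced values from start to stop, both inclusive."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [float(start)]
    step = (stop - start) / (count - 1)
    # Append `stop` explicitly so the last point is exact despite float rounding.
    return [start + i * step for i in range(count - 1)] + [float(stop)]


# Hypothetical grids for illustration only.
print(evenly_spaced(0.0, 1.0, 5))   # e.g. a threshold grid
print(evenly_spaced(0, 100, 11))    # e.g. a percentile grid
```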
