
Analysis Overview

This document summarizes what each analysis in this folder does and what it produces.

Runtime Defaults

  • Analysis defaults are centralized in analysis/defaults.py.
  • The orchestrator CLI in analysis/evaluate_models.py is intentionally minimal and focuses on operational controls (--backend, --run-name, --skip-noise).
  • Threshold grids, uncertainty percentile grids, noise-factor grids, calibration bins, class index, decision threshold, and Bayesian MC passes are sourced from analysis/defaults.py.

Shared Utilities

  • Shared loader/split logic is centralized in analysis/data_pipeline.py.
  • All plotting code is centralized in analysis/plotting.py for easier inspection and maintenance.

1. Performance Threshold Sweep

  • Purpose: Measure classification performance as decision threshold changes.
  • Inputs: Ground-truth labels and predicted probabilities.
  • Method: Evaluate metrics across an evenly spaced threshold grid.
  • Main outputs:
    • performance_threshold_sweep.csv
    • plots/performance_threshold_sweep.png
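
The sweep can be sketched in plain Python as follows. This is a minimal illustration, not the code in this folder: the function names, the metric set (accuracy, precision, recall, F1), and the `>=` comparison against the threshold are assumptions.

```python
def evenly_spaced(start, stop, num):
    """Return `num` evenly spaced values from start to stop, inclusive."""
    if num < 2:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def sweep_thresholds(labels, probs, thresholds):
    """Compute binary classification metrics at each decision threshold."""
    rows = []
    for t in thresholds:
        # Assumption: a sample is predicted positive when prob >= threshold.
        preds = [1 if p >= t else 0 for p in probs]
        tp = sum(1 for y, h in zip(labels, preds) if y == 1 and h == 1)
        fp = sum(1 for y, h in zip(labels, preds) if y == 0 and h == 1)
        fn = sum(1 for y, h in zip(labels, preds) if y == 1 and h == 0)
        tn = sum(1 for y, h in zip(labels, preds) if y == 0 and h == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows.append({
            "threshold": t,
            "accuracy": (tp + tn) / len(labels),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        })
    return rows


# Toy data for illustration only.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
rows = sweep_thresholds(labels, probs, evenly_spaced(0.0, 1.0, 5))
for row in rows:
    print(row)
```

Each row corresponds to one line of a table like performance_threshold_sweep.csv, though the actual column names may differ.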

2. Uncertainty Cutoff (Raw-Value)

  • Purpose: Evaluate performance on subsets whose uncertainty is at or below percentile-derived cutoff values.
  • Inputs: Uncertainty arrays (confidence-derived uncertainty and backend-specific uncertainty).
  • Method:
    • Build evenly spaced percentile points.
    • Convert each percentile to a raw uncertainty cutoff value.
    • Keep samples where uncertainty is less than or equal to that cutoff.
    • Compute accuracy and F1 for each retained subset.
  • Main outputs:
    • performance_uncertainty_cutoff.csv
    • plots/performance_uncertainty_cutoff.png
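
The steps above can be sketched as follows. This is an illustrative version under stated assumptions: the helper names are hypothetical, the percentile uses linear interpolation between sorted values, and predictions are assumed to be already thresholded to 0/1.

```python
def percentile_value(values, q):
    """Linearly interpolated q-th percentile (q in [0, 100]) of values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def f1_score(labels, preds):
    """Binary F1 computed from true/false positive and false negative counts."""
    tp = sum(1 for y, h in zip(labels, preds) if y == 1 and h == 1)
    fp = sum(1 for y, h in zip(labels, preds) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(labels, preds) if y == 1 and h == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def uncertainty_cutoff_metrics(labels, preds, uncertainty, percentiles):
    """Accuracy and F1 on the samples at or below each percentile cutoff."""
    rows = []
    for q in percentiles:
        cutoff = percentile_value(uncertainty, q)
        kept = [i for i, u in enumerate(uncertainty) if u <= cutoff]
        y = [labels[i] for i in kept]
        h = [preds[i] for i in kept]
        rows.append({
            "percentile": q,
            "cutoff": cutoff,
            "n_kept": len(kept),
            "accuracy": sum(a == b for a, b in zip(y, h)) / len(y),
            "f1": f1_score(y, h),
        })
    return rows


# Toy data for illustration only.
labels = [1, 0, 1, 0, 1]
preds = [1, 0, 0, 1, 1]
uncertainty = [0.1, 0.2, 0.6, 0.8, 0.3]
rows = uncertainty_cutoff_metrics(labels, preds, uncertainty, [25, 50, 75, 100])
for row in rows:
    print(row)
```

Because the cutoff is never below the smallest uncertainty value, every retained subset contains at least one sample.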

3. Uncertainty Cutoff (Percentile-Ranked)

  • Purpose: Evaluate how performance changes as the evaluated set narrows from all samples to only the lowest-uncertainty samples.
  • Inputs: Same uncertainty arrays as above.
  • Method:
    • Use an evenly spaced percentile grid.
    • Keep samples where uncertainty is less than or equal to the selected percentile cutoff.
    • Plot from least restricted on the left (all samples) to most restricted on the right (lowest-uncertainty subset only).
  • Main outputs:
    • performance_uncertainty_percentile_cutoff.csv
    • plots/performance_uncertainty_percentile_cutoff.png

4. Calibration Analysis

  • Purpose: Quantify probability calibration quality.
  • Inputs: Ground-truth labels and predicted probabilities.
  • Method:
    • Reliability binning with configurable bin count.
    • Compute ECE, MCE, and Brier score.
  • Main outputs:
    • calibration_bins.csv
    • plots/calibration_reliability.png
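
For reference, the three metrics can be computed as in the sketch below. It assumes binary labels and equal-width bins over the predicted positive-class probability; the pipeline may instead bin by top-class confidence, and the function name and row keys here are hypothetical.

```python
def calibration_summary(labels, probs, n_bins=10):
    """Return per-bin rows plus ECE, MCE and Brier score for binary labels."""
    n = len(labels)
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))

    rows, ece, mce = [], 0.0, 0.0
    for b, members in enumerate(bins):
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for _, p in members) / len(members)
        frac_pos = sum(y for y, _ in members) / len(members)
        gap = abs(frac_pos - mean_prob)
        ece += len(members) / n * gap   # gap weighted by bin occupancy
        mce = max(mce, gap)             # worst-case bin gap
        rows.append({"bin": b, "count": len(members),
                     "mean_prob": mean_prob, "frac_positive": frac_pos})

    # Brier score: mean squared error between probability and label.
    brier = sum((p - y) ** 2 for y, p in zip(labels, probs)) / n
    return rows, ece, mce, brier


# Toy data for illustration only.
labels = [0, 0, 1, 1]
probs = [0.1, 0.3, 0.7, 0.9]
bin_rows, ece, mce, brier = calibration_summary(labels, probs, n_bins=2)
print(bin_rows, ece, mce, brier)
```

ECE is the occupancy-weighted average of the per-bin gap between observed frequency and mean predicted probability, and MCE is the largest such gap.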

5. Physician Confidence Comparison

  • Purpose: Compare model uncertainty with physician confidence ratings.
  • Inputs: Evaluation outputs plus clinical table (Image Data ID + physician confidence column).
  • Method:
    • Merge by image ID.
    • Group metrics by physician confidence level.
    • Plot distributions per rating group.
  • Main outputs include grouped summary CSV files and boxplots for confidence and secondary uncertainty metrics.
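
The merge-and-group step might look like the sketch below. Only the `Image Data ID` column name comes from the description above; the `physician_confidence` and `uncertainty` keys, the rating values, and the choice of an inner join are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def group_uncertainty_by_rating(eval_rows, clinical_rows,
                                id_key="Image Data ID",
                                rating_key="physician_confidence",
                                value_key="uncertainty"):
    """Join evaluation rows to clinical rows by image ID, then summarize."""
    rating_by_id = {row[id_key]: row[rating_key] for row in clinical_rows}
    grouped = defaultdict(list)
    for row in eval_rows:
        rating = rating_by_id.get(row[id_key])
        if rating is None:
            continue  # inner join: skip images without a physician rating
        grouped[rating].append(row[value_key])
    return {
        rating: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for rating, vals in sorted(grouped.items())
    }


# Hypothetical rows for illustration only.
eval_rows = [
    {"Image Data ID": "I1", "uncertainty": 0.1},
    {"Image Data ID": "I2", "uncertainty": 0.3},
    {"Image Data ID": "I3", "uncertainty": 0.7},
    {"Image Data ID": "I4", "uncertainty": 0.5},  # no rating, dropped
]
clinical_rows = [
    {"Image Data ID": "I1", "physician_confidence": "high"},
    {"Image Data ID": "I2", "physician_confidence": "high"},
    {"Image Data ID": "I3", "physician_confidence": "low"},
]
summary = group_uncertainty_by_rating(eval_rows, clinical_rows)
print(summary)
```

The per-rating lists collected in `grouped` are also what a boxplot per rating group would be drawn from.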

6. Longitudinal Stability Analysis

  • Purpose: Examine uncertainty patterns across stable and transitioning patient trajectories.
  • Inputs: Evaluation outputs and clinical timeline information.
  • Method:
    • Build patient-level trajectories.
    • Group by clinical cohort dynamics.
    • Compare uncertainty summaries across cohorts.
  • Main outputs include patient/cohort summary CSV files and cohort uncertainty plots.

7. Noise Sensitivity Analysis

  • Purpose: Test robustness and uncertainty behavior under synthetic Gaussian noise.
  • Inputs: Holdout data loader, model backend, noise factor schedule, threshold, calibration bins.
  • Method:
    • Use an evenly spaced noise factor schedule.
    • Add Gaussian noise scaled by a fixed intensity-range factor.
    • Recompute performance and calibration at each noise level (sigma) in the schedule.
    • Save visual examples of noised images.
  • Main outputs:
    • noise_sensitivity.csv
    • plots/noise_sensitivity.png
    • plots/noise_uncertainty.png
    • plots/noise_confidence_certainty.png
    • plots/noise_examples/*_noise_examples.png
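
One reading of the noise step is sketched below: sigma is taken as the noise factor multiplied by the image's intensity range (max minus min). That interpretation, the function names, and the flat list of pixel values are all assumptions; the sketch also does not clip noised values back into the original range.

```python
import random


def noise_schedule(max_factor, steps):
    """Evenly spaced noise factors from 0 to max_factor, inclusive."""
    if steps < 2:
        return [0.0]
    return [max_factor * i / (steps - 1) for i in range(steps)]


def add_gaussian_noise(pixels, noise_factor, rng):
    """Add zero-mean Gaussian noise with sigma = noise_factor * intensity range."""
    intensity_range = max(pixels) - min(pixels)
    sigma = noise_factor * intensity_range
    return [p + rng.gauss(0.0, sigma) for p in pixels]


# Toy "image" as a flat list of intensities, for illustration only.
pixels = [0.0, 0.5, 1.0, 0.25]
schedule = noise_schedule(0.2, 3)
rng = random.Random(0)  # fixed seed so runs are repeatable
noised = {factor: add_gaussian_noise(pixels, factor, rng) for factor in schedule}
for factor, values in noised.items():
    print(factor, values)
```

A noise factor of 0 leaves the input unchanged, which gives a clean baseline row at the start of the schedule.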

Plot Report

  • Each backend output includes plots_report.md.
  • The report embeds all generated plot images and includes a short description per plot.

Notes on Even Spacing

The pipeline uses evenly spaced grids for sampled x-axis points in threshold and uncertainty-cutoff performance plots, and for the sigma schedule used by noise sensitivity plots.