Nicholas Schense a9e15ca5e4 Changed to using a UV project to simplify project, and changed computers 21 hours ago
ANALYSES_OVERVIEW.md 810cef2630 Currently working! 1 month ago
README.md 810cef2630 Currently working! 1 month ago
__init__.py 2e4c5b386d updates to analysis! 2 months ago
analysis_modules.py 810cef2630 Currently working! 1 month ago
data_access.py e79e7f50c2 more analysis work - fixed graphs, going to implement more noise + stdev analysis 6 days ago
data_pipeline.py 1cae771e45 Work on analysis (with AI) 2 months ago
dataset_summary.py a9e15ca5e4 Changed to using a UV project to simplify project, and changed computers 21 hours ago
defaults.py 1cae771e45 Work on analysis (with AI) 2 months ago
evaluate_models.py a9e15ca5e4 Changed to using a UV project to simplify project, and changed computers 21 hours ago
holdout_evaluation.py e79e7f50c2 more analysis work - fixed graphs, going to implement more noise + stdev analysis 6 days ago
longitudinal_audit.py 810cef2630 Currently working! 1 month ago
metrics.py 810cef2630 Currently working! 1 month ago
model_utils.py 1cae771e45 Work on analysis (with AI) 2 months ago
noise_analysis.py e79e7f50c2 more analysis work - fixed graphs, going to implement more noise + stdev analysis 6 days ago
noise_correlation.py 810cef2630 Currently working! 1 month ago
plotting.py e79e7f50c2 more analysis work - fixed graphs, going to implement more noise + stdev analysis 6 days ago
regenerate_plots.py a9e15ca5e4 Changed to using a UV project to simplify project, and changed computers 21 hours ago
runtime.py 2e4c5b386d updates to analysis! 2 months ago
uncertainty.py 1cae771e45 Work on analysis (with AI) 2 months ago

README.md

Model Evaluation Code

Description

This folder should contain all the code needed to evaluate the Bayesian model and the Deep Ensemble, and it should generate and save the corresponding graphs and statistics. The analyses should include:

  • Performance information (i.e. basic information on accuracy, number correct, number incorrect, F1 score, etc.)
  • Some basic metrics for uncertainty (MCE, Brier)
  • Physician Confidence Analysis (graphing uncertainty vs physician confidence)
  • Longitudinal Analysis (graphing uncertainty on patients who remained stable CN, stable AD, or switched from CN to AD)
  • Noise introduction analysis (graphing uncertainty on normal image and images with increasing levels of Gaussian noise applied)
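
As a rough illustration of the basic uncertainty metrics listed above, the following sketch computes a Brier score and a binned calibration error in plain Python. It assumes binary classification with one predicted probability per sample, and it assumes "MCE" means maximum calibration error; the function names and the bin count are illustrative and are not taken from metrics.py.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted P(class 1) and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def max_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Largest |accuracy - mean confidence| gap over the non-empty bins."""
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        # Confidence in the predicted class, which lies in [0.5, 1.0].
        conf = p if pred == 1 else 1.0 - p
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    gaps = []
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(a for _, a in members) / len(members)
            gaps.append(abs(accuracy - mean_conf))
    return max(gaps) if gaps else 0.0


# Toy values for illustration only, not results from either model.
example_probs = [0.9, 0.1, 0.8, 0.3]
example_labels = [1, 0, 1, 1]
print(brier_score(example_probs, example_labels))
print(max_calibration_error(example_probs, example_labels))
```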

"Uncertainty" should be taken to mean EITHER confidence or standard deviation, for both the ensemble and the Bayesian network. Confidence is the raw output's distance from 0.5; standard deviation is either the stdev reported by the Bayesian model or the stdev across all the ensemble members' outputs. Every analysis should be evaluated with both measures.

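The two uncertainty measures described above can be sketched for the ensemble case as follows. This is a minimal illustration that assumes each ensemble member emits a single probability for the positive class; the function names are hypothetical and do not come from uncertainty.py, and the population standard deviation is an assumption (the actual code may use the sample form).

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def confidence_from_probability(p: float) -> float:
    """Raw distance of the output from the 0.5 decision boundary (0 to 0.5)."""
    return abs(p - 0.5)


def ensemble_uncertainty(member_probs: Sequence[float]) -> Dict[str, float]:
    """Summarize one sample from the per-member positive-class probabilities."""
    mean_p = mean(member_probs)
    return {
        "mean_probability": mean_p,
        "confidence": confidence_from_probability(mean_p),
        # Assumption: population stdev across members.
        "std": pstdev(member_probs),
    }


# Toy member outputs for a single scan, for illustration only.
print(ensemble_uncertainty([0.9, 0.8, 0.7]))
```
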
The code in the senior_research_thesis and the previous alnn_rewrite/analysis folders should be consulted. The models should be loaded from the config.toml that is currently in place.

Implementation Status

The modular implementation now lives entirely in this folder.

  • All new source code is under alnn_rewrite/analysis
  • All generated artifacts are written under alnn_rewrite/analysis_output

If a backend does not already have model_evaluation_results.nc, the pipeline now automatically evaluates that backend on the holdout datasets (validation + test) first, saves the generated netCDF into that backend's model output directory, and then runs the analyses.
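
The evaluate-if-missing behavior described above could look roughly like the sketch below. Only the file name model_evaluation_results.nc comes from this README; the function name, its parameters, and the callback shape are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable

RESULTS_FILENAME = "model_evaluation_results.nc"


def ensure_evaluation_results(
    model_output_dir: Path, evaluate: Callable[[Path], None]
) -> Path:
    """Return the results path, running the holdout evaluation only if needed.

    `evaluate` is a hypothetical callback that evaluates the backend on the
    holdout datasets and writes the netCDF file to the path it is given.
    """
    results_path = Path(model_output_dir) / RESULTS_FILENAME
    if not results_path.exists():
        evaluate(results_path)
    return results_path
```

With this shape, the analyses can always call `ensure_evaluation_results` first and then read the returned path, whether or not the file existed beforehand.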

Uncertainty analyses are now run using each of the following measures:

  • Confidence
  • Standard deviation (ensemble)
  • Predictive uncertainty (bayesian; computed via bayesian_torch.utils.util.predictive_entropy)
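
For readers unfamiliar with predictive entropy, the sketch below shows the standard definition: the entropy of the class distribution averaged over Monte Carlo passes. It is an independent plain-Python illustration, not the bayesian_torch implementation, which may differ in details such as numerical smoothing.

```python
import math
from typing import Sequence


def predictive_entropy(mc_probs: Sequence[Sequence[float]]) -> float:
    """Entropy (in nats) of the mean class distribution over MC passes.

    `mc_probs` holds one class-probability vector per stochastic forward pass
    for a single input.
    """
    n_passes = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean_probs = [
        sum(pass_probs[c] for pass_probs in mc_probs) / n_passes
        for c in range(n_classes)
    ]
    # Zero-probability classes contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in mean_probs if p > 0.0)


# Two passes that disagree completely give the maximum entropy for 2 classes.
print(predictive_entropy([[1.0, 0.0], [0.0, 1.0]]))
```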

Current Modules

  • evaluate_models.py: Orchestrator CLI for running selected analyses across ensemble and bayesian backends.
  • defaults.py: Centralized analysis defaults (threshold grid, noise factors, calibration bins, class index, decision threshold, bayesian MC passes).
  • data_pipeline.py: Shared analysis data loading and split/loader construction utilities.
  • plotting.py: Shared plotting functions for all analysis visualizations.
  • uncertainty.py: Shared confidence and certainty/uncertainty transforms.
  • model_utils.py: Shared model behavior utilities (including bayesian sampling mode setup).
  • runtime.py: Runtime paths and JSON helpers.
  • data_access.py: netCDF loading, class probability extraction, and clinical table access.
  • metrics.py: Shared performance and calibration metrics.
  • analysis_modules.py: Performance, calibration, physician confidence, and longitudinal analyses.
  • noise_analysis.py: Evaluation-time Gaussian noise sensitivity analysis.

How To Run

From alnn_rewrite:

python -m analysis.evaluate_models

Useful options:

python -m analysis.evaluate_models \
	--backend ensemble bayesian \
	--run-name first_modular_run

If you want to skip noise analysis while validating the pipeline:

python -m analysis.evaluate_models --skip-noise

If you want to quickly investigate longitudinal cohort breakdown only (without rerunning all analyses):

python -m analysis.evaluate_models --longitudinal-breakdown-only
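
The cohort breakdown groups patients by how their diagnosis changes over time. A minimal sketch of that grouping, assuming each patient has a chronologically ordered list of "CN"/"AD" labels, is shown here; the cohort names, the "other" bucket, and the function names are hypothetical and may not match longitudinal_audit.py.

```python
from collections import Counter
from typing import Dict, List


def assign_cohort(diagnoses: List[str]) -> str:
    """Classify one patient from chronologically ordered visit diagnoses."""
    if not diagnoses:
        return "other"
    if all(d == "CN" for d in diagnoses):
        return "stable_CN"
    if all(d == "AD" for d in diagnoses):
        return "stable_AD"
    if diagnoses[0] == "CN" and diagnoses[-1] == "AD":
        return "CN_to_AD"
    return "other"  # any other trajectory, e.g. AD followed by CN


def cohort_breakdown(visits_by_patient: Dict[str, List[str]]) -> Counter:
    """Count how many patients fall into each cohort."""
    return Counter(assign_cohort(v) for v in visits_by_patient.values())


# Made-up patient IDs and visit sequences, for illustration only.
example_visits = {
    "p1": ["CN", "CN", "CN"],
    "p2": ["AD", "AD"],
    "p3": ["CN", "CN", "AD"],
}
print(cohort_breakdown(example_visits))
```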

If you want to rerun only the noise uncertainty-vs-accuracy regression/correlation analysis from existing noise CSV outputs:

python -m analysis.evaluate_models --noise-correlation-only --run-name run_YYYYMMDD_HHMMSS
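
To make the correlation step concrete, here is a small sketch that reads a noise CSV and computes the Pearson correlation between uncertainty and accuracy. The column names (`noise_factor`, `accuracy`, `mean_uncertainty`) and the sample rows are assumptions for illustration; the real noise_sensitivity.csv schema may differ.

```python
import csv
import io
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_from_noise_csv(
    csv_text: str, x_col: str = "mean_uncertainty", y_col: str = "accuracy"
) -> float:
    """Correlate two columns of a noise-analysis CSV given as text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    xs = [float(row[x_col]) for row in rows]
    ys = [float(row[y_col]) for row in rows]
    return pearson_r(xs, ys)


# Illustrative rows only, not output from an actual run.
example_csv = (
    "noise_factor,accuracy,mean_uncertainty\n"
    "0.0,0.9,0.1\n"
    "1.0,0.8,0.2\n"
    "2.0,0.7,0.3\n"
)
print(correlation_from_noise_csv(example_csv))
```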

If you want to generate only dataset composition documentation (total images, class counts, train/validation/test counts, and percentage breakdowns) without rerunning analyses:

python -m analysis.evaluate_models --dataset-summary-only

This writes dataset_summary.md into the selected run directory under analysis_output.
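
The kind of summary described above (totals, class counts, split counts, and percentages) can be sketched as follows. The input shape, a list of `(split, label)` pairs, and the markdown layout are assumptions for illustration and may not match what dataset_summary.py produces.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_dataset(records: Iterable[Tuple[str, str]]) -> str:
    """Build a markdown summary from (split, label) pairs, one per image."""
    records = list(records)
    total = len(records)
    split_counts = Counter(split for split, _ in records)
    class_counts = Counter(label for _, label in records)

    lines = [f"Total images: {total}", ""]
    for title, counts in (("Split", split_counts), ("Class", class_counts)):
        lines.append(f"| {title} | Count | Percent |")
        lines.append("| --- | --- | --- |")
        for name, count in sorted(counts.items()):
            lines.append(f"| {name} | {count} | {100.0 * count / total:.1f}% |")
        lines.append("")
    return "\n".join(lines)


# Made-up records, for illustration only.
example_records = [
    ("train", "CN"),
    ("train", "AD"),
    ("validation", "CN"),
    ("test", "AD"),
]
print(summarize_dataset(example_records))
```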

Most analysis tuning parameters are now intentionally centralized in analysis/defaults.py to reduce CLI verbosity and improve maintainability.

Output Layout

Each run creates a dedicated directory:

alnn_rewrite/analysis_output/
	run_YYYYMMDD_HHMMSS/
		run_manifest.json
		ensemble/
			backend_summary.json
			plots_report.md
			performance_threshold_sweep.csv
			calibration_bins.csv
			performance_uncertainty_cutoff.csv
			performance_uncertainty_percentile_cutoff.csv
			physician_grouped_metrics.csv
			physician_confidence_grouped_metrics.csv
			physician_std_grouped_metrics.csv
			longitudinal_patient_summary.csv
			longitudinal_cohort_summary.csv
			longitudinal_uncertainty_by_cohort.csv
			longitudinal_confidence_patient_summary.csv
			longitudinal_std_patient_summary.csv
			longitudinal_confidence_cohort_summary.csv
			longitudinal_std_cohort_summary.csv
			noise_sensitivity.csv
			noise_accuracy_uncertainty_stats.csv
			noise_accuracy_uncertainty_summary.md
			plots/
				performance_threshold_accuracy.png
				performance_threshold_f1.png
				calibration_reliability.png
				performance_uncertainty_cutoff_accuracy.png
				performance_uncertainty_cutoff_f1.png
				performance_uncertainty_percentile_cutoff_accuracy.png
				performance_uncertainty_percentile_cutoff_f1.png
				physician_confidence_boxplot.png
				physician_std_boxplot.png
				longitudinal_cohort_confidence.png
				longitudinal_cohort_std.png
				noise_sensitivity_accuracy.png
				noise_sensitivity_f1.png
				noise_confidence.png
				noise_standard_deviation.png (ensemble) / noise_predictive_uncertainty.png (bayesian)
				noise_accuracy_uncertainty_2d.png
				noise_examples/
					ensemble_noise_examples.png
					bayesian_noise_examples.png
					ensemble_clean_scan_example.png
					bayesian_clean_scan_example.png
		bayesian/
			... (same structure)
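
As a sketch of how the timestamped layout above might be created, the helper below builds a `run_YYYYMMDD_HHMMSS` directory with per-backend subfolders. The function name and parameters are hypothetical; only the directory names come from the layout shown above.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Sequence


def make_run_dir(
    output_root: Path,
    backends: Sequence[str] = ("ensemble", "bayesian"),
    now: Optional[datetime] = None,
) -> Path:
    """Create run_YYYYMMDD_HHMMSS/<backend>/plots/noise_examples folders."""
    stamp = (now or datetime.now()).strftime("run_%Y%m%d_%H%M%S")
    run_dir = Path(output_root) / stamp
    for backend in backends:
        # parents=True also creates the run and backend directories.
        (run_dir / backend / "plots" / "noise_examples").mkdir(
            parents=True, exist_ok=True
        )
    return run_dir
```

Passing `now` explicitly keeps the directory name predictable in tests, while normal runs can rely on the current time.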

The default noise schedule now includes noisier settings beyond the earlier small-sigma cases, so the saved example images show a clearer progression from lightly noised to heavily noised volumes.

Performance analysis now includes two uncertainty cutoff views:

  • A raw-value cutoff sweep that keeps samples with uncertainty below the percentile-derived cutoff value.
  • A percentile-ranked cutoff sweep that is plotted from least restricted (left, all samples) to most restricted (right, lowest-uncertainty subset).
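
A percentile-based cutoff sweep of the kind described above can be sketched like this. It assumes per-sample uncertainties and 0/1 correctness flags, uses a nearest-rank percentile, and keeps samples at or below the cutoff (so the subset is never empty); all names are illustrative rather than taken from analysis_modules.py.

```python
import math
from typing import Dict, List, Sequence


def percentile_value(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of `values` for 0 < pct <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def cutoff_sweep(
    uncertainties: Sequence[float],
    correct: Sequence[int],
    percentiles: Sequence[float],
) -> List[Dict[str, float]]:
    """Accuracy on the samples kept at each percentile-derived cutoff."""
    rows = []
    for pct in percentiles:
        cutoff = percentile_value(uncertainties, pct)
        kept = [c for u, c in zip(uncertainties, correct) if u <= cutoff]
        rows.append(
            {
                "percentile": pct,
                "cutoff": cutoff,
                "n_kept": len(kept),
                "accuracy": sum(kept) / len(kept),
            }
        )
    return rows


# Toy values; percentiles run from least restricted to most restricted.
for row in cutoff_sweep([0.1, 0.2, 0.3, 0.4], [1, 1, 0, 0], [100, 50]):
    print(row)
```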

Noise analysis now saves individual series plots (accuracy, F1, confidence, and standard deviation or predictive uncertainty) so each metric is shown on its own graph.

Noise plots now label the x-axis as Gaussian Noise Factor to reflect that values are multipliers on the actual noise sigma.
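
To illustrate what a noise factor means here, the sketch below scales a base sigma by the factor before adding zero-mean Gaussian noise. The `base_sigma` parameter and the flat list of voxel values are assumptions for illustration; noise_analysis.py may derive the actual sigma differently and presumably operates on tensors.

```python
import random
from typing import List, Sequence


def add_gaussian_noise(
    voxels: Sequence[float],
    noise_factor: float,
    base_sigma: float,
    rng: random.Random,
) -> List[float]:
    """Add zero-mean Gaussian noise with sigma = noise_factor * base_sigma."""
    sigma = noise_factor * base_sigma
    return [v + rng.gauss(0.0, sigma) for v in voxels]


# A factor of 0 leaves the values unchanged; larger factors add more noise.
clean = [0.0, 0.5, 1.0]
for factor in (0.0, 1.0, 4.0):
    print(factor, add_gaussian_noise(clean, factor, 0.1, random.Random(0)))
```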