This folder should contain all the code needed to evaluate the Bayesian model and the Deep Ensemble. It should generate and save graphs and statistics for both. The included analyses should include
"Uncertainty" should be taken to mean EITHER confidence or standard deviation, for both the ensemble and the Bayesian network: that is, either the raw output's distance from 0.5, or the standard deviation reported by the Bayesian network or computed across all ensemble member outputs. Every analysis should be evaluated with both.
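As a rough illustration of these two notions of uncertainty, the sketch below computes a confidence-based score from a single probability (using the 2 * |p - 0.5| scaling this README uses elsewhere) and a standard deviation across several member outputs. The function names, the example probabilities, and the choice of population standard deviation are assumptions made for illustration, not the repository's actual API.

```python
from statistics import mean, pstdev


def confidence_certainty(p: float) -> float:
    """Scaled distance of a probability from 0.5 (0 = very uncertain, 1 = very certain)."""
    return 2.0 * abs(p - 0.5)


def confidence_uncertainty(p: float) -> float:
    """Flipped orientation, so larger values mean more uncertainty."""
    return 1.0 - confidence_certainty(p)


def member_std(member_probs: list[float]) -> float:
    """Population standard deviation across member (or MC-sample) outputs.

    Assumption: the real code may use the sample standard deviation instead.
    """
    return pstdev(member_probs)


# Hypothetical outputs of five ensemble members for one sample.
probs = [0.9, 0.8, 0.85, 0.95, 0.75]
p_mean = mean(probs)
print(confidence_certainty(p_mean), confidence_uncertainty(p_mean), member_std(probs))
```

Both scores can then be fed to the same analysis, which is what "evaluated with both" amounts to in practice.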
The code in senior_research_thesis and in the previous alnn_rewrite/analysis folder should be consulted. The models should be loaded from the config.toml that is currently in place.
The modular implementation now lives entirely in this folder.
- alnn_rewrite/analysis
- alnn_rewrite/analysis_output

If a backend does not already have model_evaluation_results.nc, the pipeline now automatically evaluates that backend on the holdout datasets (validation + test) first, saves the generated netCDF into that backend's model output directory, and then runs the analyses.
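The check-then-evaluate behaviour can be sketched roughly as follows. Only the file name model_evaluation_results.nc comes from this README; `ensure_evaluation_results` and the `evaluate_holdout` callback are hypothetical names standing in for whatever the pipeline actually calls.

```python
from pathlib import Path
from typing import Callable

RESULTS_FILENAME = "model_evaluation_results.nc"


def ensure_evaluation_results(
    model_output_dir: Path,
    evaluate_holdout: Callable[[Path], None],
) -> Path:
    """Return the results path, running the holdout evaluation first if it is missing."""
    results_path = model_output_dir / RESULTS_FILENAME
    if not results_path.exists():
        # Hypothetical callback: evaluates validation + test and writes the netCDF file.
        evaluate_holdout(results_path)
    if not results_path.exists():
        raise FileNotFoundError(f"Evaluation did not produce {results_path}")
    return results_path
```

Because the existence check comes first, a second run against the same backend reuses the saved file instead of re-evaluating.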
Uncertainty analyses are now run using both:
- 2 * |p - 0.5|, where 0 means very uncertain and 1 means very certain
- 1 - 2 * |p - 0.5|, so larger values mean more uncertainty, matching std-based plots
- bayesian_torch.utils.util.predictive_entropy

- evaluate_models.py: Orchestrator CLI for running selected analyses across ensemble and bayesian backends.
- defaults.py: Centralized analysis defaults (threshold grid, noise factors, calibration bins, class index, decision threshold, bayesian MC passes).
- data_pipeline.py: Shared analysis data loading and split/loader construction utilities.
- plotting.py: Shared plotting functions for all analysis visualizations.
- uncertainty.py: Shared confidence certainty/uncertainty transforms.
- model_utils.py: Shared model behavior utilities (including bayesian sampling mode setup).
- runtime.py: Runtime paths and JSON helpers.
- data_access.py: netCDF loading, class probability extraction, and clinical table access.
- metrics.py: Shared performance and calibration metrics.
- analysis_modules.py: Performance, calibration, physician confidence, and longitudinal analyses.
- noise_analysis.py: Evaluation-time Gaussian noise sensitivity analysis.

From alnn_rewrite:
python -m analysis.evaluate_models
Useful options:
python -m analysis.evaluate_models \
--backend ensemble bayesian \
--run-name first_modular_run
If you want to skip noise analysis while validating the pipeline:
python -m analysis.evaluate_models --skip-noise
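For orientation, the three flags shown above could be parsed with `argparse` along these lines. This is only a sketch: the flag names come from the commands in this README, while the defaults, help strings, and the `build_parser` name are assumptions and may not match evaluate_models.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering the flags shown in this README."""
    parser = argparse.ArgumentParser(prog="analysis.evaluate_models")
    parser.add_argument(
        "--backend",
        nargs="+",
        choices=["ensemble", "bayesian"],
        default=["ensemble", "bayesian"],  # assumed default: run both backends
        help="One or more backends to analyse.",
    )
    parser.add_argument("--run-name", default=None, help="Optional label for the run.")
    parser.add_argument(
        "--skip-noise",
        action="store_true",
        help="Skip the noise sensitivity analysis.",
    )
    return parser


# Parse an explicit argument list so the sketch runs outside the real CLI.
args = build_parser().parse_args(["--backend", "ensemble", "--skip-noise"])
print(args.backend, args.run_name, args.skip_noise)
```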
Most analysis tuning parameters are now intentionally centralized in analysis/defaults.py to reduce CLI verbosity and improve maintainability.
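One way such a centralized defaults module might look is a single frozen dataclass. The field names mirror the parameters this README lists for defaults.py; every value below is a placeholder chosen for illustration, not the project's real setting, and `AnalysisDefaults` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisDefaults:
    """Illustrative container; all values are placeholders, not the real defaults."""

    threshold_grid: tuple[float, ...] = tuple(i / 10 for i in range(1, 10))
    noise_factors: tuple[float, ...] = (0.0, 0.5, 1.0, 2.0)
    calibration_bins: int = 10
    class_index: int = 1
    decision_threshold: float = 0.5
    bayesian_mc_passes: int = 20


# A single shared instance that analysis modules could import.
DEFAULTS = AnalysisDefaults()
```

Freezing the dataclass means an analysis module cannot silently change a shared default at runtime, which fits the stated goal of maintainability.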
Each run creates a dedicated directory:
alnn_rewrite/analysis_output/
run_YYYYMMDD_HHMMSS/
run_manifest.json
ensemble/
backend_summary.json
plots_report.md
performance_threshold_sweep.csv
calibration_bins.csv
performance_uncertainty_cutoff.csv
performance_uncertainty_percentile_cutoff.csv
physician_grouped_metrics.csv
physician_confidence_grouped_metrics.csv
physician_std_grouped_metrics.csv
longitudinal_patient_summary.csv
longitudinal_cohort_summary.csv
longitudinal_uncertainty_by_cohort.csv
longitudinal_confidence_patient_summary.csv
longitudinal_std_patient_summary.csv
longitudinal_confidence_cohort_summary.csv
longitudinal_std_cohort_summary.csv
noise_sensitivity.csv
plots/
performance_threshold_sweep.png
calibration_reliability.png
performance_uncertainty_cutoff.png
performance_uncertainty_percentile_cutoff.png
physician_confidence_boxplot.png
physician_std_boxplot.png
longitudinal_cohort_confidence.png
longitudinal_cohort_std.png
noise_sensitivity.png
noise_uncertainty.png
noise_confidence_certainty.png
noise_examples/
ensemble_noise_examples.png
bayesian_noise_examples.png
bayesian/
... (same structure)
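A timestamped run directory of this kind could be created roughly as below. The run_YYYYMMDD_HHMMSS pattern and the run_manifest.json file name come from the listing above; the `create_run_dir` helper, its `now` parameter, and the manifest fields are assumptions for illustration.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional


def create_run_dir(
    output_root: Path,
    backends: list[str],
    now: Optional[datetime] = None,
) -> Path:
    """Create run_YYYYMMDD_HHMMSS/ with one subdirectory per backend and a manifest."""
    stamp = (now or datetime.now()).strftime("run_%Y%m%d_%H%M%S")
    run_dir = output_root / stamp
    for backend in backends:
        (run_dir / backend).mkdir(parents=True, exist_ok=True)
    # Hypothetical manifest contents; the real manifest may record more.
    manifest = {"run_dir": stamp, "backends": backends}
    (run_dir / "run_manifest.json").write_text(json.dumps(manifest, indent=2))
    return run_dir
```

For example, `create_run_dir(Path("analysis_output"), ["ensemble", "bayesian"])` would create a new directory named after the current time, so repeated runs do not overwrite each other unless they start within the same second.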
The default noise schedule now includes noisier settings beyond the earlier small-sigma cases, so the saved example images show a clearer progression from lightly noised to heavily noised volumes.
Performance analysis now includes two uncertainty cutoff views, saved as performance_uncertainty_cutoff.csv and performance_uncertainty_percentile_cutoff.csv.
For the noise analysis, the uncertainty plot uses the confidence metric in its uncertainty orientation, so higher values always mean more uncertainty. A separate certainty plot is also saved for direct inspection of 2 * |p - 0.5|.
Noise plots now label the x-axis as Gaussian Noise Factor to reflect that values are multipliers on the actual noise sigma.
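To make the "factor as a multiplier on sigma" idea concrete, here is a minimal standard-library sketch. It treats a volume as a flat list of values; `add_gaussian_noise` and `base_sigma` are hypothetical names, and how the real pipeline determines the base sigma is not stated in this README.

```python
import random


def add_gaussian_noise(
    values: list[float],
    noise_factor: float,
    base_sigma: float,
    rng: random.Random,
) -> list[float]:
    """Add zero-mean Gaussian noise whose sigma is noise_factor * base_sigma."""
    sigma = noise_factor * base_sigma
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical sweep: the same input noised at increasing factors.
rng = random.Random(0)
volume = [0.2, 0.4, 0.6, 0.8]
for factor in (0.0, 0.5, 1.0, 2.0):
    noisy = add_gaussian_noise(volume, factor, base_sigma=0.1, rng=rng)
    print(factor, [round(x, 3) for x in noisy])
```

A factor of 0 leaves the input unchanged, and the plotted x-axis value is the factor itself rather than the resulting sigma.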