Planning
As of now, we have a program set up that can:
- train an individual model with specific hyperparameters
- train an ensemble of models with identical hyperparameters
- evaluate the accuracy of an ensemble of models
- perform a coverage analysis on an ensemble of models
The goal of this rewrite is to preserve those functions while making the program significantly cleaner and easier to use, and easier to extend with new functionality in the future. The hope is for this project to take approximately 1-2 days and be completed by Monday (6/17). The additional features that I would like to implement are:
- Recording sets of models and ensembles and reporting their metrics (must be able to distinguish between individual models and ensembles)
- Better configuration settings, able to use waterfalling files (files in subdirectories override settings in the main directory); see the first sketch after this list
- Descriptive configuration (describe the desired models, ensembles, and analyses in a config file and have them produced on program run); see the second sketch after this list
- Easier implementation of models and analysis (more use of classes and such)
- Implementation of new metrics and ensembles
- Deterministic dataloading (for a specified seed, the data used is fixed and does not change, even if the loading methods do); see the third sketch after this list
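
A minimal sketch of how the waterfall could work, assuming JSON config files named `config.json` and a shallow, key-level merge; the function name and file layout are illustrative assumptions, not the program's existing API:

```python
# Minimal sketch of waterfall config loading. The function name, the JSON
# format, and the "config.json" filename are illustrative assumptions, and
# the merge is shallow (top-level keys only).
import json
from pathlib import Path


def load_config(root_dir: str, leaf_dir: str, filename: str = "config.json") -> dict:
    """Merge config files from root_dir down to leaf_dir.

    Files in deeper subdirectories override keys set higher up.
    """
    root = Path(root_dir).resolve()
    leaf = Path(leaf_dir).resolve()
    # Directories on the path from the root down to the leaf, shallowest first.
    chain = [d for d in reversed([leaf, *leaf.parents])
             if d == root or root in d.parents]
    merged: dict = {}
    for directory in chain:
        candidate = directory / filename
        if candidate.is_file():
            # Deeper files are applied later, so their keys win on conflicts.
            merged.update(json.loads(candidate.read_text()))
    return merged
```

Calling something like `load_config("experiments", "experiments/model_a/run1")` would then return the root settings with any keys redefined in the subdirectories overriding them.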
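
For the descriptive configuration, one possible shape (an assumption, not the current format) is a dict-like config, e.g. parsed from JSON, with hypothetical `models` and `ensembles` sections that get turned into specs for the training loop to run:

```python
# Hypothetical descriptive-config schema; the "models"/"ensembles" sections
# and the spec classes are placeholders, not the program's current format.
from dataclasses import dataclass


@dataclass
class ModelSpec:
    name: str
    hyperparameters: dict


@dataclass
class EnsembleSpec:
    name: str
    size: int               # number of member models to train
    hyperparameters: dict


def parse_config(config: dict) -> tuple[list[ModelSpec], list[EnsembleSpec]]:
    """Turn a declarative config into specs the training loop can execute."""
    models = [ModelSpec(m["name"], m.get("hyperparameters", {}))
              for m in config.get("models", [])]
    ensembles = [EnsembleSpec(e["name"], e["size"], e.get("hyperparameters", {}))
                 for e in config.get("ensembles", [])]
    return models, ensembles
```

A program run would then just iterate over the returned specs, training each model and ensemble the config describes.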
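
For deterministic dataloading, a simple sketch of the idea (function and argument names are illustrative) is to derive the split from a seeded shuffle of the sorted sample identifiers, so the split depends only on the seed and the dataset contents, not on filesystem or loader ordering:

```python
# Sketch of seed-deterministic data splitting. Sorting the identifiers first
# gives a canonical order, so changing the loading code does not change the
# split for a given seed.
import random


def deterministic_split(sample_ids: list[str], seed: int,
                        val_fraction: float = 0.2) -> tuple[list[str], list[str]]:
    ordered = sorted(sample_ids)        # canonical order, independent of the loader
    rng = random.Random(seed)           # local RNG, untouched by global state
    rng.shuffle(ordered)
    n_val = int(len(ordered) * val_fraction)
    return ordered[n_val:], ordered[:n_val]   # (train_ids, val_ids)
```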
Further Planning as of 7/8/24
- With the implementation of uncertainty through standard deviation, confidence, and entropy (see the sketch at the end of this list), the next steps are:
- Refactor the current threshold implementation - it is very messy and difficult to extend with new features
- Enable checking images with incorrect predictions, and predictions that fall off the main curve of the stdev-confidence plot
- Investigate physician confidence, and compare it to the uncertainty predictions
- Deep dive into standard deviation
- Box plot?
- Investigate calibration - do we need it?
- Consider the manuscript - we should be thinking about writing
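
For reference, a sketch of one common way to compute the three uncertainty signals from ensemble softmax outputs; this illustrates the general technique and is not necessarily how the existing code computes them:

```python
# Uncertainty signals for a single image from ensemble softmax outputs with
# shape (members, classes). A sketch of the general technique, not a
# description of the program's current implementation.
import numpy as np


def ensemble_uncertainty(member_probs: np.ndarray) -> dict:
    mean_probs = member_probs.mean(axis=0)              # average over ensemble members
    predicted = int(mean_probs.argmax())
    confidence = float(mean_probs[predicted])            # probability of the top class
    # Spread of the members' probabilities for the predicted class.
    stdev = float(member_probs[:, predicted].std())
    # Predictive entropy of the averaged distribution (natural log).
    entropy = float(-(mean_probs * np.log(mean_probs + 1e-12)).sum())
    return {"prediction": predicted, "confidence": confidence,
            "stdev": stdev, "entropy": entropy}
```

Plotting the per-image (stdev, confidence) pairs gives the stdev-confidence curve mentioned above, and points far from the main curve are natural candidates for the image-level inspection in the list.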