This package is a self-contained implementation of functional decomposition
(FD), an unbinned, parametric solution for fitting mass spectra, conducting
searches and producing limits. FD decomposes a dataset into a complete set of
orthogonal functions (the orthonormal exponentials), whose coefficients can be
extracted from the data by direct computation. It uses a penalized likelihood
to determine the appropriate number of terms to retain from the infinite
series.

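To illustrate what "direct computation" means for an orthonormal basis, here is
a minimal sketch. For a density f(x) = sum_n c_n * phi_n(x) with orthonormal
phi_n, each coefficient is the expectation value of phi_n, so it can be
estimated as a plain average over the events, with no binning and no fit. The
sketch swaps in a cosine basis on [0, 1] as a stand-in; it is not FD's
orthonormal exponentials, it truncates at a fixed number of terms rather than
using a penalized likelihood, and all function names are hypothetical.

```python
import math

def basis(n, x):
    """Orthonormal cosine basis on [0, 1] (a stand-in, not FD's basis)."""
    return 1.0 if n == 0 else math.sqrt(2.0) * math.cos(n * math.pi * x)

def coefficients(events, n_terms):
    """Unbinned estimate c_n = (1/N) * sum_i basis(n, x_i)."""
    n_events = len(events)
    return [sum(basis(n, x) for x in events) / n_events for n in range(n_terms)]

def density(coeffs, x):
    """Truncated series estimate of the density at x."""
    return sum(c * basis(n, x) for n, c in enumerate(coeffs))

# Events placed deterministically according to the density f(x) = 2x on
# [0, 1], by inverting its CDF F(x) = x**2 at evenly spaced quantiles.
events = [math.sqrt((i + 0.5) / 20000) for i in range(20000)]

coeffs = coefficients(events, n_terms=8)
print([round(c, 3) for c in coeffs])
print(round(density(coeffs, 0.5), 2))
```

For this toy density the exact coefficients are known in closed form (c_0 = 1,
c_n = -4*sqrt(2)/(n*pi)^2 for odd n, and 0 for even n), which makes it easy to
check the averages against the analytic values.
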
-------------------------
RUNNING THE EXAMPLE CASES
-------------------------

There are three examples provided, with increasing levels of complexity. These
are:

  bkg_only      Decompose a single smooth spectrum using the orthonormal
                exponentials.

  sig_bkg       Decompose a smooth spectrum with two known resonances. Use
                the orthonormal exponentials as a background model and two
                Gaussians to model the two known peaks.

  sig_bkg_scan  Decompose a smooth spectrum with two known resonances, and
                perform a search for a new resonance. Use the orthonormal
                exponentials for the background, two Gaussians to model the
                two known peaks, and scan a third peak through several masses
                and widths.

The first two examples should run in only a few minutes. The third example
tests roughly 600 different signal hypotheses and takes considerably longer.
On my Dell XPS13 9350, using a fairly large sample of 5e7 events, this example
takes about 1.5 hours to run from scratch.

There are four main contributions to the run time:

  1.) Decomposing data: 415s. Linear in the number of input events.
  2.) Determining hyperparameters: 281s. Linear in the size of the initial
      search grid.
  3.) Decomposing signal models: 3380s. Linear in the number of signal models
      and the number of events used to simulate each signal.
  4.) Calculating limits and p-values: ~1200s. Linear in the number of signal
      models.

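As a quick cross-check, the four contributions listed above can be added up to
confirm that they account for the quoted run time of about 1.5 hours. The
numbers are the ones given in the list; the variable names are only for
illustration.

```python
# Per-stage timings quoted above, in seconds (the last one is approximate).
stage_seconds = {
    "decomposing data": 415,
    "determining hyperparameters": 281,
    "decomposing signal models": 3380,
    "calculating limits and p-values": 1200,
}

total_seconds = sum(stage_seconds.values())
total_hours = total_seconds / 3600.0

for stage, seconds in stage_seconds.items():
    share = 100.0 * seconds / total_seconds
    print(f"{stage:32s} {seconds:5d} s  ({share:4.1f}%)")
print(f"total: {total_seconds} s = {total_hours:.2f} h")
```

Decomposing the signal models dominates, so reducing the number of signal
hypotheses or the number of events simulated per signal is the most effective
way to shorten the third example.
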
Check the config files for each example (located in <example>/base.conf). The
comments describe the various parameters and their function. You probably want
to adjust 'Nthread' to match your number of CPU cores before running the
examples. The remaining parameters should require no adjustment (but feel free
to play around).

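If you are unsure how many cores your machine has, a short Python snippet can
report a reasonable value for 'Nthread'. This is a generic sketch using only
the standard library; it is not part of FD, and it counts logical CPUs, which
may be larger than the number of physical cores on machines with
hyperthreading.

```python
import os

# os.cpu_count() reports logical CPUs and may return None if undetermined.
n_logical = os.cpu_count() or 1

# On Linux, the set of CPUs this process may actually use can be smaller
# (e.g. inside a container or under taskset); prefer it when available.
if hasattr(os, "sched_getaffinity"):
    n_usable = len(os.sched_getaffinity(0))
else:
    n_usable = n_logical

print(f"Suggested Nthread: {n_usable}")
```
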
Each of the examples can be run as follows:

  1. Enter the FD directory and set up the code:

       cd <FD_DIR>
       . bin/setup_bash.sh

     This will also check if the required Python packages are available, and
     warn you if they are not.

  2. Enter the example directory and import / generate the test data:

       cd Examples/<Example_Name>
       fd_generate.py --setname Test --varname Myy --wgtname weight --size 12000000
       fd_import.py ../InputSignalData/*

     'fd_generate.py' produces a random sample of background-like data. Change
     the size parameter if you want more/fewer events. 'fd_import.py' imports
     several datasets from the CSV files in 'InputSignalData'. These contain
     Gaussian signal shapes that can be injected into the background-like
     sample to simulate resonances.

  3. Run the scan:

       fd_scan.py

In a real application, you would use 'fd_import.py' to read in your data (either
as .csv or .root). Note that for the 'bkg_only' example, 'fd_import.py' is
unnecessary and can be skipped.

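To picture the kind of test data used in step 2, the sketch below builds a toy
sample: a smoothly falling background with a Gaussian peak injected on top. It
is only an illustration of the idea. It does not reproduce what
'fd_generate.py' or 'fd_import.py' actually do, and the function name, the
exponential background shape and all numerical values are assumptions chosen
for the example.

```python
import random

def toy_spectrum(n_bkg, n_sig, x_min=100.0, slope=50.0,
                 peak_mass=125.0, peak_width=2.0, seed=1):
    """Return a list of (Myy, weight) pairs: falling background plus one peak."""
    rng = random.Random(seed)
    events = []
    # Background: exponentially falling above x_min (mean excess = slope).
    for _ in range(n_bkg):
        events.append((x_min + rng.expovariate(1.0 / slope), 1.0))
    # Signal: Gaussian resonance injected on top of the background.
    for _ in range(n_sig):
        events.append((rng.gauss(peak_mass, peak_width), 1.0))
    rng.shuffle(events)
    return events

events = toy_spectrum(n_bkg=10000, n_sig=200)
print(len(events), "events; first:", events[0])
```
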
That's it! The output will be located in 'Output/*'. Each plot is saved as an
individual pdf file, and additionally all plots are saved together as a
multipage pdf in 'Output/all.pdf'.

One last thing to note: all decompositions and likelihood calculations are
cached on disk in '<example>/Cache/*'. Any repeated computations will hit the
disk cache, which vastly speeds things up when making small changes (e.g. plot
tweaks or including additional signal models).

The cache files are named using all relevant parameters along with a checksum of
the dataset. This ensures that if you change parameters or cuts, the cache will
not accidentally use values computed using a different configuration. If you'd
like to force FD to completely re-run from scratch, just delete the Cache
directory. It will be automatically re-created and re-populated the next time
'fd_scan.py' is run.
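
The naming idea described above can be sketched in a few lines of Python. This
is not FD's actual naming scheme: the function names, the parameter names
('Nbasis', 'xmin'), the '.cache' suffix and the choice of SHA-1 are all
assumptions made for illustration. The point is only that a name built from
the parameters plus a checksum of the data changes whenever either one
changes.

```python
import hashlib
import json

def dataset_checksum(values):
    """Checksum of the event values (SHA-1 over their repr, for illustration)."""
    h = hashlib.sha1()
    for v in values:
        h.update(repr(float(v)).encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()

def cache_name(kind, params, values):
    """Build a file name from the step, its parameters and the data checksum."""
    # sort_keys makes the name independent of dictionary ordering.
    param_tag = hashlib.sha1(
        json.dumps(params, sort_keys=True).encode("utf-8")
    ).hexdigest()[:12]
    return f"{kind}-{param_tag}-{dataset_checksum(values)[:12]}.cache"

data = [105.2, 118.7, 125.1, 160.4]
a = cache_name("decomp", {"Nbasis": 32, "xmin": 100.0}, data)
b = cache_name("decomp", {"xmin": 100.0, "Nbasis": 32}, data)       # same config
c = cache_name("decomp", {"Nbasis": 64, "xmin": 100.0}, data)       # new parameter
d = cache_name("decomp", {"Nbasis": 32, "xmin": 100.0}, data[:-1])  # new cut

print(a == b, a == c, a == d)  # True False False
```

Because a changed parameter or cut yields a different file name, stale results
are never picked up; they are simply left behind on disk until the Cache
directory is deleted.
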