README

This package is a self-contained implementation of functional decomposition (FD), an
unbinned, parametric solution for fitting mass spectra, conducting searches and
producing limits. FD decomposes a dataset into a complete set of orthogonal
functions (the orthonormal exponentials), whose coefficients can be extracted
from the data by direct computation. It uses a penalized likelihood to
determine the appropriate number of terms to retain from the infinite series.
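
To illustrate what "extracted from the data by direct computation" means, here
is a minimal sketch. It is not FD's code and does not use the orthonormal
exponentials: it swaps in a simple cosine basis on [0, 1] and a toy density,
both chosen only for illustration. For any orthonormal basis, the coefficient
of each term is the expectation value of that basis function, so it can be
estimated as a plain average over the unbinned events.

```python
import math
import random


def phi(n, x):
    """Cosine basis on [0, 1], orthonormal under a flat weight (illustrative only)."""
    return 1.0 if n == 0 else math.sqrt(2.0) * math.cos(n * math.pi * x)


def sample_toy_density(size, rng):
    """Rejection-sample the toy density f(x) = 1 + 0.5*cos(pi*x) on [0, 1]."""
    events = []
    while len(events) < size:
        x = rng.random()
        # f(x) is bounded above by 1.5, so accept with probability f(x) / 1.5.
        if rng.random() * 1.5 < 1.0 + 0.5 * math.cos(math.pi * x):
            events.append(x)
    return events


def coefficients(events, n_terms):
    """Estimate c_n = E[phi_n(x)] as an average over the events (no fit, no binning)."""
    return [sum(phi(n, x) for x in events) / len(events) for n in range(n_terms)]


rng = random.Random(1)
events = sample_toy_density(100_000, rng)
coeffs = coefficients(events, 4)
for n, c in enumerate(coeffs):
    print(f"c_{n} = {c:+.4f}")
```

For this toy density the exact coefficients are c_0 = 1, c_1 = sqrt(2)/4 and
zero for all higher terms, so the estimates should agree with those values up
to statistical fluctuations.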

-------------------------
RUNNING THE EXAMPLE CASES
-------------------------
Three examples are provided, in order of increasing complexity:

bkg_only      Decompose a single smooth spectrum using the orthonormal
              exponentials.
sig_bkg       Decompose a smooth spectrum with two known resonances. Use
              the orthonormal exponentials as a background model and two
              Gaussians to model the two known peaks.
sig_bkg_scan  Decompose a smooth spectrum with two known resonances, and
              perform a search for a new resonance. Use the orthonormal
              exponentials for the background, two Gaussians to model the
              two known peaks, and scan a third peak through several masses
              and widths.

The first two examples should run in only a few minutes. The third example
tests roughly 600 different signal hypotheses and takes considerably longer. On
my Dell XPS13 9350, using a fairly large sample of 5e7 events, it takes about
1.5 hours to run from scratch.

There are four main contributions to the run time:
1.) Decomposing data: 415s. Linear in the number of input events.
2.) Determining hyperparameters: 281s. Linear in the size of the initial
    search grid.
3.) Decomposing signal models: 3380s. Linear in the number of signal models
    and the number of events used to simulate each signal.
4.) Calculating limits and p-values: ~1200s. Linear in the number of signal
    models.
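
The timings above can be turned into a rough run-time estimate for other
configurations. The sketch below is only back-of-the-envelope arithmetic based
on the quoted numbers and the stated linear scalings; the function and
parameter names are hypothetical and not part of FD, and the reference of 600
signal models stands in for the approximate "~600" quoted above.

```python
# Reference timings quoted above (seconds), measured with 5e7 events and
# roughly 600 signal hypotheses on the author's laptop.
REF_EVENTS = 5e7
REF_SIGNALS = 600  # approximate
REF_SECONDS = {
    "data": 415.0,
    "hyperparameters": 281.0,
    "signal_models": 3380.0,
    "limits": 1200.0,
}


def estimate_seconds(n_events=REF_EVENTS, n_signals=REF_SIGNALS,
                     grid_scale=1.0, signal_events_scale=1.0):
    """Scale each stage linearly, as described in the list above.

    grid_scale and signal_events_scale are ratios relative to the reference
    run (1.0 means "same search grid" / "same events per signal").
    """
    signal_ratio = n_signals / REF_SIGNALS
    return {
        "data": REF_SECONDS["data"] * n_events / REF_EVENTS,
        "hyperparameters": REF_SECONDS["hyperparameters"] * grid_scale,
        "signal_models": REF_SECONDS["signal_models"] * signal_ratio * signal_events_scale,
        "limits": REF_SECONDS["limits"] * signal_ratio,
    }


reference = estimate_seconds()
total = sum(reference.values())
for stage, seconds in reference.items():
    print(f"{stage:16s} {seconds:7.0f} s  {100.0 * seconds / total:5.1f} %")
print(f"total: {total:.0f} s = {total / 3600.0:.2f} h")

# Example: the same data, but only half as many signal hypotheses.
half_scan_total = sum(estimate_seconds(n_signals=300).values())
print(f"with 300 signal models: {half_scan_total / 3600.0:.2f} h")
```

The reference total is 5276 s, consistent with the quoted 1.5 hours, and the
signal-model decomposition accounts for most of it.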

Check the config files for each example (located in /base.conf). The
comments describe the various parameters and their function. You probably want
to adjust 'Nthread' to match your number of CPU cores before running the
examples. The remaining parameters should require no adjustment (but feel free
to play around).
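
If you are unsure how many cores your machine has, one way to check is from
Python's standard library. This is just a convenience sketch; the value it
prints is a suggestion for 'Nthread', and how you enter it in the config file
is described by the comments there.

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if it cannot
# be determined; fall back to a single thread in that case.
n_cores = os.cpu_count() or 1
print(f"Suggested Nthread: {n_cores}")
```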

Each of the examples can be run as follows:

1. Enter the FD directory and set up the code:

    cd
    . bin/setup_bash.sh

This will also check if the required Python packages are available, and warn
you if they are not.

2. Enter the example directory and import / generate the test data:

    cd Examples/
    fd_generate.py --setname Test --varname Myy --wgtname weight --size 12000000
    fd_import.py ../InputSignalData/*

'fd_generate.py' produces a random sample of background-like data. Change the
size parameter if you want more/fewer events. 'fd_import.py' imports several
datasets from the CSV files in 'InputSignalData'. These contain Gaussian
signal shapes that can be injected into the background-like sample to
simulate resonances.

3. Run the scan:

    fd_scan.py

In a real application, you would use 'fd_import.py' to read in your data (either
as .csv or .root). Note that for the 'bkg_only' example, 'fd_import.py' is
unnecessary and can be skipped.

That's it! The output will be located in 'Output/*'. Each plot is saved as an
individual PDF file, and all plots are also saved together as a multipage PDF
in 'Output/all.pdf'.

One last thing to note: all decompositions and likelihood calculations are
cached on disk in '/Cache/*'. Any repeated computations will hit the
disk cache, which vastly speeds things up when making small changes (e.g. plot
tweaks or including additional signal models).

The cache files are named using all relevant parameters along with a checksum of
the dataset. This ensures that if you change parameters or cuts, the cache will
not accidentally use values computed using a different configuration. If you'd
like to force FD to completely re-run from scratch, just delete the Cache
directory. It will be automatically re-created and re-populated the next time
'fd_scan.py' is run.
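
The naming scheme described above can be sketched as follows. This is not FD's
actual implementation: the function names, the parameter names ('n_terms',
'alpha'), the file extension and the choice of SHA-256 are all assumptions made
for illustration. The point is only that a file name built from the parameters
plus a checksum of the data changes whenever either one changes, so stale
results are never picked up.

```python
import hashlib


def dataset_checksum(values):
    """Short checksum over the event values (hypothetical helper)."""
    digest = hashlib.sha256()
    for value in values:
        digest.update(repr(float(value)).encode("ascii"))
        digest.update(b",")
    return digest.hexdigest()[:12]


def cache_name(stage, params, data_checksum):
    """Build a file name from the stage, sorted parameters and data checksum."""
    tag = "_".join(f"{key}={params[key]}" for key in sorted(params))
    return f"{stage}_{tag}_{data_checksum}.cache"


data = [101.5, 120.25, 133.0]
params = {"n_terms": 8, "alpha": 0.5}  # hypothetical parameter names

name = cache_name("decomp", params, dataset_checksum(data))
same_again = cache_name("decomp", dict(params), dataset_checksum(list(data)))
new_params = cache_name("decomp", {"n_terms": 9, "alpha": 0.5}, dataset_checksum(data))
new_cut = cache_name("decomp", params, dataset_checksum(data[:2]))

print(name)
print(name == same_again)   # identical configuration reuses the cache entry
print(name == new_params)   # changed parameter gives a different file name
print(name == new_cut)      # changed dataset (e.g. a cut) also gives a new name
```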