This package is a self-contained implementation of functional decomposition, an
unbinned, parametric solution for fitting mass spectra, conducting searches and
producing limits. FD decomposes a dataset into a complete set of orthogonal
functions (the orthonormal exponentials), whose coefficients can be extracted
from the data by direct computation.  It uses a penalized likelihood to
determine the appropriate number of terms to retain from the infinite series.
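
As a rough illustration of what "direct computation" means here: for any basis
f_n that is orthonormal on the fit range, the coefficient of a density p(x) is
c_n = integral of f_n(x) p(x) dx, which an unbinned sample estimates as the mean
of f_n over the events. The sketch below uses a cosine basis on [0, 1] purely as
a stand-in. It is not the orthonormal exponentials and not FD's own code, and
all function names are made up for illustration.

```python
import math
import random

def basis(n, x):
    """Cosine basis, orthonormal on [0, 1] (a stand-in, not FD's actual basis)."""
    return 1.0 if n == 0 else math.sqrt(2.0) * math.cos(n * math.pi * x)

def coefficients(events, n_terms):
    """Estimate c_n = E[f_n(x)] as the sample mean over the unbinned events."""
    n_events = len(events)
    return [sum(basis(n, x) for x in events) / n_events for n in range(n_terms)]

def density(coeffs, x):
    """Evaluate the truncated series sum_n c_n * f_n(x)."""
    return sum(c * basis(n, x) for n, c in enumerate(coeffs))

# Toy data: a falling spectrum p(x) = 2 * (1 - x) on [0, 1],
# sampled by inverting its CDF F(x) = 1 - (1 - x)^2.
rng = random.Random(1)
events = [1.0 - math.sqrt(1.0 - rng.random()) for _ in range(200000)]

coeffs = coefficients(events, 6)
```

For this toy spectrum the exact values are c_0 = 1, c_1 = 4*sqrt(2)/pi^2 and
c_2 = 0, so the estimates can be checked by hand. How many terms to keep is a
separate question, which FD answers with its penalized likelihood.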

-------------------------
RUNNING THE EXAMPLE CASES
-------------------------
There are three examples provided, with increasing levels of complexity. These
are:

   bkg_only      Decompose a single smooth spectrum using the orthonormal
                 exponentials.
   sig_bkg       Decompose a smooth spectrum with two known resonances.  Use
                 the orthonormal exponentials as a background model and two
                 Gaussians to model the two known peaks.
   sig_bkg_scan  Decompose a smooth spectrum with two known resonances, and
                 perform a search for a new resonance. Use the orthonormal
                 exponentials for the background, two Gaussians to model the
                 two known peaks, and scan a third peak through several masses
                 and widths.

The first two examples should run in only a few minutes.  The third example
tests roughly 600 different signal hypotheses and takes considerably longer. On
my Dell XPS13 9350, using a fairly large sample of 5e7 events, this example
takes about 1.5 hours to run from scratch.

There are four main contributions to the run time:
  1.) Decomposing data: 415s. Linear in the number of input events.
  2.) Determining hyperparameters: 281s. Linear in the size of the initial
      search grid.
  3.) Decomposing signal models: 3380s.  Linear in the number of signal models
      and the number of events used to simulate each signal.
  4.) Calculating limits and p-values: ~1200s.  Linear in the number of signal
      models.
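
Because each contribution is described as linear, the timings above can be
rescaled to get a rough estimate for a different setup. The sketch below is
only back-of-the-envelope arithmetic: it assumes strictly proportional scaling
with no fixed overhead, and the function and argument names are illustrative,
not part of FD.

```python
# Baseline timings quoted above, in seconds (5e7 events, ~600 signal models).
BASELINE = {
    "decompose_data": 415.0,
    "hyperparameters": 281.0,
    "decompose_signals": 3380.0,
    "limits_pvalues": 1200.0,
}

def estimate_runtime(event_scale=1.0, grid_scale=1.0,
                     signal_scale=1.0, signal_event_scale=1.0):
    """Scale each baseline timing by the quantity it is linear in.

    Every argument is a ratio relative to the baseline run, e.g.
    event_scale=0.5 means half as many input events.
    """
    return (BASELINE["decompose_data"] * event_scale
            + BASELINE["hyperparameters"] * grid_scale
            + BASELINE["decompose_signals"] * signal_scale * signal_event_scale
            + BASELINE["limits_pvalues"] * signal_scale)

total = estimate_runtime()
print(f"{total:.0f} s = {total / 3600:.2f} h")  # prints "5276 s = 1.47 h"
```

For example, 'estimate_runtime(signal_scale=0.5)' gives 2986 s, i.e. halving
the number of signal hypotheses saves a little under 40 minutes in this model.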

Check the config files for each example (located in <example>/base.conf). The
comments describe the various parameters and their function.  You probably want
to adjust 'Nthread' to match your number of CPU cores before running the
examples.  The remaining parameters should require no adjustment (but feel free
to play around).
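
If you are unsure how many cores your machine has, Python can report it. Note
that this counts logical CPUs, which may be twice the number of physical cores
on machines with hyper-threading. The printed line is only a hint for choosing
a value, not a line to paste verbatim, since the exact syntax of base.conf is
documented in the file itself.

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if it
# cannot be determined; fall back to a single thread in that case.
n_cores = os.cpu_count() or 1
print(f"Suggested Nthread: {n_cores}")
```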

Each of the examples can be run as follows:

1. Enter the FD directory and set up the code:

       cd <FD_DIR>
       . bin/setup_bash.sh

   This will also check if the required Python packages are available, and warn
   you if they are not.

2. Enter the example directory and import / generate the test data:

       cd Examples/<Example_Name>
       fd_generate.py --setname Test --varname Myy --wgtname weight --size 12000000
       fd_import.py ../InputSignalData/*

   'fd_generate.py' produces a random sample of background-like data. Change the
   size parameter if you want more/fewer events.  'fd_import.py' imports several
   datasets from the CSV files in 'InputSignalData'.  These contain Gaussian
   signal shapes that can be injected into the background-like sample to
   simulate resonances.

3. Run the scan:

       fd_scan.py

In a real application, you would use 'fd_import.py' to read in your data (either
as .csv or .root).  Note that for the 'bkg_only' example, 'fd_import.py' is
unnecessary and can be skipped.

That's it! The output will be located in 'Output/*'.  Each plot is saved as an
individual PDF file, and all plots are additionally saved together as a
multipage PDF in 'Output/all.pdf'.

One last thing to note:  all decompositions and likelihood calculations are
cached on disk in '<example>/Cache/*'.  Any repeated computations will hit the
disk cache, which vastly speeds things up when making small changes (e.g. plot
tweaks or including additional signal models).

The cache files are named using all relevant parameters along with a checksum of
the dataset.  This ensures that if you change parameters or cuts, the cache will
not accidentally use values computed using a different configuration.  If you'd
like to force FD to completely re-run from scratch, just delete the Cache
directory. It will be automatically re-created and re-populated the next time
'fd_scan.py' is run.
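
To illustrate why this naming scheme is safe, here is a minimal sketch of a
cache key built from the parameters plus a dataset checksum. It is not FD's
actual implementation: the function names, the parameter names ('Nbasis',
'Lambda'), the file extension, and the choice of SHA-256 are all assumptions
made for the example.

```python
import hashlib
import json

def dataset_checksum(values):
    """Hash the event values so that any change to the data changes the key."""
    h = hashlib.sha256()
    for v in values:
        h.update(repr(float(v)).encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()

def cache_name(stage, params, values):
    """Build a file name from the stage, the parameters and the data checksum."""
    # sort_keys makes the name independent of the order the parameters
    # happen to be listed in.
    param_blob = json.dumps(params, sort_keys=True).encode("utf-8")
    param_hash = hashlib.sha256(param_blob).hexdigest()[:12]
    return f"{stage}_{param_hash}_{dataset_checksum(values)[:12]}.cache"

data = [105.2, 110.7, 126.1]
a = cache_name("decomp", {"Nbasis": 32, "Lambda": 50.0}, data)
b = cache_name("decomp", {"Lambda": 50.0, "Nbasis": 32}, data)   # same config
c = cache_name("decomp", {"Nbasis": 33, "Lambda": 50.0}, data)   # new parameter
d = cache_name("decomp", {"Nbasis": 32, "Lambda": 50.0}, data + [130.0])  # new data
```

Here 'a' and 'b' are identical, so a repeated run finds the cached result, while
'c' and 'd' differ from 'a', so a changed parameter or dataset can never pick
up a stale entry.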