copar/psuf: Code repository for the exercises in the course Praktikum strojnega učenja v fiziki (Practicum in Machine Learning in Physics).

Blaz Leban 9293808339 add first exercise to the repo		4 years ago
This package is a self-contained implementation of functional decomposition
(FD), an unbinned, parametric solution for fitting mass spectra, conducting
searches and producing limits. FD decomposes a dataset into a complete set of
orthogonal functions (the orthonormal exponentials), whose coefficients can be
extracted from the data by direct computation.  It uses a penalized likelihood
to determine the appropriate number of terms to retain from the infinite series.
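
To illustrate what "extracted by direct computation" means, here is a minimal
sketch. It is not FD's own code: it assumes a variable rescaled to [0, 1] and
swaps in a cosine basis for the orthonormal exponentials, purely to keep the
example short, and all function names are hypothetical. For any orthonormal
basis, each coefficient is the expectation value of the corresponding basis
function over the data, so it can be estimated as a plain sample mean, with no
binning and no minimizer.

```python
import math
import random

def cosine_basis(n, x):
    """n-th function of an orthonormal cosine basis on [0, 1] (stand-in basis)."""
    return 1.0 if n == 0 else math.sqrt(2.0) * math.cos(n * math.pi * x)

def estimate_coefficients(events, n_terms):
    """Estimate c_n = E[phi_n(x)] as the sample mean of phi_n over the events."""
    return [sum(cosine_basis(n, x) for x in events) / len(events)
            for n in range(n_terms)]

def reconstruct(coeffs, x):
    """Evaluate the truncated series sum_n c_n * phi_n(x)."""
    return sum(c * cosine_basis(n, x) for n, c in enumerate(coeffs))

rng = random.Random(1)
# Toy falling spectrum on [0, 1]: exponential with rate 3, truncated by rejection.
events = []
while len(events) < 20000:
    x = rng.expovariate(3.0)
    if x < 1.0:
        events.append(x)

# The number of terms is fixed by hand here (an assumption of this sketch).
coeffs = estimate_coefficients(events, n_terms=6)
print([round(c, 3) for c in coeffs])
```

In this sketch the number of retained terms is chosen by hand; FD instead
selects it with the penalized likelihood described above.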

-------------------------
RUNNING THE EXAMPLE CASES
-------------------------
There are three examples provided, with increasing levels of complexity. These
are:

   bkg_only      Decompose a single smooth spectrum using the orthonormal
                 exponentials.
   sig_bkg       Decompose a smooth spectrum with two known resonances.  Use
                 the orthonormal exponentials as a background model and two
                 Gaussians to model the two known peaks.
   sig_bkg_scan  Decompose a smooth spectrum with two known resonances, and
                 perform a search for a new resonance. Use the orthonormal
                 exponentials for the background, two Gaussians to model the
                 two known peaks, and scan a third peak through several masses
                 and widths.

The first two examples should run in only a few minutes.  The third example
tests about 600 different signal hypotheses and takes considerably longer. On
my Dell XPS13 9350, using a fairly large sample of 5e7 events, this example
takes about 1.5 hours to run from scratch.

There are four main contributions to the run time:
  1.) Decomposing data: 415s. Linear in the number of input events.
  2.) Determining hyperparameters: 281s. Linear in the size of the initial
      search grid.
  3.) Decomposing signal models: 3380s.  Linear in the number of signal models
      and the number of events used to simulate each signal.
  4.) Calculating limits and p-values: ~1200s.  Linear in the number of signal
      models.
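
As a rough planning aid, the timings above can be summed and rescaled. The
sketch below is only an illustration: 'estimate_runtime' and its scale factors
are hypothetical, the linear scalings are taken at face value from the list
above, and the number of events simulated per signal is assumed unchanged.

```python
# Timings quoted above for the sig_bkg_scan example (seconds).
STAGE_SECONDS = {
    "decompose data": 415,
    "determine hyperparameters": 281,
    "decompose signal models": 3380,
    "limits and p-values": 1200,
}

def estimate_runtime(event_scale=1.0, grid_scale=1.0, model_scale=1.0):
    """Scale each stage linearly by the quantity it is stated to depend on.

    All scale factors are relative to the reference run (hypothetical helper).
    """
    return (STAGE_SECONDS["decompose data"] * event_scale
            + STAGE_SECONDS["determine hyperparameters"] * grid_scale
            + STAGE_SECONDS["decompose signal models"] * model_scale
            + STAGE_SECONDS["limits and p-values"] * model_scale)

total = estimate_runtime()
print(f"reference run: {total:.0f} s = {total / 3600:.2f} h")  # 5276 s = 1.47 h

# Example: scanning half as many signal models.
half_models = estimate_runtime(model_scale=0.5)
print(f"half the signal models: {half_models:.0f} s")
```

The reference total of 5276 s is consistent with the "about 1.5 hours" quoted
above, and it shows that the signal-model stages dominate the run time.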

Check the config files for each example (located in /base.conf). The
comments describe the various parameters and their function.  You probably want
to adjust 'Nthread' to match your number of CPU cores before running the
examples.  The remaining parameters should require no adjustment (but feel free
to play around).
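
If you are unsure how many cores your machine has, a short Python check such as
the one below reports the number of logical CPUs. This is only a convenience
sketch; the exact syntax of the 'Nthread' entry is described by the comments in
the config file itself and is not reproduced here.

```python
import os

# os.cpu_count() reports logical CPUs and may return None if it cannot tell.
n_cores = os.cpu_count() or 1
print(f"Suggested Nthread value: {n_cores}")
```

Note that logical CPUs include hyperthreads, so on some machines a smaller
value may perform just as well.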

Each of the examples can be run as follows:

1. Enter the FD directory and set up the code:

       cd 
       . bin/setup_bash.sh

   This will also check if the required Python packages are available, and warn
   you if they are not.

2. Enter the example directory and import / generate the test data:

       cd Examples/
       fd_generate.py --setname Test --varname Myy --wgtname weight --size 12000000
       fd_import.py ../InputSignalData/*

   'fd_generate.py' produces a random sample of background-like data. Change the
   size parameter if you want more/fewer events.  'fd_import.py' imports several
   datasets from the CSV files in 'InputSignalData'.  These contain Gaussian
   signal shapes that can be injected into the background-like sample to
   simulate resonances.

3. Run the scan:

       fd_scan.py

In a real application, you would use 'fd_import.py' to read in your data (either
as .csv or .root).  Note that for the 'bkg_only' example, 'fd_import.py' is
unnecessary and can be skipped.
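
If you want to try 'fd_import.py' with a signal shape of your own, a toy CSV can
be produced with a few lines of Python. The sketch below is an assumption-laden
illustration: the column names 'Myy' and 'weight' simply mirror the --varname
and --wgtname options used in step 2, the header layout expected by
'fd_import.py' is not confirmed here (compare with the files in
'InputSignalData' before relying on it), and the mean, width and file name are
hypothetical.

```python
import csv
import os
import random
import tempfile

def write_gaussian_signal(path, mean, width, n_events, seed=0):
    """Write n_events Gaussian-distributed values with unit weights to a CSV."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Assumed header; mirrors --varname Myy / --wgtname weight.
        writer.writerow(["Myy", "weight"])
        for _ in range(n_events):
            writer.writerow([f"{rng.gauss(mean, width):.6f}", "1.0"])

# Hypothetical signal: mean 125, width 2, written to the temp directory.
out_path = os.path.join(tempfile.gettempdir(), "toy_signal.csv")
write_gaussian_signal(out_path, mean=125.0, width=2.0, n_events=1000)
print(f"wrote {out_path}")
```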

That's it! The output will be located in 'Output/*'.  Each plot is saved as an
individual pdf file, and additionally all plots are saved together as a
multipage pdf in 'Output/all.pdf'.

One last thing to note:  all decompositions and likelihood calculations are
cached on disk in '/Cache/*'.  Any repeated computations will hit the
disk cache, which vastly speeds things up when making small changes (e.g. plot
tweaks or including additional signal models).

The cache files are named using all relevant parameters along with a checksum of
the dataset.  This ensures that if you change parameters or cuts, the cache will
not accidentally use values computed using a different configuration.  If you'd
like to force FD to completely re-run from scratch, just delete the Cache
directory. It will be automatically re-created and re-populated the next time
'fd_scan.py' is run.
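
The idea behind this naming scheme can be sketched as follows. This is not FD's
actual implementation: the function names, the choice of SHA-256, the file name
layout and the example parameters are all assumptions made for illustration.
The point is only that a name built from the parameters plus a checksum of the
dataset changes whenever either of them changes.

```python
import hashlib
import json

def dataset_checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def cache_name(stage, params, data_checksum):
    """Build a file name from the stage, the sorted parameters and the checksum."""
    param_blob = json.dumps(params, sort_keys=True).encode("utf-8")
    param_hash = hashlib.sha256(param_blob).hexdigest()[:12]
    return f"{stage}_{param_hash}_{data_checksum[:12]}.cache"

# Stand-in checksum so the example runs without a data file.
checksum = hashlib.sha256(b"toy dataset").hexdigest()

name_a = cache_name("decomp", {"Nbasis": 32, "cut": 100.0}, checksum)
name_b = cache_name("decomp", {"cut": 100.0, "Nbasis": 32}, checksum)  # same params
name_c = cache_name("decomp", {"Nbasis": 32, "cut": 110.0}, checksum)  # changed cut
print(name_a == name_b, name_a == name_c)
```

Sorting the parameters before hashing makes the name independent of the order
in which they are listed, so only a real change of value leads to a new entry.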