cta-observatory · rlopezcoto · Jan 30, 2024 · Jan 24, 2024 · Jan 24, 2024 · Jan 25, 2024
diff --git a/docs/lst_analysis_workflow.rst b/docs/lst_analysis_workflow.rst
@@ -51,6 +51,17 @@ Files and configuration
 The DL1 files to use obviously depend on the source you want to analyse.
 Unless you have a good reason, the latest version of the DL1 files should be used.
 
+Selection of the real-data DL1 files
+------------------------------------
+
+The selection of the DL1 files (run-wise, i.e. those produced by lstosa by merging the subrun-wise DL1 files) for a
+specific analysis (i.e., a given source and time period) can be performed using the notebook
+``cta_lstchain/notebooks/data_quality.ipynb``. The selection also takes into account the quality of the data, mostly in
+terms of atmospheric conditions - evaluated using the rate of cosmic-ray showers as a proxy. Data taken under poor
+conditions will not be included in the list of selected runs. Instructions and further details can be found inside the
+notebook.
+
+
 RF models
 ---------
 
@@ -62,26 +73,58 @@ The RF models are stored in the following directory:
 ``/fefs/aswg/data/models/...``
 
 
-Tuning of DL1 files and RF models
----------------------------------
+Tuning of MC DL1 files and RF models
+------------------------------------
 
-In case of high NSB in the data, it is possible to tune the DL1 files and the RF models to improve the performance of the analysis.      
-This is done by changing the `config` file of the RF models and producing new DL1 files and training new RF models.
-To produce a config tuned to the data you want to analyse, you may run ``lstchain_tune_nsb`` function that will produce a ``tuned_nsb_config.json`` file.
+The default MC production is generated with a level of noise in the images which corresponds to the level of diffuse
+night-sky background ("NSB") in a "dark" field of view (i.e. for observations with moon below the horizon, at not-too-low
+galactic latitudes and not affected by other sources of noise, like the zodiacal light). In general, observations of
+**extragalactic** sources in dark conditions can be properly analyzed with the default MC (i.e. with the standard RF models).
 
-To request a new production of RF models, you can open a pull-request on the lstmcpipe repository, producing a complete MC config, using:
+The median of the standard deviation of the pixel charges recorded in interleaved pedestal events (in which  a camera
+image is recorded in absence of a physics trigger) is a good measure of the NSB level in a given data run. This is computed
+by the data selection notebook ``cta_lstchain/notebooks/data_quality.ipynb`` (see above). For data with an NSB level
+significantly higher than the "dark field" one, it is possible to tune (increase) the noise in the MC files, and produce
+from them RF models (and "test MC" for computing instrument response functions) which improve the performance of the
+analysis (relative to using the default, low-NSB MC).
+
+This is done by changing the configuration file for the MC processing, producing new DL1(b) files, and training new RF models.
+To produce a config tuned to the data you want to analyse, you first have to obtain the standard analysis configuration
+file (for MC) for the desired version of lstchain (= the version with which the real DL1 files you will use were produced).
+This can be done with the script :py:obj:`~lstchain.scripts.lstchain_dump_config`:
 
 .. code-block::
 
-    lstchain-dump-config --mc --update-with tuned_nsb_config.json --output-file PATH_TO_OUTPUT_FILE
+    lstchain_dump_config --mc --output-file standard_lstchain_config.json
+
+Now you have to update the file with the parameters needed to increase the NSB level. For this you need a simtel.gz MC
+file from the desired production (any will do, it can be either a gamma or a proton file), and a "typical" subrun DL1 file
+from your **real data**  sample. "Typical" means one in which the NSB level is close to the median value for the sample
+to be analyzed. The data selection notebook ``cta_lstchain/notebooks/data_quality.ipynb`` (see above) provides a list of
+a few such subruns for your selected sample - you can use any of them. To update the config file you have to use the
+script :py:obj:`~lstchain.scripts.lstchain_tune_nsb` , e.g. :
+
+.. code-block::
+
+    lstchain_tune_nsb.py --config standard_lstchain_config.json \
+                         --input-mc .../simtel_corsika_theta_6.000_az_180.000_run10.simtel.gz \
+                         --input-data .../dl1_LST-1.Run10032.0069.h5 \
+                         -o tuned_nsb_lstchain_config.json
+
+To request a **new production of RF models**, you can open a pull-request on the lstmcpipe repository, providing
+the .json configuration file produced following the steps above.
+
+
+Keeping track of lstchain configurations
+----------------------------------------
 
+The lstchain configuration file used to process the  simulations and produce the RF models of a given MC production is
+provided in the lstmcpipe repository, as well as in the models directory.
 
-lstchain config
----------------
+It is important that the software version, and the configuration used for processing real data and MC are the same. For a
+given lstchain version, this should be guaranteed by following the procedure above which makes use of
+:py:obj:`~lstchain.scripts.lstchain_dump_config`.
 
-The lstchain config used to produce the RF models of a production is provided in the lstmcpipe repository, as well as in the models directory.
-It is a good idea to use the same config for the data analysis.
-You can also produce a config using `lstchain-dump-config`.
 
 DL3/IRF config files
 --------------------

diff --git a/docs/lstchain_api/datachecks/index.rst b/docs/lstchain_api/datachecks/index.rst
@@ -9,7 +9,93 @@ Datachecks (`datachecks`)
 Introduction
 ============
 
-Module containing functions producing the LST datachecks. Currently reaching DL1 level.
+Module containing functions for checking the quality of the LST data.
+
+DL1 data checks
+===============
+
+Currently the checks are done at the DL1 level.
+The DL1 datacheck files are produced by running the following scripts sequentially:
+
+* :py:obj:`lstchain.scripts.lstchain_check_dl1`
+
+  This takes as input a DL1 file (including DL1a information, i.e. camera images & times) from a data subrun, e.g.:
+
+  .. code-block:: bash
+
+    lstchain_check_dl1 --input-file dl1_LST-1.1.Run14619.0000.h5 --output-dir OUTPUT_DIR --omit-pdf
+
+
+  The script produces a data check file for the subrun, ``datacheck_dl1_LST-1.Run14619.0000.h5`` which contains many
+  quantities that can be used to judge the quality of the data (see class :py:obj:`~lstchain.datachecks.containers.DL1DataCheckContainer`)
+
+|
+
+* :py:obj:`lstchain.scripts.lstchain_check_dl1`
+
+  The same script is run again, but now providing as input the **subrun-wise datacheck files** produced earlier (all subruns
+  of a given run must be provided). It also needs to know where the subrun-wise ``muons_LST-1.*.fits files`` (produced in
+  the R0 to DL1 analysis step) which contain the muon ring information are stored ("MUONS_DIR"):
+
+  .. code-block:: bash
+
+    lstchain_check_dl1 --input-file "datacheck_dl1_LST-1.Run14619.*.h5" --output-dir OUTPUT_DIR --muons-dir MUONS_DIR
+
+
+  The output is now a data check file for the whole run, ``datacheck_dl1_LST-1.Run14619.h5`` which contains all the
+  information from the subrun-wise files. The script also produces a .pdf file ``datacheck_dl1_LST-1.Run14619.pdf`` with
+  various plots of the quantities stored in the DL1DataCheckContainer objects, plus others obtained from the muon ring
+  analysis. Note that the muon ring information is not propagated to the run-wise datacheck files, it is just used for
+  the plotting.
+
+|
+
+* :py:obj:`lstchain.scripts.lstchain_longterm_dl1_check`
+
+  This script merges the run-wise datacheck files of (typically) one night, stored in INPUT_DIR, and produces **a
+  single .h5 file for the whole night** as output (e.g. ``DL1_datacheck_20230920.h5``). The "longterm" in the script
+  name is a bit of an overstatement - in principle it can be run over data from many nights, but the output html (see
+  below) becomes too heavy and some of the interactive features work poorly.
+
+  .. code-block:: bash
+
+    lstchain_longterm_dl1_check --input-dir INPUT_DIR --muons-dir MUONS_DIR --output-file DL1_datacheck_20230920.h5 --batch
+
+  The output .h5 file contains a (run-wise) summarized version of the information in the input files, including the muon
+  ring .fits files. It also creates an .html file (e.g. ``DL1_datacheck_20230920.html``) which can be opened with any
+  web browser and which contains various interactive plots which allow to make a quick check of the data of a night. See
+  an example of the .html file `here <https://www.lst1.iac.es/datacheck/dl1/v0.10/20230920/DL1_datacheck_20230920.html>`_
+  (password protected).
+
+|
+
+.. Heading below be replaced by this once script is merged!  :py:obj:`lstchain.scripts.lstchain_cherenkov_transparency`
+
+* ``lstchain_cherenkov_transparency``
+
+
+  This script analyzes the image intensity histograms (one per subrun) stored in the **run-wise** datacheck files (which
+  must exist in INPUT_DIR)
+
+  .. code-block:: bash
+
+    lstchain_cherenkov_transparency --update-datacheck-file DL1_datacheck_20230920.h5 --input-dir INPUT_DIR
+
+  The script updates the **night-wise** datacheck .h5 file ``DL1_datacheck_20230920.h5`` with a new table (with one entry
+  per subrun) containing parameters related to the image intensity spectra for cosmic ray events (i.e., a
+  Cherenkov-transparency - like approach, see e.g. https://arxiv.org/abs/1310.1639).
+
+
+
+
+Using the datacheck files for selecting good-quality data
+=========================================================
+
+The night-wise datacheck .h5 files, ``DL1_datacheck_YYYYMMDD.h5`` can be used to select a subsample of good quality data
+from a large sample of observations. The files are relatively light, 6 MB per night in average. A large sample of them
+can be processed with the notebook ``cta_lstchain/notebooks/data_quality.ipynb`` (instructions can be found inside the
+notebook)
+
 
 Reference/API
 =============

diff --git a/lstchain/image/modifier.py b/lstchain/image/modifier.py
@@ -194,6 +194,8 @@
                     '/dl1/event/telescope/monitoring/calibration')
     data_dl1_pedestal =  read_table(data_dl1_filename,
                     '/dl1/event/telescope/monitoring/pedestal')
+    data_dl1_flatfield =  read_table(data_dl1_filename,
+                    '/dl1/event/telescope/monitoring/flatfield')
     data_dl1_parameters =  read_table(data_dl1_filename,
                     '/dl1/event/telescope/parameters/LST_LSTCam')
     data_dl1_image = read_table(data_dl1_filename,
@@ -203,8 +205,9 @@
     # Locate pixels with HG declared unusable either in original calibration or
     # in interleaved events:
     bad_pixels = unusable[0][0]  # original calibration
-    for tf in unusable[1:][0]:   # calibrations with interleaveds
-        bad_pixels = np.logical_or(bad_pixels, tf)
+    if unusable.shape[0] > 1:
+        for tf in unusable[1:][0]:   # calibrations with interleaveds
+            bad_pixels = np.logical_or(bad_pixels, tf)
     good_pixels = ~bad_pixels
 
     # First index:  1,2,... = values from interleaveds (0 is for original
@@ -214,6 +217,29 @@
 
     # HG adc to pe conversion factors from interleaved calibrations:
     data_HG_dc_to_pe = data_dl1_calibration['dc_to_pe'][:, 0, :]
+
+    if data_dl1_flatfield['charge_mean'].shape[0] < 2:
+        logging.error('\nCould not find interleaved FF calibrations in '
+                      'monitoring table!')
+        return None, None, None
+
+    if data_dl1_pedestal['charge_std'].shape[0] < 2 :
+        logging.error('\nCould not find interleaved pedestal info in '
+                      'monitoring table!')
+        return None, None, None
+
+    # Mean HG charge in interleaved FF events, to spot possible issues:
+    data_HG_FF_mean = data_dl1_flatfield['charge_mean'][1:, 0, :]
+    dummy = []
+    # indices which connect each FF calculation to a given calibration:
+    calibration_id = data_dl1_flatfield['calibration_id'][1:]
+
+    for i, x in enumerate(data_HG_FF_mean[:, ]):
+        dummy.append(x * data_HG_dc_to_pe[calibration_id[i],])
+    dummy = np.array(dummy)
+    # Average for all interleaved calibrations (in case there are more than one)
+    data_HG_FF_mean_pe = np.mean(dummy, axis=0) # one value per pixel
+
     # Pixel-wise pedestal standard deviation (for an unbiased extractor),
     # in adc counts:
     data_HG_ped_std = data_dl1_pedestal['charge_std'][1:, 0, :]
@@ -224,20 +250,24 @@
     for i, x in enumerate(data_HG_ped_std[:, ]):
         dummy.append(x * data_HG_dc_to_pe[calibration_id[i],])
     dummy = np.array(dummy)
-
     # Average for all interleaved calibrations (in case there are more than one)
     data_HG_ped_std_pe = np.mean(dummy, axis=0) # one value per pixel
 
     # Identify noisy pixels, likely containing stars - we want to adjust MC to
     # the average diffuse NSB across the camera
-    data_median_std_ped_pe = np.nanmedian(data_HG_ped_std_pe)
-    data_std_std_ped_pe = np.nanstd(data_HG_ped_std_pe)
-    log.info(f'Real data: median across camera of good pixels\' pedestal std '
+    data_median_std_ped_pe = np.nanmedian(data_HG_ped_std_pe[good_pixels])
+    data_std_std_ped_pe = np.nanstd(data_HG_ped_std_pe[good_pixels])
+    log.info('\nReal data:')
+    log.info(f'   Number of bad pixels (from calibration): {bad_pixels.sum()}')
+    log.info(f'   Median of FF pixel charge: '
+             f'{np.nanmedian(data_HG_FF_mean_pe[good_pixels]):.3f} p.e.')
+    log.info(f'   Median across camera of good pixels\' pedestal std '
              f'{data_median_std_ped_pe:.3f} p.e.')
     brightness_limit = data_median_std_ped_pe + 3 * data_std_std_ped_pe
     too_bright_pixels = (data_HG_ped_std_pe > brightness_limit)
-    log.info(f'Number of pixels beyond 3 std dev of median: '
-             f'{too_bright_pixels.sum()}, (above {brightness_limit:.2f} p.e.)')
+    log.info(f'   Number of pixels beyond 3 std dev of median: '
+             f'   {too_bright_pixels.sum()}, (above {brightness_limit:.2f} '
+             f'p.e.)')
 
     ped_mask = data_dl1_parameters['event_type'] == 2
     # The charges in the images below are obtained with the extractor for

diff --git a/lstchain/scripts/lstchain_tune_nsb.py b/lstchain/scripts/lstchain_tune_nsb.py
@@ -79,6 +79,9 @@
 
     a, b, c = calculate_noise_parameters(args.input_mc, args.input_data,
                                          args.config)
+    if a is None:
+        logging.error('Could not compute NSB tuning parameters. Exiting!')
+        sys.exit(1)
 
     dict_nsb = {"increase_nsb": True,
                 "extra_noise_in_dim_pixels": round(a, 3),
@@ -93,7 +96,11 @@
 
     if args.output_file:
         cfg = read_configuration_file(args.config)
-        cfg['image_modifier'].update(dict_nsb)
+        if 'image_modifier' in cfg:
+            cfg['image_modifier'].update(dict_nsb)
+        else:
+            cfg['image_modifier'] = dict_nsb
+
         dump_config(cfg, args.output_file, overwrite=args.overwrite)