Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random updates and refactoring of the combined analysis pipeline #91

Merged
merged 92 commits into from
Sep 29, 2022

Conversation

YoshikiOhtani
Copy link
Collaborator

@YoshikiOhtani YoshikiOhtani commented Aug 19, 2022

This pull request includes some random updates and refactoring of the combined analysis pipeline. I tested running the updated pipeline with a few data samples and it worked. However, I will keep this pull request open until I confirm that the pipeline produces the consistent results with the whole Crab samples that I usually analyze. Meanwhile, it would be very much appreciated if you could give me some comments or feedbacks. Let me assign @aleberti and @jsitarek as reviewers.

Here let me summarize the changes as follows:

General changes

  • Refactored the pipeline by following suggestions of flake8, isort and black with the max-line-length = 88 setting (72 for docstrings as following PEP8). Constant variables are now defined in capital letters. Please note that the most updated version of black consider the magic trailing comma by default, but in this pull request I removed this functionality with the option -C (or --skip-magic-trailing-comma) to reduce the user flexibility as much as possible.
  • Created magicctapipe/io directory and the modules container.py, gadf.py and io.py, and moved some functions to them from magicctapipe/scripts/lst1_magic/*.py and magicctapipe/utils/functions.py.
  • Updated the documentation of the scripts and added explanations to the code as much as possible so that users can understand what is done in the pipeline.
  • Updated the logger information display so that while running the scripts users can easily cross-check if the settings are correct or not.
  • Naming of the output files is simplified, i.e., just replacing specific characters assuming that the input file names do not change from the default ones.

Important changes for users

  • Changed the format of some settings in the configuration file to string in case they have units, so that astropy Quantity can read the unit directly from the string. See the following example:
  • Changed the argument to train RFs from --train-direction to --train-disp, and also the name of the output RF file from direction_regressors_*.joblib to disp_regressors_*.joblib.
  • Changed the parameter name dist_sum to disp_diff_sum, which is the sum of the angular distances of DISP candidates.
  • Changed the configuration name from software_with_any2 to software, and software to software_only_3tel, since the analysis with any2 events would be more standard for the software coincidence.
  • (EDIT) Added the parameter disp_diff_mean which is the mean of the angular distances of DISP candidates.
  • (EDIT) Changed the argument of lst1_magic_train_rfs.py from --input-file-gamma, --input-file-bkg to --input-dir-gamma, --input-dir-proton so that multiple input MC files produced by different pointing directions can be read.

From here let me describe the other noticeable changes for each script or module:

functions.py

  • Renamed calculate_angular_distance to calculate_off_coordinates and simplify the functionality to return only OFF coordinates. Please note that the needed arguments are also changed.
  • Renamed check_tel_combination to get_stereo_events, and changed the functionality to apply quality cuts and return a new data frame of stereo events with the combo_type and multiplicity parameters.
  • Renamed save_pandas_to_table to save_pandas_data_in_table.
  • Moved get_stereo_events, get_dl2_mean and save_pandas_data_in_table to magicctapipe/io/io.py.
  • Moved create_gh_cuts_hdu to magicctapipe/io/gadf.py.

lst1_magic_mc_dl0_to_dl1.py

  • Moved the custom containers to magicctapipe/io/containers.py and adapted the script to import them from the module.
  • Pass only the pixels surviving the cleaning to the parametrization, which speeds up the process around 10%.
  • (EDIT) Skip the process of the event whose cleaned image contains negative charges, since it fails the Hillas and timing parametrization.
  • Skip the process of the event which failed to reconstruct finite timing parameters.

magic_calib_to_dl1.py

  • The same updates as lst1_magic_mc_dl0_to_dl1.py.

lst1_magic_event_coincidence.py

  • Renamed load_lst_data_file and load_magic_data_file to load_lst_dl1_data_file and load_magic_dl1_data_files, moved them to magicctapipe/io/io.py, and adapted the script to import them from the module.
  • Changed the name of the setting from type_lst_time to timestamp_type_lst, which is the type of the LST timestamp.
  • Changed the name of the setting from window_width (= 600 ns) to window_half_width (= 300 ns), which is the coincidence window half width.
  • Changed the philosophy of the coincidence algorithm to scale all the timing-related parameters (time_lst, time_magic, time_offsets, window_half_width) in nanoseconds and use the int64bit format, instead of subtracting the UNIX time of an observation day and using float64bit type. I think this is a safer way in terms of a rounding issue, and now we don't need to round the parameters which happened everywhere in the script. Also I believe it improves the readability of the code.

lst1_magic_stereo_reco.py

  • Use loc method instead of query to extract a shower event, since it speeds up the process around 20%.
  • Skip the process of the event if it fails to reconstruct valid stereo parameters, which happens when the event contains images of width = 0 (though they should be removed by the default quality cuts).
  • Don't save the stereo parameters which do not have finite values even if the reconstruction succeeds, i.e., core_uncert, h_max_uncert, is_valid. See the definition of the reconstructor.

lst1_magic_train_rfs.py

  • Moved load_train_data_file to magicctapipe/io/io.py and adapted the script to import it from the module.
  • Removed check_feature_importance and added the functionality to the training functions.
  • Removed get_events_at_random and added the functionality directly inside train_event_classifier.
  • Renamed train_direction_regressor to train_disp_regressor as following the change of the regressor module from DirectionRegressor to DispRegressor.
  • (EDIT) Renamed load_train_data_file to load_train_data_files and the argument from input_file to input_dir so that multiple MC files can be read.

lst1_magic_dl1_stereo_to_dl2.py

  • Separated the functionality of DirectionRegressor.predict to DispRegressor.predict and reconstruct_arrival_direction functions to make it clear that DispRegressor only trains and predicts the DISP parameter, and the reconstruction of the arrival direction is done separately with the DISP method. Also, DirectionRegressor assumed that the input data is already separated per telescope combination type, but now DispRegressor does not assume it. So we can easily test training/applying RFs with the combo_type parameter without separating the RFs per telescope combination types.
  • Don't transform telescope-wise Alt/Az direction to the RA/Dec coordinate, since for creating the IRFs/DL3 the RA/Dec direction transformed from the mean Alt/Az direction is used.

lst1_magic_create_irf.py

  • Renamed load_dl2_data_file to load_mc_dl2_data_file, moved it to magicctapipe/io/io.py, and adapted the script to import it from the module.
  • Removed apply_dynamic_gammaness_cut and apply_dynamic_theta_cut, and added the functionality to create_irf.

lst1_magic_dl2_to_dl3.py

  • Removed calculate_deadc and added the functionality inside load_dl2_data_file.
  • Moved load_dl2_data_file, load_irf_files to magicctapipe/io/io.py, and adapted the script to import them from the module
  • Renamed create_event_list, create_gti_table and create_pointing_table to create_event_hdu, create_gti_hdu and create_pointing_hdu, changed the functionality to return the HDUs, moved them to magicctapipe/io/gadf.py, and adapted the script to import them from the module.
  • Calculate the mean pointing Az direction in a range where the standard deviation of the pointing direction is smaller, 0 < az < 360 deg or -180 < az < 180 deg, which was originally suggested by @jsitarek in the pull request Implement IRF interpolation method #81.

@YoshikiOhtani YoshikiOhtani changed the title Refactoring and style unification of the combined analysis code Random updates and refactoring of the combined analysis pipeline Aug 24, 2022
@YoshikiOhtani YoshikiOhtani marked this pull request as ready for review September 5, 2022 12:30
@YoshikiOhtani YoshikiOhtani linked an issue Sep 5, 2022 that may be closed by this pull request
@YoshikiOhtani
Copy link
Collaborator Author

Thank you very much @jsitarek for your checks and comments. We already talked via emails, but just in case let me write also here. I implemented all his comments in this pull request and resolved them. I would like to push a few more commits about refactoring with which I came up during implementing his comments, and also about the implementation of a functionality to create the full-enclosure IRFs. I will let you know once I'm done, and then it will be finally ready to be merged to the master branch. @aleberti if you have any comments or suggestions to this branch please let me know.

@jsitarek
Copy link
Collaborator

jsitarek commented Sep 28, 2022 via email

@YoshikiOhtani
Copy link
Collaborator Author

Hi @jsitarek, OK thanks, actually I was also thinking about that, so I will create another pull request later.

@YoshikiOhtani
Copy link
Collaborator Author

OK so I have done. There are really minor changes so I don't think they need to be reviewed thoroughly. If there is no objection, let me merge it to the master branch.

@aleberti
Copy link
Collaborator

Hi @YoshikiOhtani, yes, please go ahead and merge.

@YoshikiOhtani
Copy link
Collaborator Author

OK, I will merge it now.

@YoshikiOhtani YoshikiOhtani merged commit 050e1b2 into master Sep 29, 2022
@YoshikiOhtani YoshikiOhtani deleted the style-unif branch September 29, 2022 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The scaled DISP difference would be more useful? Move functions to proper modules
3 participants