This folder contains part of the code used for the comparison analysis published in "The Machine Learning Landscape of Top Taggers". Abstract of the paper:
"Based on the established task of identifying boosted, hadronically decaying top quarks, we compare a wide range of modern machine learning approaches. We find that they are extremely powerful and great fun."
The files with the output probabilities of each tagger were provided by the individual groups, as described in the paper. These files are not part of this repository.
Make sure the following tools are installed:
- Packages included in Anaconda (Jupyter Notebook, NumPy, SciPy, scikit-learn, pandas)
A description of the Top Tagging Reference Dataset (provided by Gregor Kasieczka, Michael Russell and Tilman Plehn) can be found here, with the link to download it here. The dataset contains 1.2M training events, 400k validation events and 400k test events, with equal numbers of top quark and QCD jets. Only the four-momentum vectors of the jet constituents are stored.
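Because the dataset stores only constituent four-momenta, jet-level quantities such as pT, eta and phi have to be derived from them. The snippet below is a minimal sketch of that arithmetic, not the code used for this analysis: it simply sums the constituents instead of reclustering them, and the function name `jet_kinematics` and the `(E, px, py, pz)` column order are assumptions made for illustration.

```python
import numpy as np


def jet_kinematics(constituents):
    """Sum constituent four-momenta and derive jet-level kinematic variables.

    `constituents` is an (n, 4) array with columns (E, px, py, pz).
    This column order is an assumption, not taken from the dataset files.
    """
    # The jet four-momentum is taken as the plain sum of its constituents.
    E, px, py, pz = np.asarray(constituents, dtype=float).sum(axis=0)
    pt = np.hypot(px, py)       # transverse momentum
    eta = np.arcsinh(pz / pt)   # pseudorapidity; undefined when pT is zero
    phi = np.arctan2(py, px)    # azimuthal angle in (-pi, pi]
    return {"E": E, "pT": pt, "eta": eta, "phi": phi, "pz": pz}


# Two massless constituents whose transverse momenta are mirrored in y.
example = np.array([
    [50.0, 30.0, 40.0, 0.0],
    [50.0, 30.0, -40.0, 0.0],
])
kin = jet_kinematics(example)
print(kin)
```

In this example the y components cancel, so the summed jet points along the x axis with no longitudinal momentum.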
- jet_study_allTaggers.ipynb: Jupyter Notebook that performs the comparison analysis.
- jets_kinematics: Values of the kinematic variables (E, eta, phi, pz) obtained by reclustering the jet constituents of the test set of the Top Tagging Reference Dataset.
- ROC_allTaggers.pdf: ROC curve of each algorithm, drawn for the model with the median AUC among the 9 models.
- plot_bar_flatten.pdf: Bar plot showing the background rejection at 30% tag efficiency for the model with the median AUC for each algorithm. The plot shows values for the original dataset and after reweighting the output probabilities to flatten the pT, eta, pz and E distributions.
- labels.pkl: Truth labels for each jet in the test set.
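The two quantities behind the plots listed above, the model with the median AUC and the background rejection at 30% tag efficiency, can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the notebook's code: the helper names are hypothetical, labels are assumed to be 1 for top jets and 0 for QCD jets, rejection is taken as the inverse of the false positive rate, and synthetic scores stand in for the tagger output files, which are not part of this repository.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def median_auc_model(labels, model_scores):
    """Return (index, aucs), where index is the model whose AUC is the median.

    Hypothetical helper; assumes an odd number of models, so the median
    is an actual model rather than an average of two.
    """
    aucs = np.array([roc_auc_score(labels, s) for s in model_scores])
    index = int(np.argsort(aucs)[len(aucs) // 2])
    return index, aucs


def background_rejection(labels, scores, signal_eff=0.3, sample_weight=None):
    """Inverse false positive rate at a given signal (tag) efficiency.

    `sample_weight` is passed to roc_curve and could carry per-jet weights,
    for example ones that flatten a kinematic distribution (an assumption
    about how reweighting might be applied, not the notebook's method).
    """
    fpr, tpr, _ = roc_curve(labels, scores, sample_weight=sample_weight)
    # Interpolate the ROC curve to read off the FPR at the requested TPR.
    fpr_at_eff = float(np.interp(signal_eff, tpr, fpr))
    return np.inf if fpr_at_eff == 0.0 else 1.0 / fpr_at_eff


# Synthetic stand-in for 9 models of one tagger (assumption: 1 = top, 0 = QCD).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=20000)
separations = rng.permutation(np.linspace(1.0, 2.0, 9))
model_scores = [labels * sep + rng.normal(size=labels.size) for sep in separations]

median_idx, aucs = median_auc_model(labels, model_scores)
rejection = background_rejection(labels, model_scores[median_idx])
print(f"median-AUC model: {median_idx}, AUC = {aucs[median_idx]:.4f}")
print(f"background rejection at 30% tag efficiency: {rejection:.1f}")
```

With real inputs, `labels` would come from labels.pkl and `model_scores` from the output probabilities provided by each group.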
Please cite this code as:
@misc{TopTagComparison,
author = "K. Cranmer and G. Kasieczka and S. Macaluso",
title = "{Subset of the code used for the Top Tagging Comparison Analysis}",
note = "{DOI: }",
year = {2019},
url = {https://github.com/SebastianMacaluso/TopTagComparison}
}