-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding tools for jet clustering for cluster and truth particles #80
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jennifer, great thanks! Here are already a few comments/suggestions/questions.
Now I see why you proposed to have a general algorithm that does both caloJet and TruthJets. Maybe one way could be to have a separated helper class that does only the fastJet part. We would still have two Gaudi components (because we need to know the datatype they eat) but at least we avoid too much code duplication?
I am also wondering if we could prepare things for "exclusive" jet finding (fixed number of jets).
Finally, if you do not mind, it would be great to use already now the "thread safe Gaudi Functionals": https://key4hep.github.io/key4hep-doc/developing-key4hep-software/WritingAlgorithms.html#gaudi-functional ? Compared to what you have here, only very minor changes are needed
Hi @BrieucF -- I think I've addressed the comments you made. I've started the functionality for exclusive jets, so it should be possible, but the default is to use inclusive jets for now. Let me know if you have any other comments, since this has changed quite a bit. There are still some features I'd like to add, but if there aren't any problems with this version, I might prefer to make a git issue for any new features so we can merge this in the meantime. |
Pinging @BrieucF to see if you have any more comments for this MR. |
Hi, adding @jroloff to the HEP-FCC GH organization in order for the CI to start automatically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful! Thanks Jennifer. I have a few comments but it looks good!
CreateTruthJet
and FilterTruthParticles
should probably not live in k4RecCalorimeter (k4Gen would be a more natural place) but I would understand if you prefer to deal with this in another PR or to leave it on us for a future code re-organization.
int index = constit.user_info<ClusterInfo>().index(); | ||
jetInput.setPDG((input)[index].getPDG()); | ||
|
||
jet.addToParticles(jetInput); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is'nt this jetInput
lost in the output file? One should probably store also this new collection of ReconstructedParticle in the output rootfile. Here is an example of MultiTransformer
: https://github.com/key4hep/k4FWCore/blob/main/test/k4FWCoreTest/src/components/ExampleFunctionalTransformerMultiple.cpp#L51.
I agree that this is not perfect, having a relation between "genJet" and MCParticle would feel more natural but for now there is no such thing AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another suggestion (thanks @tmadlener) is to use Association instead of relations (the difference is that associations are separated object with pointers to pairs of objects, used e.g. to link RECO to GEN quantities that should therefore not be used at analysis level). This will 1) avoid loosing information when casting MCParticles to ReconstructedParticles 2) avoid duplicating content on disk since the associations are pointers.
One should use MCRecoParticleAssociation
:
edm4hep::MCRecoParticleAssociation:
Description: "Association between a ReconstructedParticle and the corresponding MCParticle"
Author: "EDM4hep authors"
Members:
- float weight // weight of this association
OneToOneRelations:
- edm4hep::ReconstructedParticle rec // reference to the reconstructed particle
- edm4hep::MCParticle sim // reference to the Monte-Carlo particle
After defining the association object, you can set it with
association.setSim(MyMCParticle);
association.setRec(MyRecoParticle);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be misunderstanding this, but can these associations be used for a one-to-many map, or just a one-to-one relation? It definitely seems better than what I currently have, but it's not obvious to me how this should work in my particular case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jennifer, very legit question :-) This may change in the future but for now, each entry in the association collection has a single pair. The way to store a "one to many" is therefore to add multiple entries, all having the same setRec(MyRecoParticle)
but different setSim(MyMCParticle)
. Storage wise it is ok. CPU wise we loose a bit at analysis stage, where retrieving all MCParticles associated to a GenJet will be done with:
for (const auto& assoc : associationColl) {
if (assoc.getRec() == MyRecoParticle) {
theMCParticle = rel.getSim();
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I guess this will only work for storing particles for truth jets, not for the calo jets? I'd prefer using the same solution for truth and calo jets, since I think this would make things very confusing otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I ran a quick test in python where I printed out the energies of the clusters in a jet, and it ran without any problem. For reference, here is the code:
`from podio import root_io
podio_reader = root_io.Reader('output_fullCalo_SimAndDigi.root')
for event in podio_reader.get("events"):
jets = event.get("Jets")
for jet in jets:
clusters = jet.getClusters()
for cluster in clusters:
print (cluster.getEnergy())`
So, I think this solution is sufficient for now, though it would be good to eventually have a solution that links the objects instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok for the clusters attached to CaloJets
but my question was about the truthJets
: can we access the ReconstructedParticles that created it? By reading the code, it looks like we are missing the saving of a collection containing the jetInput
in the output rootfile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion, and you're right that it was broken for the truth jets. I've switched to using the associations for the truth jets, but not for the reco jets for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I think the reco jets are fine with the oneToMany association to Clusters.
But for the truthJets I think we should go for a mutlitransformer since we now have multiple output (not sure what will happen in multi-threaded environment in the current situation). The main change is that TruthJetParticleAssociation
should appear as a KeyValues
. Please find an example here: https://github.com/key4hep/k4FWCore/blob/main/test/k4FWCoreTest/src/components/ExampleFunctionalTransformerMultiple.cpp#L51
I tried to run RecCalorimeter/tests/options/runTruthJetReco.py
, adding
################ Output
from Configurables import PodioOutput
out = PodioOutput("out")
out.outputCommands = ["keep *"]
out.filename = "output_truthJets.root"
ApplicationMgr().TopAlg += [out]
but the truthJet collection is empty (and the association as well). Could we adapt the test so that it actually produces some truthJets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made both of these changes. I haven't added an output root file to the test, but I added it in temporarily to check that it produced truth jets, and that I was able to read the associations.
Co-authored-by: Brieuc Francois <[email protected]>
Thanks you very much for addressing the comments! After implementing the few remaining minor changes and fixing the tests, this is good to go! |
Hi, the failing tests |
Hi Jennifer, to avoid bothering you with more technicalities (you already did a great share), we will merge as is and refactor a few things in a later PR. Thanks again! |
This PR adds two algorithms for jet clustering: one for calorimeter clusters, and another for truth jets. Several options are included for the jet radius, jet input name, and more.