Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding tools for jet clustering for cluster and truth particles #80

Merged
merged 27 commits into from
Jul 9, 2024

Conversation

jroloff
Copy link
Contributor

@jroloff jroloff commented May 8, 2024

This PR adds two algorithms for jet clustering: one for calorimeter clusters, and another for truth jets. Several options are included for the jet radius, jet input name, and more.

Copy link
Contributor

@BrieucF BrieucF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jennifer, great thanks! Here are already a few comments/suggestions/questions.

Now I see why you proposed to have a general algorithm that does both caloJet and TruthJets. Maybe one way could be to have a separated helper class that does only the fastJet part. We would still have two Gaudi components (because we need to know the datatype they eat) but at least we avoid too much code duplication?

I am also wondering if we could prepare things for "exclusive" jet finding (fixed number of jets).

Finally, if you do not mind, it would be great to use already now the "thread safe Gaudi Functionals": https://key4hep.github.io/key4hep-doc/developing-key4hep-software/WritingAlgorithms.html#gaudi-functional ? Compared to what you have here, only very minor changes are needed

RecFCCeeCalorimeter/CMakeLists.txt Outdated Show resolved Hide resolved
RecFCCeeCalorimeter/src/components/JetFCCee.cpp Outdated Show resolved Hide resolved
RecFCCeeCalorimeter/src/components/JetFCCee.cpp Outdated Show resolved Hide resolved
RecFCCeeCalorimeter/src/components/JetFCCee.cpp Outdated Show resolved Hide resolved
RecFCCeeCalorimeter/src/components/TruthJetFCCee.cpp Outdated Show resolved Hide resolved
RecFCCeeCalorimeter/src/components/TruthJetFCCee.cpp Outdated Show resolved Hide resolved
@jroloff
Copy link
Contributor Author

jroloff commented May 31, 2024

Hi @BrieucF -- I think I've addressed the comments you made. I've started the functionality for exclusive jets, so it should be possible, but the default is to use inclusive jets for now. Let me know if you have any other comments, since this has changed quite a bit.

There are still some features I'd like to add, but if there aren't any problems with this version, I might prefer to make a git issue for any new features so we can merge this in the meantime.

@jroloff
Copy link
Contributor Author

jroloff commented Jun 19, 2024

Pinging @BrieucF to see if you have any more comments for this MR.

@kjvbrt
Copy link
Contributor

kjvbrt commented Jun 24, 2024

Hi, adding @jroloff to the HEP-FCC GH organization in order for the CI to start automatically.

Copy link
Contributor

@BrieucF BrieucF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful! Thanks Jennifer. I have a few comments but it looks good!

CreateTruthJet and FilterTruthParticles should probably not live in k4RecCalorimeter (k4Gen would be a more natural place) but I would understand if you prefer to deal with this in another PR or to leave it on us for a future code re-organization.

RecCalorimeter/src/components/CreateCaloJet.h Outdated Show resolved Hide resolved
RecCalorimeter/src/components/CreateCaloJet.h Show resolved Hide resolved
RecCalorimeter/src/components/FilterTruthParticles.h Outdated Show resolved Hide resolved
RecCalorimeter/src/components/FilterTruthParticles.cpp Outdated Show resolved Hide resolved
RecCalorimeter/src/components/FilterTruthParticles.cpp Outdated Show resolved Hide resolved
int index = constit.user_info<ClusterInfo>().index();
jetInput.setPDG((input)[index].getPDG());

jet.addToParticles(jetInput);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is'nt this jetInput lost in the output file? One should probably store also this new collection of ReconstructedParticle in the output rootfile. Here is an example of MultiTransformer: https://github.com/key4hep/k4FWCore/blob/main/test/k4FWCoreTest/src/components/ExampleFunctionalTransformerMultiple.cpp#L51.

I agree that this is not perfect, having a relation between "genJet" and MCParticle would feel more natural but for now there is no such thing AFAIK.

Copy link
Contributor

@BrieucF BrieucF Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another suggestion (thanks @tmadlener) is to use Association instead of relations (the difference is that associations are separated object with pointers to pairs of objects, used e.g. to link RECO to GEN quantities that should therefore not be used at analysis level). This will 1) avoid loosing information when casting MCParticles to ReconstructedParticles 2) avoid duplicating content on disk since the associations are pointers.

One should use MCRecoParticleAssociation:

edm4hep::MCRecoParticleAssociation:
    Description: "Association between a ReconstructedParticle and the corresponding MCParticle"
    Author: "EDM4hep authors"
    Members:
      - float weight                        // weight of this association
    OneToOneRelations:
     - edm4hep::ReconstructedParticle  rec  // reference to the reconstructed particle
     - edm4hep::MCParticle sim              // reference to the Monte-Carlo particle

After defining the association object, you can set it with

        association.setSim(MyMCParticle);
        association.setRec(MyRecoParticle);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misunderstanding this, but can these associations be used for a one-to-many map, or just a one-to-one relation? It definitely seems better than what I currently have, but it's not obvious to me how this should work in my particular case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jennifer, very legit question :-) This may change in the future but for now, each entry in the association collection has a single pair. The way to store a "one to many" is therefore to add multiple entries, all having the same setRec(MyRecoParticle) but different setSim(MyMCParticle). Storage wise it is ok. CPU wise we loose a bit at analysis stage, where retrieving all MCParticles associated to a GenJet will be done with:

for (const auto& assoc : associationColl) {
  if (assoc.getRec() == MyRecoParticle) {
    theMCParticle = rel.getSim();
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I guess this will only work for storing particles for truth jets, not for the calo jets? I'd prefer using the same solution for truth and calo jets, since I think this would make things very confusing otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I ran a quick test in python where I printed out the energies of the clusters in a jet, and it ran without any problem. For reference, here is the code:

`from podio import root_io

podio_reader = root_io.Reader('output_fullCalo_SimAndDigi.root')

for event in podio_reader.get("events"):
jets = event.get("Jets")
for jet in jets:
clusters = jet.getClusters()
for cluster in clusters:
print (cluster.getEnergy())`

So, I think this solution is sufficient for now, though it would be good to eventually have a solution that links the objects instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for the clusters attached to CaloJets but my question was about the truthJets: can we access the ReconstructedParticles that created it? By reading the code, it looks like we are missing the saving of a collection containing the jetInput in the output rootfile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion, and you're right that it was broken for the truth jets. I've switched to using the associations for the truth jets, but not for the reco jets for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I think the reco jets are fine with the oneToMany association to Clusters.

But for the truthJets I think we should go for a mutlitransformer since we now have multiple output (not sure what will happen in multi-threaded environment in the current situation). The main change is that TruthJetParticleAssociation should appear as a KeyValues. Please find an example here: https://github.com/key4hep/k4FWCore/blob/main/test/k4FWCoreTest/src/components/ExampleFunctionalTransformerMultiple.cpp#L51

I tried to run RecCalorimeter/tests/options/runTruthJetReco.py, adding

################ Output
from Configurables import PodioOutput
out = PodioOutput("out")
out.outputCommands = ["keep *"]
out.filename = "output_truthJets.root"
ApplicationMgr().TopAlg += [out]

but the truthJet collection is empty (and the association as well). Could we adapt the test so that it actually produces some truthJets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made both of these changes. I haven't added an output root file to the test, but I added it in temporarily to check that it produced truth jets, and that I was able to read the associations.

RecFCCeeCalorimeter/CMakeLists.txt Outdated Show resolved Hide resolved
RecCalorimeter/src/components/ClusterJet.h Outdated Show resolved Hide resolved
RecCalorimeter/src/components/ClusterJet.h Outdated Show resolved Hide resolved
@BrieucF
Copy link
Contributor

BrieucF commented Jul 2, 2024

Thanks you very much for addressing the comments! After implementing the few remaining minor changes and fixing the tests, this is good to go!

@kjvbrt
Copy link
Contributor

kjvbrt commented Jul 2, 2024

Hi, the failing tests FCCeeLAr_simRecoAllegroV3 and FCCeeLAr_reproduceSegfault are already fixed in the main.

@BrieucF BrieucF changed the title Draft: Adding tools for jet clustering for cluster and truth particles Adding tools for jet clustering for cluster and truth particles Jul 2, 2024
@BrieucF
Copy link
Contributor

BrieucF commented Jul 9, 2024

Hi Jennifer, to avoid bothering you with more technicalities (you already did a great share), we will merge as is and refactor a few things in a later PR.

Thanks again!

@BrieucF BrieucF merged commit 5f27a43 into HEP-FCC:main Jul 9, 2024
0 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants