Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use GlobalJobId as unique job identifier #426

Merged
merged 2 commits into from
Feb 18, 2021
Merged

Conversation

drebatto
Copy link
Contributor

Instead of making up a custom LRMS job identifier, use the Condor-defined GlobalJobId.
This doesn't introduce arbitrary convention on the JobId format.
It also allows using the field directly in condor_history constraints.

Instead of making up a custom LRMS job identifier, use the ready-made GlobalJobId.
@brianhlin
Copy link
Collaborator

Hi @drebatto thanks for the contribution! I'll have to admit that I'm not familiar with all of the APEL accounting rules. Are # characters valid in APEL records? Also, IIRC, the blah lrmsID needs to match the batch job ID to create the combined APEL record so I think you'd need to make a similar change to condor_blah.sh

@maxfischer2781
Copy link
Contributor

The same ID is generated in condor_blah.sh indeed:

-format "\"lrmsID=%v" clusterId \
-format ".%v_${BATCH_HOST}\" " ProcId \

Both need to match, since APEL joins based on these:

    INNER JOIN BlahdRecords ON BlahdRecords.ValidFrom <= EventRecords.EndTime AND
                             BlahdRecords.ValidUntil > EventRecords.EndTime AND
                             EventRecords.SiteID = BlahdRecords.SiteID AND EventRecords.JobName = BlahdRecords.LrmsId
                             AND SpecRecords.SiteID = BlahdRecords.SiteID AND SpecRecords.CEID = BlahdRecords.CEID

@drebatto
Copy link
Contributor Author

Yes, you are right. I modified condor_blah.sh too.
By the way, I implemented this patch because this is how we accounted for jobs with CREAM in the last six or seven years, and I wanted historical data to look consistent. For the same reason, I'll understand if you don't want to merge the patch and change the job id format for existing instances.

@maxfischer2781
Copy link
Contributor

I think this is a good change, even if it conflicts with the earlier format. The current format is not unique if a machine is re-installed, thereby resetting the clusterId counter. Since the GlobalJobId includes the QDate, it stays unique in this case.

Copy link
Collaborator

@brianhlin brianhlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants