HoldingPen: Test matcher #3500

Open
ksachs opened this issue Jun 19, 2018 · 1 comment

@ksachs
Contributor

ksachs commented Jun 19, 2018

To compare the results of the matcher with BibMatch on a larger sample (higher statistics), we need to align the DESY lists (per feed / XML file) with the HoldingPen (HP) results.
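Once both sides are aligned, the comparison itself is a per-record tally. The sketch below is hypothetical and makes two assumptions: records are keyed by DOI (not every record has one), and each side has already been reduced to a mapping from DOI to the matched INSPIRE record ID, or `None` for "no match". The function name `compare_matches` and the outcome labels are illustrative only; they are not part of BibMatch or the HoldingPen.

```python
from collections import Counter
from typing import Dict, Optional


def compare_matches(bibmatch: Dict[str, Optional[int]],
                    holdingpen: Dict[str, Optional[int]]) -> Counter:
    """Tally agreement between BibMatch and the HP matcher for each DOI
    in the DESY list (hypothetical input format: DOI -> record ID or None)."""
    tally = Counter()
    for doi, bm_result in bibmatch.items():
        if doi not in holdingpen:
            # The record from the DESY list never showed up in the HoldingPen.
            tally["missing_in_hp"] += 1
            continue
        hp_result = holdingpen[doi]
        if bm_result is None and hp_result is None:
            tally["both_no_match"] += 1
        elif bm_result == hp_result:
            tally["same_match"] += 1
        elif bm_result is None:
            tally["only_hp_matched"] += 1
        elif hp_result is None:
            tally["only_bibmatch_matched"] += 1
        else:
            # Both matched, but to different records.
            tally["different_match"] += 1
    return tally
```

Records that appear only in the HoldingPen are not counted here; they would need a separate pass over `holdingpen`.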

For a few files, it's OK to filter by publisher, which is (to some extent) searchable.
The publisher is usually stored in a field such as abstracts.source, but not all records have an abstract, so this is not reliable. In addition, there can be multiple feeds (different days or journals) submitted on the same day, i.e. a cataloger has to look at several feeds to check whether there is a match.

Searching for the journal title in the HP is too cumbersome without facets.

It is not possible to search by harvest date, which is acceptable as long as we clean the HP daily.

If BibMatch finds a match and the record is not halted in the HoldingPen for matching, one has to search manually, record by record. If the BibMatch match was

  • found by DOI or arXiv ID: we assume the matcher finds it too and don't search for the record in the HoldingPen.
  • otherwise: one has to search the HoldingPen by DOI, e.g. metadata.dois.value.raw:"10.1103/PhysRevD.97.115023"
      • if the record is halted due to conflicts, there was an automatic match;
      • if the record is halted for selection, there was no match;
      • but what if the record cannot be found in the HoldingPen at all (as in the example above)?
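The manual check described in the list above can be written down as a small decision procedure. This is a sketch under stated assumptions: the `matched_by` labels (`"doi"`, `"arxiv"`) and the halt-reason strings (`"conflicts"`, `"selection"`) are hypothetical stand-ins, not values taken from the actual HoldingPen workflow objects. Only the query string format comes from the issue itself.

```python
from enum import Enum
from typing import Optional


class MatchOutcome(Enum):
    AUTOMATIC_MATCH = "automatic match"    # halted due to conflicts
    NO_MATCH = "no match"                  # halted for selection
    NOT_IN_HP = "not found in HoldingPen"  # the open question in the issue
    UNKNOWN = "unknown halt reason"


def needs_hp_lookup(matched_by: str) -> bool:
    """BibMatch matches found by DOI or arXiv ID are assumed to be found by
    the matcher as well, so only other matches need a manual HP search.
    `matched_by` labels are hypothetical."""
    return matched_by not in {"doi", "arxiv"}


def doi_query(doi: str) -> str:
    """Build the HoldingPen search string used in the issue for a DOI."""
    return f'metadata.dois.value.raw:"{doi}"'


def classify(halt_reason: Optional[str]) -> MatchOutcome:
    """Interpret the (hypothetical) halt reason of the record found by
    doi_query; None means the search returned nothing."""
    if halt_reason is None:
        return MatchOutcome.NOT_IN_HP
    if halt_reason == "conflicts":
        return MatchOutcome.AUTOMATIC_MATCH
    if halt_reason == "selection":
        return MatchOutcome.NO_MATCH
    return MatchOutcome.UNKNOWN
```

For example, `doi_query("10.1103/PhysRevD.97.115023")` reproduces the search string above, and a record with no search hit is reported as `NOT_IN_HP` rather than silently counted as a match or non-match.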

I doubt the current search options are sufficient for thorough testing.
However, it is not worth developing anything fancy.

Can someone come up with a solution for the HoldingPen?
Alternatively, we switch to processing via the HoldingPen (once the open issues are fixed) and develop a cross-check based on the DESY workflow.

ksachs added this to the Ingestion tools in PROD milestone Jun 19, 2018
@ksachs
Contributor Author

ksachs commented Jun 22, 2018

Once processing via the HoldingPen, including updates to INSPIRE, is active, we will do a cross-check based on the DESY workflow.

ksachs self-assigned this Jun 22, 2018

1 participant