L18 missing ELRA/LREC native IDs #823

Open
leondz opened this issue May 18, 2020 · 7 comments
@leondz
Contributor

leondz commented May 18, 2020

The url and pdf fields in L18 both point to the Anthology PDF. For prior LRECs, both pointed to the LREC-hosted PDF, which, while not without issues, did permit syncing other metadata across the sites. It might be easier, and avoid duplication, if both the ACL and LREC paper IDs were listed in the metadata. Oh, and isn't L20 just around the corner?

(also not a correction, sorry)

@leondz added the correction label (for corrections submitted to the anthology) May 18, 2020
@mjpost
Member

mjpost commented May 19, 2020

It would be fine to link to the LREC data (feel free to submit a PR to expedite, it's the <url> field in the <meta> block in data/xml/L18.xml, which needs to be changed from the Anth ID to a fully-specified URL).
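As a rough illustration of the change described above, the sketch below rewrites a `<url>` element inside a `<meta>` block from an Anthology ID to a fully specified URL. The sample XML is a hypothetical, heavily trimmed stand-in for data/xml/L18.xml (the real file's layout and ID value may differ), `set_meta_url` is a made-up helper name, and the target address is a placeholder rather than the real LREC location.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for data/xml/L18.xml; the real structure may differ.
SAMPLE = """<volume id="L18">
  <meta>
    <url>L18-1</url>
  </meta>
</volume>"""


def set_meta_url(xml_text: str, full_url: str) -> str:
    """Replace the text of <meta>/<url> with a fully specified URL."""
    root = ET.fromstring(xml_text)
    url = root.find("./meta/url")
    if url is None:
        raise ValueError("no <url> element inside <meta>")
    url.text = full_url
    return ET.tostring(root, encoding="unicode")


# Placeholder address (assumption), not the actual LREC-hosted location.
updated = set_meta_url(SAMPLE, "https://example.org/lrec2018/")
print(updated)
```

In an actual PR the same edit would more likely be made by hand in the XML file; the function only shows which element is affected.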

LREC 20 ingestion is imminent. There has been some confusion and difficulty given LREC's size (30 workshops) and the new ID format.

@mbollmann
Member

> It would be fine to link to the LREC data (feel free to submit a PR to expedite, it's the <url> field in the <meta> block in data/xml/L18.xml, which needs to be changed from the Anth ID to a fully-specified URL).

But isn't it still preferable for the Anthology to host files whenever possible? It would help with issues like #812.

If we want to add external links in addition to hosting the files ourselves, we'd need to add support for that in the XML.

@mjpost
Member

mjpost commented May 19, 2020

We always have them internally, but currently sometimes (I think maybe just for LREC) link to the PDFs externally, per request. In such cases I agree it'd be a good idea to provide both links.

@leondz
Contributor Author

leondz commented May 19, 2020

Maybe I missed it in a doc, but what's the difference between url and pdf fields?

@mbollmann
Member

The XML only knows url, pointing to the PDF.

On the website, "URL" is intended to be the canonical link (the paper's landing page, usually) while "PDF" is the paper PDF itself. The semantics of these were changed following the discussion in #587. (I see they're identical for the externally-hosted papers, which I'm not sure is ideal...)
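The distinction above can be sketched roughly as follows: one stored `url` value yields the two links shown on the website. This is only an illustration of the intended semantics, not the Anthology's actual code; `Links`, `resolve_links`, and the `ANTHOLOGY_BASE` address are assumptions made for the example.

```python
from typing import NamedTuple

# Assumed base address for Anthology pages; adjust to the real site if it differs.
ANTHOLOGY_BASE = "https://www.aclweb.org/anthology/"


class Links(NamedTuple):
    url: str  # canonical link, shown as "URL" (usually the landing page)
    pdf: str  # link to the paper PDF itself, shown as "PDF"


def resolve_links(xml_url: str) -> Links:
    """Derive the website's URL/PDF pair from the single XML url value."""
    value = xml_url.strip()
    if value.startswith(("http://", "https://")):
        # Externally hosted paper: only one address is known, so both coincide.
        return Links(url=value, pdf=value)
    # Anthology-hosted paper: the value is an Anthology ID.
    return Links(
        url=f"{ANTHOLOGY_BASE}{value}/",
        pdf=f"{ANTHOLOGY_BASE}{value}.pdf",
    )


print(resolve_links("L18-1001"))
print(resolve_links("https://example.org/lrec2016/paper.pdf"))  # placeholder URL
```

The second call shows the case noted in the comment: for an externally hosted paper the two links end up identical.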

@leondz
Contributor Author

leondz commented May 20, 2020

Given that externally hosted material can be unreliable (see #812), that the Anthology stores PDFs locally, that external URLs can be useful, e.g. for disambiguation, and that current practice is to separate the Anthology landing page URL from the PDF: doesn't this mean there are up to three URLs that make sense to store (pdf, anthology url, external url), but only two fields? I can see that doi can take the role of external URL for some content (e.g. CL papers, old things in the ACM DL). But for external content that has a different ID and source URL and no DOI (e.g. RANLP, LREC, NODALIDA, and surely others), what could be done? An external_uri field is one solution.

@mjpost
Member

mjpost commented May 23, 2020

The Anthology PDF is inferable from its URL, so we only store it once. I like the idea of adding an external or "original" URL, which would ideally point to a landing page, but could just point to a PDF, too, if that's all that's available.
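A minimal sketch of that idea, assuming a record that stores only the Anthology ID plus an optional external link and derives the other two addresses. `PaperLinks`, `external_url`, and `ANTHOLOGY_BASE` are hypothetical names chosen for the example, not existing Anthology fields or code.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed base address for Anthology pages.
ANTHOLOGY_BASE = "https://www.aclweb.org/anthology/"


@dataclass
class PaperLinks:
    anthology_id: str
    # Hypothetical "external" or "original" link: ideally a landing page,
    # but it may be a PDF if that is all that is available.
    external_url: Optional[str] = None

    @property
    def anthology_url(self) -> str:
        return f"{ANTHOLOGY_BASE}{self.anthology_id}/"

    @property
    def pdf_url(self) -> str:
        # Inferred from the ID, so it is never stored separately.
        return f"{ANTHOLOGY_BASE}{self.anthology_id}.pdf"

    def all_links(self) -> Dict[str, str]:
        links = {"url": self.anthology_url, "pdf": self.pdf_url}
        if self.external_url is not None:
            links["external"] = self.external_url
        return links


paper = PaperLinks("L18-1001", external_url="https://example.org/lrec2018/1001")
print(paper.all_links())
```

With this shape, two stored values are enough to expose all three links, and papers without an external source simply omit the third.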

@mjpost removed the correction label (for corrections submitted to the anthology) Jul 8, 2020
@mjpost mentioned this issue Jul 8, 2020