L18 missing ELRA/LREC native IDs #823
Comments
It would be fine to link to the LREC data (feel free to submit a PR to expedite it). The LREC 20 ingestion is imminent. There has been some confusion and difficulty given LREC's size (30 workshops) and the new ID format.
But isn't it still preferable for the Anthology to host files whenever possible? It would help with issues like #812. If we want to add external links in addition to hosting the files ourselves, we'd need to add support for that in the XML.
We always have them internally, but currently sometimes (I think maybe just for LREC) link to the PDFs externally, per request. In such cases I agree it'd be a good idea to provide both links.
Maybe I missed it in a doc, but what's the difference between
The XML only knows On the website, "URL" is intended to be the canonical link (the paper's landing page, usually) while "PDF" is the paper PDF itself. The semantics of these were changed following the discussion in #587. (I see they're identical for the externally-hosted papers, which I'm not sure is ideal...)
Given that externally hosted material can be unreliable (see #812) and that the Anthology stores PDFs locally; that external URLs can be useful, e.g. for disambiguation; and that current practice separates the Anthology landing-page URL from the PDF: doesn't this mean there are up to three URLs that make sense to store (PDF, Anthology URL, external URL), but only two fields? I can see that the DOI can take the role of external URL for some content (e.g. CL papers, old things in the ACM DL), but for external content that has its own ID and source URL but no DOI (e.g. RANLP, LREC, NODALIDA, and surely others), what could be done? An external_uri field is one solution.
The Anthology PDF is inferable from its URL, so we only store it once. I like the idea of adding an external or "original" URL, which would ideally point to a landing page, but could just point to a PDF, too, if that's all that's available.
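To make the "two stored fields, three usable links" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `PaperLinks` class, its field names, the example hosts, the paper ID, and the convention that the PDF sits at the landing URL with a `.pdf` suffix are all hypothetical and not taken from the Anthology's actual schema or code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperLinks:
    """Hypothetical record; field names are illustrative only."""

    landing_url: str                    # canonical landing page (stored)
    external_url: Optional[str] = None  # original landing page or PDF (stored, optional)

    @property
    def pdf_url(self) -> str:
        # Assumed convention: drop the trailing slash and append ".pdf".
        # The PDF link is derived, so it is never stored separately.
        return self.landing_url.rstrip("/") + ".pdf"


# Made-up hosts and IDs, purely for demonstration.
paper = PaperLinks(
    landing_url="https://anthology.example/L18-1001/",
    external_url="https://lrec.example/papers/1001",
)
print(paper.pdf_url)
```

Deriving the PDF link keeps the stored data to two fields, while the optional external link covers venues that have their own landing pages but no DOI.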
The `url` and `pdf` fields in L18 both point to the Anthology PDF. For prior LRECs, these both pointed to the LREC-hosted PDF, which, while not without issues, did permit syncing up of other metadata across the sites. It might be easier, and avoid duplication, if both ACL and LREC paper IDs were listed in the metadata. Oh, and isn't L20 just around the corner? (also not a correction, sorry)
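As a rough illustration of why listing both IDs would help with syncing, the sketch below builds lookups in both directions from a list of ID pairs. The pairs, the native-ID format, and the function names are all invented for the example and do not reflect real LREC or Anthology data.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical (Anthology ID, LREC-native ID) pairs; the values are made up.
ID_PAIRS: List[Tuple[str, str]] = [
    ("L18-0001", "lrec-0042"),
    ("L18-0002", "lrec-0107"),
]

# One mapping per direction, so either site's ID can find the other.
anthology_to_native: Dict[str, str] = dict(ID_PAIRS)
native_to_anthology: Dict[str, str] = {native: anth for anth, native in ID_PAIRS}


def lookup_native(anthology_id: str) -> Optional[str]:
    """Return the native ID for an Anthology ID, or None if unknown."""
    return anthology_to_native.get(anthology_id)


def lookup_anthology(native_id: str) -> Optional[str]:
    """Return the Anthology ID for a native ID, or None if unknown."""
    return native_to_anthology.get(native_id)
```

With both IDs recorded per paper, metadata from either site can be joined on an ID rather than on a shared PDF link.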