dblp indexing of ACL publications #468

Open
florianReitz-tr opened this issue Jul 18, 2019 · 3 comments

Comments

@florianReitz-tr

This is originally from a mail to Matt Post. Matt suggested that I put the core of the mail here to allow a broader discussion.

Dblp (https://dblp.org) indexes parts of the ACL Anthology. We are currently considering how to improve the way we handle your data to avoid problems we had in the past (completeness, timeliness ...). In general, there are three areas that are interesting for us, and I hope you can give us some feedback or hints on how best to handle them.

Availability of metadata: We need to get the actual data. We are currently switching to your MODS export as our primary data source. According to Matt, this is preferable to working with the raw XML data in the GitHub repo directly.

Monitoring for new proceedings: Our primary issue is determining when you publish new proceedings. This is not a big deal for the large conferences (ACL, NAACL...) but is more difficult for the smaller conferences and workshops. Currently, we scan the list provided at https://www.aclweb.org/anthology/volumes/ for anything that has been added since we last checked. We are not sure if this is a good idea or if we miss proceedings this way. If there is a better or more reliable way to be notified of new proceedings, we would appreciate the information.

Stability of links: This relates in particular to the links that we use to point our users to your publication landing pages (or PDFs). Matt pointed me to issue #158 and to the canonical URL form https://www.aclweb.org/anthology/CYY-VPPP . I noticed that the canonical URL points to a landing page while the DOIs (at least for ACL 2017) point to the PDF itself. In case a DOI is available, what should be the primary URL in dblp (the one that 99% of our users will click when they are looking for the publication)?
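To make the scan-and-diff approach from "Monitoring for new proceedings" concrete, here is a minimal sketch. It assumes that each volume on the listing page is linked with an href starting with `/anthology/volumes/`; that prefix, the class name, and the helper names are hypothetical and would need to be checked against the real page. It does not fetch anything itself and only works on HTML that has already been downloaded.

```python
from html.parser import HTMLParser


class VolumeLinkParser(HTMLParser):
    """Collect identifiers from links that look like volume links."""

    def __init__(self, prefix="/anthology/volumes/"):
        super().__init__()
        # Hypothetical link prefix; adjust to the markup of the real page.
        self.prefix = prefix
        self.volumes = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith(self.prefix) and href != self.prefix:
            # Use the path segment after the prefix as the volume identifier.
            self.volumes.add(href[len(self.prefix):].strip("/"))


def extract_volumes(html):
    """Return the set of volume identifiers linked from the given HTML."""
    parser = VolumeLinkParser()
    parser.feed(html)
    return parser.volumes


def new_volumes(seen, html):
    """Return identifiers present in the page but not in the stored set."""
    return extract_volumes(html) - set(seen)
```

The caller would store the result of `extract_volumes` after each run and pass it as `seen` on the next one. A diff like this only detects volumes that appear on the listing page, so it cannot answer the question of whether proceedings are missed because they never show up there.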

Any hint/help/comment on dealing with these issues would be greatly appreciated.

@akoehn
Member

akoehn commented Jul 18, 2019

Regarding the links:
The canonical URL actually points to the PDF; the canonical URL with a slash at the end points to the landing page:
https://www.aclweb.org/anthology/C18-1253 -- PDF
https://www.aclweb.org/anthology/C18-1253/ -- landing page

I would usually prefer to refer to the landing page as it is easy to go from there to the PDF but hard the other way round (unless you know about adding a slash). This means the anthology URL and not the DOI would need to be the primary URL.
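As a rough sketch of the slash convention described in this comment, the helper below builds both URLs from an Anthology ID. The regular expression is an assumed reading of the `CYY-VPPP` form (one uppercase letter, a two-digit year, a hyphen, four digits), and the function name is hypothetical. It reflects only the behaviour as described in this comment, not a guaranteed contract of the site.

```python
import re

BASE = "https://www.aclweb.org/anthology/"

# Assumed shape of CYY-VPPP: letter, two-digit year, hyphen, four digits.
ID_PATTERN = re.compile(r"[A-Z]\d{2}-\d{4}")


def anthology_urls(paper_id):
    """Return the PDF and landing-page URLs for an Anthology ID."""
    if not ID_PATTERN.fullmatch(paper_id):
        raise ValueError(f"not a CYY-VPPP style ID: {paper_id!r}")
    pdf = BASE + paper_id
    # Per the comment above: no trailing slash is the PDF,
    # a trailing slash is the landing page.
    return {"pdf": pdf, "landing": pdf + "/"}
```

For `C18-1253` this yields the two URLs listed above. An indexer that prefers the landing page would store the `landing` value as its primary link.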

The anthology really needs some RSS feed to notify people whenever new publications have been ingested. This is tracked in #358.

@mjpost
Member

mjpost commented Aug 19, 2019

We are currently discussing (#480) making the canonical page the paper landing page. It is likely that this will happen sometime in the next month or so. We have made some progress on #513.

@mjpost
Member

mjpost commented Oct 1, 2019

Hi @florianReitz-tr—Just to update you, the canonical URLs now point to the landing page. PDFs and associated files require an extension. For example:

etc.

I am hoping to have an RSS feed soon for new volumes!
