Front matter for journals #181

Closed
mbollmann opened this issue Mar 15, 2019 · 4 comments

@mbollmann
Member

The website (both old and new) has a rule that journals (J and Q series) don't have front matter, so it doesn't generate paper entries for x000 IDs (e.g., there's no J14-1000, J14-2000, etc. in http://aclweb.org/anthology/events/cl-2014/).

However, I discovered by accident that some journal issues actually do have front matter stored as PDFs on the server, e.g.:
http://www.aclweb.org/anthology/J98-1000

These are currently inaccessible from the website.

@mjpost
Member

mjpost commented Mar 15, 2019

It looks like this includes some but not all of the CL print issues. I think CL stopped printing around 2008, but these front matter scans seem only to go to 2001.

Can we add an annotation in the XML that indicates when front matter exists and thus generate links for them?

@mbollmann
Member Author

I think this heavily relates to #156. If all papers reliably had <url> tags (or similar) specifying a URL or internal filename for the PDF, the journals without front matter could simply not have those.
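To make the idea concrete, here is a minimal sketch of "only link what has a `<url>`". The XML layout (`<volume>`/`<paper>` elements with `id` attributes), the sample titles, and the `pdf_links` helper are all assumptions for illustration, not the Anthology's actual schema or code; only the base URL is taken from the link quoted earlier in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical volume file: the element and attribute names are assumed
# for illustration and may not match the real Anthology XML.
SAMPLE = """
<volume id="J98">
  <paper id="1000"><title>Front Matter</title><url>J98-1000</url></paper>
  <paper id="1001"><title>A Paper Without a Stored PDF</title></paper>
</volume>
"""


def pdf_links(volume_xml, base="http://www.aclweb.org/anthology/"):
    """Map each full paper ID to a PDF link, or None if it has no <url>."""
    volume = ET.fromstring(volume_xml.strip())
    links = {}
    for paper in volume.iter("paper"):
        full_id = f"{volume.get('id')}-{paper.get('id')}"
        url = paper.findtext("url")  # None when the tag is absent
        links[full_id] = base + url.strip() if url and url.strip() else None
    return links


if __name__ == "__main__":
    for paper_id, link in pdf_links(SAMPLE).items():
        print(paper_id, link or "(no PDF link)")
```

With this approach, a journal issue without front matter would simply have no x000 entry carrying a `<url>`, so no special-case rule for the J and Q series would be needed.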

@mbollmann
Member Author

Can we maybe get a full PDF file list from the server (okay, *000 IDs would be sufficient for this case)? Then we could maybe add this information to the XML programmatically.
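As a rough sketch of that programmatic step, the snippet below filters a file listing down to journal front matter IDs. It assumes the listing is one filename or path per entry and that PDFs are named `<ID>.pdf`; the sample paths and the `journal_front_matter_ids` helper are hypothetical.

```python
import re

# Journal front matter IDs as described in this thread: J or Q series,
# a two-digit year, then an issue digit followed by "000" (e.g. J98-1000).
# The ".pdf" suffix is an assumption about how files are named on the server.
FRONT_MATTER_PDF = re.compile(r"^([JQ]\d{2}-\d000)\.pdf$")


def journal_front_matter_ids(filenames):
    """Return the sorted, de-duplicated IDs of journal front matter PDFs."""
    ids = set()
    for name in filenames:
        # Keep only the final path component so full server paths also work.
        basename = name.strip().rsplit("/", 1)[-1]
        match = FRONT_MATTER_PDF.match(basename)
        if match:
            ids.add(match.group(1))
    return sorted(ids)


if __name__ == "__main__":
    # Hypothetical listing; real paths on the server may look different.
    listing = [
        "J98-1000.pdf",
        "J98-1001.pdf",
        "anthology/J/J98/J98-2000.pdf",
        "P14-1000.pdf",
    ]
    print(journal_front_matter_ids(listing))
```

The resulting ID list could then drive whichever XML annotation is chosen, so that only issues with a stored front matter PDF get an entry.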

@mjpost
Member

mjpost commented Mar 29, 2019

Sent it to you.

@mjpost mjpost closed this as completed in 0b4ea37 Jun 21, 2019
najtin pushed a commit to ir-anthology/ir-anthology that referenced this issue Jun 9, 2021
A summary of changes:

- Introduces a nested format (closes acl-org#317)
- URLs are stored using a relative format for internal links (closes acl-org#156), which facilitates mirroring (acl-org#295) 
- URLs are only displayed if they are found in the XML. I manually crawled to validate and create entries for PDFs for all frontmatter entries (closes acl-org#181, closes acl-org#180), including journal frontmatter (acl-org#264) and volume PDFs (closes #31)
- Added missing entries and removed ones whose PDFs were missing, including LREC 2014 (closes #31)
- It punts on C69 reformatting (closes acl-org#147)

Relevant, but not completed:
- Creating PDF volumes by pasting together individual papers (acl-org#226)
- This makes it much easier to add non-paper entries such as talks (acl-org#298), to add a volume-level "publication date" (acl-org#319), and to create an RSS feed of updates (acl-org#358).
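
One way to read the "relative format for internal links" point in the commit summary: a stored value without a URL scheme is treated as internal and resolved against whichever base a given site uses, which is what would make mirroring straightforward. The sketch below is only an illustration under that assumption; the `resolve_pdf_url` helper and the mirror address are hypothetical and not taken from the commit.

```python
from urllib.parse import urlparse


def resolve_pdf_url(stored, base):
    """Resolve a stored <url> value against a site-specific base URL.

    Values with a scheme (e.g. "http://...") are treated as external and
    returned unchanged; anything else is treated as an internal, relative
    link and appended to the base.
    """
    if urlparse(stored).scheme:
        return stored
    return base.rstrip("/") + "/" + stored.lstrip("/")


if __name__ == "__main__":
    # The first base is the one quoted in this thread; the mirror is made up.
    print(resolve_pdf_url("J98-1000", "http://www.aclweb.org/anthology/"))
    print(resolve_pdf_url("J98-1000", "https://mirror.example.org/anthology"))
```

Because the XML would store only `J98-1000`, a mirror needs to change a single base URL rather than rewrite every entry.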