Front matter for journals #181
Comments
It looks like this includes some but not all of the CL print issues. I think CL stopped printing around 2008, but these front matter scans seem only to go to 2001. Can we add an annotation in the XML that indicates when front matter exists and thus generate links for them?
I think this heavily relates to #156. If all papers reliably had
Can we maybe get a full PDF file list from the server (okay, *000 IDs would be sufficient for this case)? Then we could maybe add this information to the XML programmatically.
Sent it to you.
A summary of changes:
- Introduces a nested format (closes acl-org#317)
- URLs are stored using a relative format for internal links (closes acl-org#156), which facilitates mirroring (acl-org#295)
- URLs are only displayed if they are found in the XML. I manually crawled to validate and create entries for PDFs for all frontmatter entries (closes acl-org#181, closes acl-org#180), including journal frontmatter (acl-org#264) and volume PDFs (closes #31)
- Added missing entries and removed ones whose PDFs were missing, including LREC 2014 (closes #31)
- It punts on C69 reformatting (closes acl-org#147)

Relevant, but not completed:
- Creating PDF volumes by pasting together individual papers (acl-org#226)
- This makes it much easier to add non-paper entries such as talks (acl-org#298), to add a volume-level "publication date" (acl-org#319), and to create an RSS feed of updates (acl-org#358).
The website (both old and new) has a rule that journals (J and Q series) don't have front matter, and it therefore doesn't generate paper entries for x000 IDs (e.g., there's no J14-1000, J14-2000, etc. in http://aclweb.org/anthology/events/cl-2014/).
However, I discovered by accident that some journal issues actually do have front matter stored as PDFs on the server, e.g.:
http://www.aclweb.org/anthology/J98-1000
These are currently inaccessible from the website.
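As a rough illustration of the file-list idea raised in the comments, here is a minimal Python sketch that filters a list of PDF file names for journal front matter IDs (J or Q series, ending in `000`). The function name `journal_front_matter_ids`, the sample file names, and the assumed ID pattern (letter, two-digit year, one-digit issue, `000`) are all hypothetical and not taken from the Anthology code base; the real server listing may be laid out differently.

```python
import re

# Assumed ID shape: J or Q, two-digit year, "-", one-digit issue, then "000",
# optionally followed by ".pdf". This pattern is an assumption for illustration.
FRONT_MATTER_RE = re.compile(r"^([JQ])(\d{2})-(\d)000(?:\.pdf)?$")


def journal_front_matter_ids(file_names):
    """Return the sorted journal front matter IDs (e.g. 'J98-1000') in file_names."""
    found = set()
    for name in file_names:
        # Keep only the last path component, in case the list contains directories.
        base = name.strip().rsplit("/", 1)[-1]
        match = FRONT_MATTER_RE.match(base)
        if match:
            series, year, issue = match.groups()
            found.add(f"{series}{year}-{issue}000")
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical sample listing; only J98-1000 is mentioned in this issue.
    sample = ["J98-1000.pdf", "J98-1001.pdf", "J/J98/J98-2000.pdf", "P14-1000.pdf"]
    print(journal_front_matter_ids(sample))
```

The resulting ID list could then be compared against the XML to decide where front matter entries need to be added, though how those entries are represented is left open here.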