Paper labeling in C69.xml #147

Closed
mjpost opened this issue Feb 20, 2019 · 3 comments

@mjpost
Member

mjpost commented Feb 20, 2019

The C69 issue is a consequence of faulty XML, in my opinion. In the database import script, IDs of the form "x000" get interpreted as proceedings volumes, except for workshops, where it is IDs of the form "xx00". In C69.xml, volumes follow the workshop format, which leads to the first nine papers being ignored until a "proper" volume ID (1000) is found.
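To make the ID rule concrete, here is a minimal sketch of the check described above. The function names and regular expressions are hypothetical and only mirror the description in this comment; the actual import script may implement it differently.

```python
import re

# Hypothetical patterns mirroring the rule described above, not the real script.
VOLUME_ID = re.compile(r"^\d000$")             # "x000", e.g. "1000"
WORKSHOP_VOLUME_ID = re.compile(r"^\d\d00$")   # "xx00", e.g. "0100"


def is_volume_id(paper_id: str, is_workshop: bool = False) -> bool:
    """Return True if the ID would be read as a proceedings volume."""
    pattern = WORKSHOP_VOLUME_ID if is_workshop else VOLUME_ID
    return bool(pattern.match(paper_id))


def first_volume_index(ids, is_workshop: bool = False):
    """Index of the first ID recognised as a volume, or None if there is none."""
    for index, paper_id in enumerate(ids):
        if is_volume_id(paper_id, is_workshop):
            return index
    return None
```

Under these assumptions, an ID such as "0100" only counts as a volume when the file is treated as a workshop, so in a non-workshop file every entry before the first "x000" ID has no volume to attach to.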

However, IMHO, the actual issue is that each paper has its own proceedings entry, which doesn't seem correct or useful to me. I believe the file should have a single proceedings entry with ID 1000, and the individual papers should be renumbered 0101 -> 1001, 0201 -> 1002, 0301 -> 1003, and so on.
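The renumbering could be sketched roughly as follows. This assumes every current paper ID has the form "NN01", where NN is the paper's sequence number; that assumption is inferred from the three examples above and has not been checked against the whole file. The helper name `renumber` is hypothetical.

```python
def renumber(old_id: str) -> str:
    """Map an old ID such as "0101" to the proposed form "1001".

    Assumes the old ID is "NN01", with NN the paper's sequence number.
    """
    if len(old_id) != 4 or not old_id.isdigit():
        raise ValueError(f"unexpected ID format: {old_id!r}")
    if old_id[2:] != "01":
        raise ValueError(f"not a paper ID of the form NN01: {old_id!r}")
    sequence = int(old_id[:2])
    # New IDs live under the single proceedings entry 1000.
    return f"1{sequence:03d}"
```

IDs that do not fit the assumed pattern raise an error instead of being guessed at, so any irregular entries in the file would surface immediately.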

Originally posted by @mbollmann in #107 (comment)

@davidweichiang
Collaborator

The index page for C69 (https://www.aclweb.org/anthology/events/coling-1969/) is displayed weirdly, with many papers repeated and many omitted. I know that the numbering is wrong, but even so, should the papers be displayed like this?

@mbollmann mbollmann self-assigned this Apr 2, 2019
@mjpost
Member Author

mjpost commented Apr 12, 2019

C69 just needs to be reorganized. Each paper has two entries: one for the preprint (an abstract), and the other for the paper (e.g., https://aclweb.org/anthology/C69-0100.pdf and https://aclweb.org/anthology/C69-0101.pdf).

I suggest that we

  • renumber the papers as 1001..1071, removing the preprints from direct access
  • concatenate all papers into a volume, which will contain preprints and other items

So for example:

<volume id="C69">
 <paper id="0100">
  <title>INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS COLING 1969: Preprint No. 1</title>
 </paper>

 <paper id="0101">
  <title>TREE GRAMMARS (= Δ-GRAMMARS)</title>
  <author><last>Mel’čuk</last><first>I. A.</first></author>
  <author><last>Gladky</last><first>A. V.</first></author>
 </paper>

would become

<volume id="C69">
 <paper id="1001">
  <title>TREE GRAMMARS (= Δ-GRAMMARS)</title>
  <author><last>Mel’čuk</last><first>I. A.</first></author>
  <author><last>Gladky</last><first>A. V.</first></author>
 </paper>

Thoughts? CC: @villalbamartin @danielgildea @mbollmann @davidweichiang
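For discussion, a rough sketch of how the reorganization might be scripted with Python's standard `xml.etree.ElementTree`. It assumes that preprint entries are exactly those whose ID ends in "00" (as in the 0100/0101 pair above) and that the remaining papers can be numbered sequentially in document order. The function name `reorganize` is hypothetical, and any entries that follow a different ID pattern are not handled.

```python
import copy
import xml.etree.ElementTree as ET


def reorganize(volume: ET.Element) -> ET.Element:
    """Return a new <volume> without preprint entries, papers renumbered 1001, 1002, ...

    Assumption: an entry is a preprint if and only if its ID ends in "00".
    """
    new_volume = ET.Element("volume", dict(volume.attrib))
    next_number = 1
    for paper in volume.findall("paper"):
        if paper.get("id", "").endswith("00"):
            continue  # assumed preprint entry: dropped from direct access
        renumbered = copy.deepcopy(paper)  # leave the input tree untouched
        renumbered.set("id", f"1{next_number:03d}")
        new_volume.append(renumbered)
        next_number += 1
    return new_volume


if __name__ == "__main__":
    # Reduced version of the example above (titles shortened for illustration).
    sample = ET.fromstring(
        '<volume id="C69">'
        '<paper id="0100"><title>Preprint No. 1</title></paper>'
        '<paper id="0101"><title>TREE GRAMMARS</title></paper>'
        "</volume>"
    )
    print(ET.tostring(reorganize(sample), encoding="unicode"))
```

The function builds a new tree instead of editing in place, so the original file can be kept around for comparison until the renumbering has been checked.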

@mjpost
Member Author

mjpost commented Apr 12, 2019

Update: there are also a number of "post-prints", which appear to be commentary after the conference, a fascinating idea that should be revived. There is also this document ("Die Mälarinseln und ihre Sehenswürdigkeiten Allgemeines über die Gegend") for @mbollmann. I would retain entries for them and also fold them into the full proceedings.

This was referenced Jun 13, 2019
@mjpost mjpost closed this as completed in 0b4ea37 Jun 21, 2019
najtin pushed a commit to ir-anthology/ir-anthology that referenced this issue Jun 9, 2021
A summary of changes:

- Introduces a nested format (closes acl-org#317)
- URLs are stored using a relative format for internal links (closes acl-org#156), which facilitates mirroring (acl-org#295) 
- URLs are only displayed if they are found in the XML. I manually crawled to validate and create entries for PDFs for all frontmatter entries (closes acl-org#181 closes acl-org#180), including journal frontmatter (acl-org#264) and volume PDFs (closes #31) 
- Added missing entries and removed ones whose PDFs were missing, including LREC 2014 (closes #31)
- It punts on C69 reformatting (closes acl-org#147)

Relevant, but not completed:
- Creating PDF volumes by pasting together individual papers (acl-org#226)
- This makes it much easier to add non-paper entries such as talks (acl-org#298), to add a volume-level "publication date" (acl-org#319), and to create an RSS feed of updates (acl-org#358)