Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested volumes and explicit <url> tags #324

Merged
merged 50 commits into from
Jun 21, 2019
Merged

Nested volumes and explicit <url> tags #324

merged 50 commits into from
Jun 21, 2019

Conversation

mjpost
Copy link
Member

@mjpost mjpost commented May 9, 2019

Started consolidating URLs per #156. The goal is to:

  • Convert all href attributes to <url> tags
  • Convert all <href> tags to <url> tags
  • Use relative IDs for Anthology identifies (e.g., P18-1014 instead of https://www.aclweb.org/anthology/P18-1014)

Adding explicit <url> tags is easier with nested volumes (#317), so I am doing that, too:

  • Convert schemas to support nesting
  • Update all the XML
  • Add explicit full-volume <url> tags for volumes that are present on the server
  • Update the site generation code to work with the nested volumes

Additionally I think we should:

  • remove <bibtype> and <bibkey> tags, which are just clutter
  • Reformat all the XML to be properly indented (2 spaces? 2 spaces)

This will help with mirroring (#295 #28 #22).

Finally, I'd like to update the code not to generate URLs for papers without a URL tag, so by cross-checking against #264, we can solve a lot of problems (#226 #181 #180).

@mjpost
Copy link
Member Author

mjpost commented May 10, 2019

Another issue has come up. We now need to generate explicit volume <url> tags. This will be easier once #317 is done, so I will move to that next.

@mjpost mjpost changed the title Consolidate urls Nested volumes and explicit <url> tags May 14, 2019
@akoehn
Copy link
Member

akoehn commented Jun 17, 2019

Can you do a squash-merge to keep the changes more local? I am a big fan of rewriting history (in git at least ...) so that it is easier to see where a feature was introduced.

That is just my personal opinion of course.

@mjpost mjpost marked this pull request as ready for review June 17, 2019 21:38
@mjpost
Copy link
Member Author

mjpost commented Jun 17, 2019

Yes, will do a squash merge. This is a good idea.

@acl-org/anthology, this is ready for review. Here's a summary of changes:

Enabled, but yet to do:

@mjpost mjpost requested a review from a team June 17, 2019 23:21
@mbollmann
Copy link
Member

I want to build this locally and have a closer look at it later today.

@akoehn
Copy link
Member

akoehn commented Jun 18, 2019

@mbollmann: To have a look at the results (rather than the source), build master, mv build build_master, build this branch and compare with meld (or another diff tool). Tried this for other PRs and that workflow works quite well.

@akoehn
Copy link
Member

akoehn commented Jun 18, 2019

There is now a benoit_sagot and benoit_sagot1:

title: "Beno\u0131\u0302t Sagot"
title: "Beno\xEEt Sagot"

Maybe it was there before and now fixed in master but not yet in this PR?

@mjpost
Copy link
Member Author

mjpost commented Jun 18, 2019

@akoehn this kind of diffing is what I did. I did bulk diffs on .md and .bib files, and found only differences of this sort (which I think will be resolved when we do the final merge).

I have merged in master a few times but I think there may be one that's missing.

@davidweichiang
Copy link
Collaborator

Good catch. I think this was fixed in master, though. The issue is that this uses a dotless i and Unicode normalization doesn't know how to deal with that. I don't think there's anywhere in the code that fixes this automatically, so I'll create a new issue for this.

Copy link
Member

@mbollmann mbollmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty great overall, but also quite hard to diff :)

I focused on the code & templates now rather than the XML, and have some (minor?) comments.

hugo/layouts/events/single.html Outdated Show resolved Hide resolved
hugo/layouts/papers/list-entry.html Outdated Show resolved Hide resolved
hugo/layouts/papers/single.html Outdated Show resolved Hide resolved
hugo/layouts/volumes/single.html Outdated Show resolved Hide resolved
bin/anthology/anthology.py Outdated Show resolved Hide resolved
bin/anthology/volumes.py Show resolved Hide resolved
@mjpost mjpost requested a review from mbollmann June 20, 2019 00:12
hugo/layouts/papers/single.html Outdated Show resolved Hide resolved
@mjpost mjpost mentioned this pull request Jun 20, 2019
@mjpost mjpost merged commit 0b4ea37 into master Jun 21, 2019
@akoehn
Copy link
Member

akoehn commented Jun 21, 2019

Congratulations on this huge merge!

@mjpost mjpost deleted the consolidate_urls branch June 26, 2019 15:10
najtin pushed a commit to ir-anthology/ir-anthology that referenced this pull request Jun 9, 2021
A summary of changes:

- Introduces a nested format (closes acl-org#317)
- URLs are stored using a relative format for internal links (closes acl-org#156), which facilitates mirroring (acl-org#295) 
- URLs are only displayed if they are found in the XML. I manually crawled to validate and create entries for PDFs for all frontmatter entries (closes acl-org#181 closes acl-org#180), including journal frontmatter (acl-org#264) and volume PDFs (closes #31) 
- Added missing entries and removed ones whose PDFs were missing, including LREC 2014 (closes #31 )
- It punts on C69 reformatting (closes acl-org#147)

Relevant, but not completed:
- Creating PDF volumes by pasting together individual papers (acl-org#226)
- This makes it much easier to add non-paper entries such as talks (acl-org#298), to add a volume-level "publication date (acl-org#319), and to create an RSS feed of updates (acl-org#358),
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants