Author/editor id attribute #297

Merged
merged 14 commits into from
May 8, 2019


davidweichiang
Collaborator

@davidweichiang davidweichiang commented May 3, 2019

The initial commit here converts the complete attribute to an id attribute.

  • The intent is that the author of a paper is identified as proposed in Name collisions #208: If the author tag has an id attribute, the id is looked up in name_variants.yaml; else, the first and last name are looked up in name_variants.yaml.
  • So far, ids are used only if the previous version used the complete attribute. There's no resolving of name collisions (e.g., Yang Liu) yet.
  • I went with the slug of the canonical name as the default id. There are currently no slug collisions, so no need for number suffixes (yet). The canonical name is usually the most frequently used name, but in some cases, in order to disambiguate, or sometimes for no reason, I've changed it so the canonical name is the most specific name. If another convention for choosing ids is desired, now would be a good time to decide that.
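
The lookup rule in the bullets above can be sketched as follows. This is only an illustration, not the code in this PR: `slugify`, `build_indexes`, and `resolve_author` are hypothetical names, the entry layout (`canonical`, `variants`, `id`) is an assumed shape for `name_variants.yaml`, and colliding names are not handled.

```python
import re
import unicodedata


def slugify(name):
    """Hypothetical slug function: ASCII-fold, lowercase, hyphenate."""
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", folded.lower()).strip("-")


def build_indexes(entries):
    """Index people by id and by every (first, last) spelling.

    `entries` is assumed to be the parsed content of name_variants.yaml.
    """
    by_id, by_name = {}, {}
    for entry in entries:
        canonical = entry["canonical"]
        full_name = f'{canonical["first"]} {canonical["last"]}'
        # Default id: the slug of the canonical name.
        person_id = entry.get("id") or slugify(full_name)
        by_id[person_id] = entry
        for name in [canonical] + entry.get("variants", []):
            by_name[(name["first"], name["last"])] = person_id
    return by_id, by_name


def resolve_author(first, last, by_id, by_name, author_id=None):
    """Return the person id for an author tag, preferring an explicit id."""
    if author_id is not None:
        if author_id not in by_id:
            raise KeyError(f"unknown author id: {author_id}")
        return author_id
    # No id attribute: fall back to the first+last name lookup.
    # Assumption: names without an entry get the slug of their own name.
    return by_name.get((first, last), slugify(f"{first} {last}"))
```

Keeping two separate indexes means an explicit id never depends on how the name is spelled on a given paper, which is the property the id attribute is meant to provide.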

To do:

  • check all these initially-generated ids to see if they are reasonable
  • check and enforce the constraint that if a first+last name is ever used with an id, it should always be used with an id
  • allow entries in name_variants.yaml to have colliding canonical/variant names if they both have ids
  • author's canonical slug (i.e., the one used in the URL for their page) should be the same as the id if one exists (?)
  • generation of author links should use ids
  • add ids to Yang Liu's paper as (the principal) test case
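
The second to-do item (a name that is ever used with an id must always be used with one) could be checked roughly like this. It is a minimal sketch: `check_id_consistency` and the tuple-based input are hypothetical, not the data structures used in this PR.

```python
from collections import defaultdict


def check_id_consistency(authors):
    """Return the names that appear both with and without an id.

    `authors` is an iterable of (first, last, author_id) tuples, where
    author_id is None when the author tag has no id attribute.
    """
    seen = defaultdict(lambda: {"with_id": 0, "without_id": 0})
    for first, last, author_id in authors:
        key = "with_id" if author_id is not None else "without_id"
        seen[(first, last)][key] += 1
    # A name violates the constraint if it was seen in both forms.
    return sorted(
        name
        for name, counts in seen.items()
        if counts["with_id"] and counts["without_id"]
    )
```

An empty result means every name is used consistently; any returned name would be reported as an error at build time.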

@davidweichiang davidweichiang mentioned this pull request May 6, 2019
…a slug and people entry for each variant spelling, represent variant spellings on papers as strings. This will (hopefully) simplify modifying logic to use explicit person ids in XML.
…w warns of unused variants; remove unused variants
…lved some reorganization of the data structures in AnthologyIndex.) #208, also fixes #305.
Enforce constraint that unambiguous names can't have ids.
@davidweichiang davidweichiang marked this pull request as ready for review May 7, 2019 03:12
@davidweichiang
Collaborator Author

Assuming this passes, I think this is ready. But it's a large change and should be reviewed carefully.

@davidweichiang
Collaborator Author

Hm, I didn’t get those errors locally and will take a closer look. But the above build is a good example of the “always or never use id” constraint in action. It’s picky but I think it helps to catch errors.

@mjpost
Member

mjpost commented May 7, 2019

Those are some awesomely detailed error messages. I’ll take a careful look soon.

@mjpost mjpost mentioned this pull request May 8, 2019
@davidweichiang
Collaborator Author

OK, this passes and I think I'm done.

@mjpost
Member

mjpost commented May 8, 2019

Great—will review tomorrow.

@mjpost
Member

mjpost commented May 8, 2019

I looked over the code a bit and it looks fine to me, though it would benefit from a glance by someone more familiar with its intricacies.

I did pull it down and try to build it. The Python preprocessing all goes well, but I get this error when running hugo server (in lieu of hugo --cleanDestinationDir --minify). Is this enough information to make sense of?

$ hugo server
Building sites … ERROR 2019/05/08 14:01:22 [en] REF_NOT_FOUND: Ref "/people/i/i-aldezabal.md" from page "/Users/post/code/acl-anthology/hugo/content/people/0/0-ansa.md": page not found
ERROR 2019/05/08 14:01:22 [en] REF_NOT_FOUND: Ref "/people/i/i-alegria.md" from page "/Users/post/code/acl-anthology/hugo/content/people/0/0-ansa.md": page not found
ERROR 2019/05/08 14:01:22 [en] REF_NOT_FOUND: Ref "/people/j/j-m-arriola.md" from page "/Users/post/code/acl-anthology/hugo/content/people/0/0-ansa.md": page not found
ERROR 2019/05/08 14:01:22 [en] REF_NOT_FOUND: Ref "/people/n/n-ezeiza.md" from page "/Users/post/code/acl-anthology/hugo/content/people/0/0-ansa.md": page not found

@davidweichiang
Collaborator Author

I think this might be related to #316, but what's 0-ansa.md?

@mjpost
Member

mjpost commented May 8, 2019

Had the same thought, am trying to clean up and rebuild...

@mjpost
Member

mjpost commented May 8, 2019

(It would be nice if all the generated files were written to a build directory so that cleanup could be done by just rm -rfing a single directory.)

@mjpost
Member

mjpost commented May 8, 2019

Okay, that fixed things for me. I'm going to merge.

@mjpost mjpost merged commit c14a9ca into master May 8, 2019
@mjpost mjpost deleted the author-ids branch May 8, 2019 18:55
najtin pushed a commit to ir-anthology/ir-anthology that referenced this pull request Jun 9, 2021