author urls in style name/id #1179

Draft: wants to merge 7 commits into master

danielgildea
Collaborator

Issue #623

Now generates author pages with URLs in the form name/id

for most people this looks like:
people/d/david-chiang/david-chiang/

Matt Post has an ORCID in name_variants.yaml, so his page is:
people/m/matt-post/0000-0002-1297-6794/

and then there is:
people/y/yang-liu/yang-liu-edinburgh/
people/y/yang-liu/yang-liu-ict/
people/y/yang-liu/yang-liu-icsi/
people/y/yang-liu/yang-liu-umich/

I don't know how to make the old URLs people/m/matt-post/ resolve.
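A minimal sketch of the path scheme described above, assuming each person record provides a name slug, an Anthology person ID, and an optional ORCID. The helper name `author_page_path` is hypothetical, not the PR's code: the ORCID becomes the identifier segment when present, otherwise the Anthology ID does.

```python
from typing import Optional


def author_page_path(name_slug: str, anthology_id: str, orcid: Optional[str] = None) -> str:
    """Build people/<first letter>/<name slug>/<identifier>/ (hypothetical helper).

    Prefers the ORCID as the identifier and falls back to the Anthology
    person ID (e.g. "yang-liu-ict") when no ORCID is recorded.
    """
    identifier = orcid if orcid else anthology_id
    return f"people/{name_slug[0]}/{name_slug}/{identifier}/"


print(author_page_path("david-chiang", "david-chiang"))
print(author_page_path("matt-post", "matt-post", orcid="0000-0002-1297-6794"))
print(author_page_path("yang-liu", "yang-liu-ict"))
```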

Daniel Gildea added 2 commits January 3, 2021 11:23
@mjpost
Member

mjpost commented Jan 11, 2021

Thanks, I'll take a look soon!

@mjpost mjpost requested a review from a team January 12, 2021 02:55
@mjpost
Member

mjpost commented Jan 12, 2021

Some thoughts:

  • I think we should move to the base author page being at /people/matt-post instead of /people/m/matt-post. There's no reason for the intervening letter anymore (echoing an earlier conversation). We can have Hugo dump all files in one directory and maintain the old longer form with 301 redirects in the .htaccess file.
  • I don't love the look of /people/david-chiang/david-chiang. I think the top-level page should be the one for (a) pointing to all the disambiguated names and (b) unresolved names. I see your comment about not being sure how to do this; one of us will have to figure it out.
  • We should come up with a prioritization scheme for IDs. For example, my ORCID is entered, but what if before that someone had created matt-post-rochester? It would be nice for that to redirect (as a 301) to /people/matt-post/{ORCID}, for backwards compatibility.
  • Separately, the name_variants.yaml file is getting a bit unwieldy IMO (for example, I dislike editing it, creating new entries, and putting them in the correct sorted place manually). I wonder if we should move to a directory data/yaml/people/ and then have a separate file for every canonical name.
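A sketch of how the split into data/yaml/people/ might be planned, assuming each parsed name_variants.yaml entry is a dict with a `canonical` name, an optional `id`, and other fields (this entry shape and the helper names are assumptions). It only computes one output path per entry and refuses collisions; the actual writing is left out.

```python
import re
import unicodedata
from pathlib import PurePosixPath


def person_file_key(entry: dict) -> str:
    """File name (without extension) for one name_variants.yaml entry.

    Assumed entry shape: {"canonical": {"first": ..., "last": ...},
    "id": ..., "variants": [...]}, where "id" is optional.
    """
    if entry.get("id"):
        return entry["id"]
    canonical = entry["canonical"]
    full = f"{canonical.get('first', '')} {canonical['last']}".strip()
    # Strip accents so "Köhn" becomes "Kohn", then lowercase and hyphenate.
    ascii_name = unicodedata.normalize("NFKD", full).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def split_entries(entries: list, outdir: str = "data/yaml/people") -> dict:
    """Map each proposed output path to the single entry it would hold."""
    files = {}
    for entry in entries:
        path = str(PurePosixPath(outdir) / f"{person_file_key(entry)}.yaml")
        if path in files:
            raise ValueError(f"two entries map to {path}")
        files[path] = entry
    return files


entries = [
    {"canonical": {"first": "Matt", "last": "Post"}, "id": "matt-post"},
    {"canonical": {"first": "David", "last": "Chiang"}},
]
for path in split_entries(entries):
    print(path)
```

Each entry could then be written to its own file with PyYAML's `yaml.safe_dump`, which also removes the need to keep one large file manually sorted.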

@davidweichiang
Collaborator

I definitely agree about dropping the first letter.

I agree that name_variants.yaml should be split up into lots of files; it's really an author database now and not just name variants.

@mjpost
Member

mjpost commented Jan 27, 2021

Sorry that I'm behind on this—I will catch up next week!

@akoehn
Member

akoehn commented Apr 6, 2021

I thought this would be a good testbed for the previews.

@mjpost
Member

mjpost commented Apr 6, 2021

Oh, yes, good call!

@github-actions

github-actions bot commented Apr 6, 2021

Build successful. You can preview it here: https://aclanthology.org/previews/author-url

@mjpost
Member

mjpost commented Apr 6, 2021

Some TODOs:

  • Split up (deconstitute) the name_variants.yaml file
  • Build author pages directly under people/
  • Add 301 redirects to the .htaccess file, redirecting /people/m/matt-post/ to /people/matt-post/
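For the third TODO, the redirect rules could be generated from the list of name slugs rather than maintained by hand. A minimal sketch, assuming the old and new layouts use the same slug and that Apache's mod_alias `Redirect` directive is usable in the site's .htaccess (the function name is hypothetical):

```python
def htaccess_redirects(name_slugs) -> str:
    """Emit one permanent (301) mod_alias rule per person, mapping the old
    letter-prefixed directory to the flattened one (hypothetical helper)."""
    lines = []
    for slug in sorted(set(name_slugs)):  # dedupe and keep output stable
        old = f"/people/{slug[0]}/{slug}/"
        new = f"/people/{slug}/"
        lines.append(f"Redirect 301 {old} {new}")
    return "\n".join(lines)


print(htaccess_redirects(["matt-post", "david-chiang"]))
```

`Redirect` matches by URL-path prefix and appends the remainder of the request path to the target, so anything below an old directory would be redirected along with it.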

@mjpost
Member

mjpost commented Apr 7, 2021

Some thoughts after perusing the build preview:

  • I think the base name page should always be for disambiguation (pointing to the people that share that surface form) and for unclaimed / uncategorized names
  • We will have an "identification" process, whereby people can identify themselves. This is the same thing as disambiguating themselves, except that we will try to do identification for all names, not just ambiguous ones
  • A person's real page will therefore be under /people/matt-post/{IDENTIFIER}
  • We should support multiple IDs for people: ORCID, a custom Anthology ID for backward-compat, maybe a start ID.
  • We'll have a canonical identifier that the other identifiers will redirect to. I suggest this be the ORCID, and that we coordinate with upstream conference management systems to have this added to the ingestion data.
  • For example, /people/matt-post/{ORCID}, /people/matt-post/startid:post could all point to the same place
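One possible shape for the multi-ID scheme, assuming a fixed priority order with ORCID first (as proposed above), an Anthology ID second, and a START ID last, rendered as `startid:<value>` as in the example URL. The scheme names and helpers are hypothetical: the highest-priority identifier yields the canonical URL, and every other identifier becomes a 301 source.

```python
# Hypothetical priority order; "anthology" and "startid" are illustrative names.
PRIORITY = ("orcid", "anthology", "startid")


def format_id(scheme: str, value: str) -> str:
    """Render an identifier as a URL segment; only START IDs get a prefix,
    following the /people/matt-post/startid:post example."""
    return f"startid:{value}" if scheme == "startid" else value


def canonical_and_redirects(name_slug: str, ids: dict):
    """Return the canonical URL and a {old_url: canonical_url} redirect map."""
    present = [s for s in PRIORITY if ids.get(s)]
    if not present:
        raise ValueError(f"no identifiers for {name_slug}")
    best = present[0]
    canonical = f"/people/{name_slug}/{format_id(best, ids[best])}/"
    redirects = {
        f"/people/{name_slug}/{format_id(s, ids[s])}/": canonical
        for s in present[1:]
    }
    return canonical, redirects


canonical, redirects = canonical_and_redirects(
    "matt-post",
    {"orcid": "0000-0002-1297-6794", "anthology": "matt-post-rochester", "startid": "post"},
)
print(canonical)
for old, new in redirects.items():
    print(old, "->", new)
```

If a person has no ORCID, the same code falls back to the next identifier in the list, and adding an ORCID later simply turns the previous canonical URL into a redirect source.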

@akoehn
Member

akoehn commented Apr 7, 2021

"I suggest this be the ORCID, and that we coordinate with upstream conference management systems to have this added to the ingestion data."

Strongly agree. Since we already know that we want to implement it, we (that is, probably you) should start requesting now that ORCIDs be included in the datasets.

@mjpost
Member

mjpost commented Apr 7, 2021

The simplest thing technically would be to add an ORCID field to their Softconf profiles and to force authors to fill it in prior to submission (and possibly for final copies, which would let us get data from NAACL and ACL). Probably we can't force all authors to do this, but we could force the submitting author to do it.

Do you know if there are any downsides to ORCID? For example, maybe it's not available in China?

This will likely require coordination between us, Softconf, and the ACL Exec. I'm not sure whether everyone can move fast enough, but I'll get on it.

@akoehn
Member

akoehn commented Apr 7, 2021 via email

@knmnyn
Collaborator

knmnyn commented Apr 7, 2021 via email

@bastings
Contributor

+1000 to using ORCiD :)

And just in case: names can change, including (especially) on ORCiD, and it's probably worth thinking about how to handle that. I'll start a separate discussion with the ACL Exec about this in general, in which you're very welcome to participate.

@mjpost
Member

mjpost commented May 10, 2021

Okay, Softconf has added an ORCID field to the Global profile. You can set yours by visiting the Global profile page. Maybe a few of you can test this, as I did?

Softconf is going to have this dumped in the DB file distributed with proceedings tarballs, so we will have it available for disambiguation purposes.
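Before ORCIDs from the Softconf dump are used for disambiguation, it may be worth rejecting typos. An ORCID iD ends in an ISO 7064 MOD 11-2 check character (a digit or "X"), so a minimal validation sketch could look like this. The function names are hypothetical, and this is not existing Anthology code; it also assumes the bare 16-character form rather than an https://orcid.org/ URL.

```python
import re

# Four hyphen-separated blocks; the final character may be "X" (value 10).
ORCID_RE = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and checksum of an ORCID iD like 0000-0002-1825-0097."""
    orcid = orcid.strip().upper()
    if not ORCID_RE.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1297-6794"))  # the ORCID from the PR description
```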

Next steps:

  1. Advertise this more widely to get people to voluntarily add it
  2. Work with conference organizers (probably for 2022+) to make this mandatory

@github-actions

Build successful. You can preview it here: https://preview.aclanthology.org/author-url
This preview will be removed when the branch is merged.

@mbollmann
Member

mbollmann left a comment

Any updates on this?

If I understand this correctly, the latest proposal was to:

  1. Have /people/{NAME-SLUG}/{IDENTIFIER} as the canonical URL for an author.
  2. For ambiguous names, have /people/{NAME-SLUG} be a disambiguation page linking to the different authors.
  3. For unambiguous names, have /people/{NAME-SLUG} be the canonical page. (I'm not sure about this one: wouldn't this cause problems if we later introduce another author with the same name?)

What about names that we have disambiguated using our current system? A slug like /people/huy-nguyen/huy-nguyen-stanford seems quite verbose.
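To make point 3 concrete, here is a sketch (with a hypothetical input format and function name) that groups people by name slug and decides whether /people/{NAME-SLUG}/ is a disambiguation page or the author's own page. It also shows the concern raised above: a second person with the same slug changes the page's role.

```python
from collections import defaultdict


def plan_name_pages(people):
    """people: iterable of (name_slug, identifier) pairs (hypothetical input).

    Returns {"/people/<slug>/": (kind, [person URLs])}, where kind is
    "disambiguation" for shared slugs and "canonical" for unique ones.
    """
    by_slug = defaultdict(list)
    for slug, ident in people:
        by_slug[slug].append(f"/people/{slug}/{ident}/")
    plan = {}
    for slug, urls in sorted(by_slug.items()):
        kind = "disambiguation" if len(urls) > 1 else "canonical"
        plan[f"/people/{slug}/"] = (kind, sorted(urls))
    return plan


plan = plan_name_pages([
    ("yang-liu", "yang-liu-ict"),
    ("yang-liu", "yang-liu-edinburgh"),
    ("david-chiang", "david-chiang"),
])
for page, (kind, urls) in plan.items():
    print(page, kind, urls)
```

Adding a second `david-chiang` entry with any other identifier would turn `/people/david-chiang/` into a disambiguation page, which is exactly why a later namesake would move the first author's canonical page.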

@@ -90,7 +91,7 @@ def export_anthology(anthology, outdir, clean=False, dryrun=False):
name = anthology.people.get_canonical_name(id_)
log.debug("export_anthology: processing person '{}'".format(repr(name)))
data = name.as_dict()
-data["slug"] = id_
+data["slug"] = slugify(repr(name)) or "NONE"
Member

There is a PersonName.slug() function in bin/anthology/people.py; I think all slug generation should be done in that class rather than here in the build script.
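A sketch of keeping the slug logic in the name class, using a hypothetical stand-in for PersonName (the real class's fields and the real slug() behavior in bin/anthology/people.py may differ). It transliterates accented letters by Unicode decomposition and keeps the `"NONE"` fallback from the diff; names without any ASCII-decomposable characters, such as names written only in CJK script, end up at that fallback.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonNameSketch:
    """Hypothetical stand-in for PersonName; only first/last are modeled."""
    first: str
    last: str

    def slug(self) -> str:
        """ASCII, lowercase, hyphen-separated slug of "first last"."""
        full = f"{self.first} {self.last}".strip()
        # Decompose accented letters (ö -> o + combining mark), drop non-ASCII.
        ascii_name = unicodedata.normalize("NFKD", full).encode("ascii", "ignore").decode("ascii")
        slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
        return slug or "NONE"  # same fallback as the build-script diff


print(PersonNameSketch("Arne", "Köhn").slug())
```

With such a method, the build-script line would reduce to something like `data["slug"] = name.slug()`, so every caller gets the same slug for the same name.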

@akoehn
Copy link
Member

akoehn commented May 2, 2023

Whoops, shift-enter sends a comment ...

Ideally, the location of a profile would not:

  • be too verbose
  • change when another author is added
  • depend on whether other authors are present
  • depend too much on the name of the author

The ideal for me, therefore, would be /ORCID/author-name, where author-name is optional (and /ORCID/ can forward to the page with the author name). That way we would also handle name changes more gracefully; these currently affect some parts of our user base much more than others. The "only" problem is that we do not have ORCIDs for everyone in our dataset.

Once we have the identifier first, this would not be as bad anymore:

"A slug like /people/huy-nguyen/huy-nguyen-stanford seems quite verbose."

because it would be /people/huy-nguyen-stanford/huy-nguyen/, and the short version (/people/huy-nguyen-stanford/) would also work. It would also mean that the first person with a given name could keep it (though I, for example, would have /arne-kohn/arne-kohn/ as my canonical URL...).
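A sketch of resolving the identifier-first layout suggested here, assuming the /people/<id>/<name-slug>/ form from the example above and a lookup table from each identifier to the slug of the person's current name (both the table and the `resolve` function are hypothetical). The name part is optional and may be stale; in either case the request is redirected to the current canonical URL, which is what would make name changes cheap.

```python
from typing import Dict, Optional, Tuple


def resolve(path: str, current_slug: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Resolve /people/<id>/[<name-slug>/] under an identifier-first scheme.

    current_slug maps each identifier (ORCID or Anthology ID) to the slug of
    the person's current name. Returns (200, url) for the canonical URL,
    (301, target) for short or outdated forms, and (404, None) otherwise.
    """
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) not in (2, 3) or parts[0] != "people":
        return 404, None
    ident = parts[1]
    if ident not in current_slug:
        return 404, None
    canonical = f"/people/{ident}/{current_slug[ident]}/"
    if path == canonical:
        return 200, canonical
    # Missing or outdated name part (e.g. after a name change): redirect.
    return 301, canonical


table = {"huy-nguyen-stanford": "huy-nguyen", "0000-0002-1297-6794": "matt-post"}
print(resolve("/people/huy-nguyen-stanford/", table))
print(resolve("/people/0000-0002-1297-6794/matt-post/", table))
```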

Ideally, we would have an ORCID for every author. This will not happen (as we cannot get them for all old entries) but it would be good to push for it going forward.

@mbollmann mbollmann self-assigned this May 2, 2023
7 participants