Change URLs to DOIs in <doi> field #1621

Open: wants to merge 2 commits into base: master

mbollmann
Member

@mbollmann mbollmann commented Oct 22, 2021

Many PACLIC proceedings have URLs in their <doi> entry in the XML, not DOIs. This fixes that.

Technically, the current entries are Handle URLs, not DOI URLs, but from spot-checking it seems that they are actually valid DOIs (DOI uses Handle internally).

Compare, for example:

(h/t https://twitter.com/gchrupala/status/1451552455519506448)
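The change described above amounts to dropping the resolver prefix from each `<doi>` value so that only the bare identifier remains. A minimal sketch in Python, assuming the affected entries use one of a few well-known resolver prefixes (the prefix list and the function name `strip_resolver` are illustrative and not taken from this PR):

```python
# Hypothetical list of resolver prefixes; the actual data may contain others.
RESOLVER_PREFIXES = (
    "http://hdl.handle.net/",
    "https://hdl.handle.net/",
    "http://doi.org/",
    "https://doi.org/",
    "http://dx.doi.org/",
    "https://dx.doi.org/",
)


def strip_resolver(value: str) -> str:
    """Return the bare identifier if `value` is a resolver URL, else `value` unchanged."""
    value = value.strip()
    lowered = value.lower()
    for prefix in RESOLVER_PREFIXES:
        if lowered.startswith(prefix):
            # Slice the original string so the identifier keeps its own casing.
            return value[len(prefix):]
    return value
```

For example, `strip_resolver("http://hdl.handle.net/2065/12156")` yields `2065/12156`, while a value that is already a bare identifier passes through untouched.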

@mbollmann mbollmann requested a review from a team October 22, 2021 17:25
@github-actions

Build successful. You can preview it here: https://preview.aclanthology.org/fix-paclic-dois
This preview will be removed when the branch is merged.

@akoehn
Member

akoehn commented Oct 22, 2021

Let's also adjust the schema to catch this kind of mistake in the future.

Member

@akoehn akoehn left a comment

First, I made this "request changes" because I think some discussion is needed.

Yes, the DOI service resolves IDs such as 2065/12156. No, this is not a valid DOI. The DOI handbook states:

The DOI prefix shall be composed of a directory indicator followed by a registrant code. These two components shall be separated by a full stop (period).
The directory indicator shall be "10"

In other words: every DOI needs to start with the character sequence "10.".
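A check along these lines could back the schema adjustment suggested earlier. This is only a sketch: the quoted handbook text fixes the directory indicator and the full stop, while the numeric registrant code and the non-empty, whitespace-free suffix are assumptions made here for illustration, as is the name `looks_like_doi`.

```python
import re

# "10" (directory indicator), a full stop, a registrant code, "/", and a suffix.
# Assumption: the registrant code is numeric, optionally with dot-separated
# subdivisions; the suffix is any non-empty run of non-whitespace characters.
DOI_PATTERN = re.compile(r"10\.\d+(\.\d+)*/\S+")


def looks_like_doi(value: str) -> bool:
    """Return True if `value` has the overall shape of a DOI name."""
    return DOI_PATTERN.fullmatch(value) is not None
```

Under this check, `2065/12156` is rejected because it lacks the `10.` prefix, and so is any full resolver URL.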

Someone obviously put in the handle URI for these papers. We could 1) keep them in the doi field because we know they (currently!) resolve, even though they are not DOIs; 2) find out whether they have proper DOIs and insert them; or 3) remove the DOI fields and maybe add the handle URI as some other field.

It is too late for me to form a definitive opinion, but I would be hesitant to put a non-DOI identifier into a DOI field. DOIs are specifically made to be exact and we would water that down.

@mbollmann
Member Author

Yes, I was about to write the same thing while you were posting this, @akoehn. :)

Funnily enough, even the currently generated nonsense link on the website resolves:

In that case, I'm not sure we currently have a mechanism to handle these cases. I think the DOI field is currently the only way to link to an external website like that.

@mjpost
Member

mjpost commented Nov 21, 2021

So are there valid DOIs for these, then?

@akoehn
Member

akoehn commented Nov 21, 2021 via email

@mjpost
Member

mjpost commented Nov 22, 2021

I think we should

  1. Move the invalid DOIs to a new field, say <handle>
  2. Display them separately

We can split this into two steps, for example doing (1) in this PR and then adding (2) later when someone has time.
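Step (1) could be sketched roughly as follows, using Python's standard `xml.etree.ElementTree`. The `<handle>` element name comes from the proposal above; the function name `move_handles` and the simple `10.` prefix test are assumptions for illustration, not the project's actual tooling.

```python
import xml.etree.ElementTree as ET


def move_handles(root: ET.Element) -> int:
    """Rename every <doi> whose text does not start with "10." to <handle>.

    Returns the number of elements that were renamed.
    """
    moved = 0
    for elem in root.iter("doi"):
        text = (elem.text or "").strip()
        if not text.startswith("10."):
            # Keep the identifier text; only the element name changes.
            elem.tag = "handle"
            moved += 1
    return moved
```

Applied to a parsed `<paper>` element containing `<doi>2065/12156</doi>`, this leaves the identifier in place but under a `<handle>` tag, so the display change in step (2) can be handled separately.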

Reading the doc raises a separate question: we generate our DOI suffixes as v1/{anth_id}. Why the v1/? I'd suggest we get rid of it, but regenerating the old ones would cost $18,676, and there doesn't seem to be a compelling reason to change it moving forward, apart from aesthetics, which has to be balanced against consistency.

@mjpost
Member

mjpost commented Nov 22, 2021

We could also do something like <doi type="hdl"> for the handle.net instances, using doi here as a generic term, defaulting to the DOI brand.
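On the consuming side, the `type` attribute could select the resolver when building the link. A rough sketch, assuming a missing attribute means a regular DOI (the `RESOLVERS` mapping and the function name `identifier_url` are hypothetical):

```python
import xml.etree.ElementTree as ET

# Resolver base URLs keyed by the value of the proposed `type` attribute.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def identifier_url(elem: ET.Element) -> str:
    """Build a resolver URL for a <doi> element, defaulting to the DOI resolver."""
    kind = elem.get("type", "doi")
    return RESOLVERS[kind] + (elem.text or "").strip()
```

With this, `<doi type="hdl">2065/12156</doi>` maps to a handle.net link, and an element without the attribute maps to a doi.org link.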

@mbollmann
Member Author

Making a new field would be very little work; it just produces extra code for what currently is a rare exception.

The <doi type="hdl"> way would also work, but I find it quite ironic given that in reality, "DOI" is a subtype of "Handle". :)
