Support for ISO-639-3 language codes #7690

landreev · 2021-03-16T14:25:49Z

What this PR does / why we need it:

This will allow us to import metadata where the "language" field is populated not by a literal value ("French", "English") but by three-letter ISO-639-3 codes ("fra", "eng").
This problem was encountered by a remote installation when harvesting Dublin Core records from Zenodo. DC documentation suggests that using these codes to specify the language is an acceptable practice.

In this PR these codes are added as "alternate values" for the corresponding controlled vocabulary entries (more info in the issue). Once a record with a field like this (for example, <dc:language>fra</dc:language>) is imported, the value is stored as the controlled vocabulary entry "French" in our metadata.
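To illustrate the idea of alternate values, here is a minimal sketch in Python. It is not the Dataverse implementation: the dictionary layout and the names `LANGUAGE_VOCABULARY`, `build_lookup` and `resolve_language` are hypothetical, and exact string matching is an assumption.

```python
# Hypothetical, heavily reduced model of a controlled vocabulary with
# alternate values. Names and structure are illustrative assumptions,
# not taken from the Dataverse code base.
LANGUAGE_VOCABULARY = {
    "French": ["fra"],
    "English": ["eng"],
}


def build_lookup(vocabulary):
    """Map every accepted spelling (canonical or alternate) to its canonical value."""
    lookup = {}
    for canonical, alternates in vocabulary.items():
        lookup[canonical] = canonical
        for alternate in alternates:
            lookup[alternate] = canonical
    return lookup


def resolve_language(raw_value, lookup):
    """Return the canonical vocabulary entry for an imported string, or None if unknown."""
    # Assumption: exact match after trimming whitespace.
    return lookup.get(raw_value.strip())


lookup = build_lookup(LANGUAGE_VOCABULARY)
print(resolve_language("fra", lookup))      # French
print(resolve_language("English", lookup))  # English
print(resolve_language("xx", lookup))       # None
```

In this model a value that matches neither a canonical entry nor an alternate resolves to `None`; how the real importer treats such values is outside the scope of the sketch.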

Which issue(s) this PR closes:

Closes #7638

Special notes for your reviewer:

Note that our existing metadata block update API was not updating these "alternate values" found in the TSV; they were only populated on the initial import, so I had to change that.
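The behavior change described in this note can be modeled roughly as follows. This is a sketch under stated assumptions, not the actual change: `load_vocabulary` and its `update_alternates` flag are hypothetical, and whether the real code replaces or merges the alternates of an existing entry is not stated here (the sketch replaces them).

```python
def load_vocabulary(existing, rows, update_alternates=True):
    """Apply TSV-like rows of (value, alternates) to an existing vocabulary.

    existing: dict mapping canonical value -> set of alternate values.
    update_alternates=False models the old behavior, where alternates were
    only populated when an entry was first created.
    """
    for value, alternates in rows:
        if value not in existing:
            # Initial import: the entry is created together with its alternates.
            existing[value] = set(alternates)
        elif update_alternates:
            # Update path. Assumption: alternates are replaced, not merged.
            existing[value] = set(alternates)
    return existing


rows = [("French", ["fra"]), ("English", ["eng"])]

old_style = load_vocabulary({"French": set()}, rows, update_alternates=False)
new_style = load_vocabulary({"French": set()}, rows, update_alternates=True)

print(sorted(old_style["French"]))  # []
print(sorted(new_style["French"]))  # ['fra']
```

With the old behavior, an installation that already had the citation block loaded would never pick up the new codes, which is why the update path matters for this PR.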

Suggestions on how to test this:

Using the example from the linked issue: once the branch is built and deployed, update the citation block:

wget https://raw.githubusercontent.com/IQSS/dataverse/7638-iso-639-3-language-codes/scripts/api/data/metadatablocks/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

Create a harvesting client using the parameters from the original issue:

harvestUrl: https://www.zenodo.org/oai2d
metadataFormat: oai_dc
set: user-couperin

(This OAI server offers hundreds of sets. This PR includes an extra improvement, unrelated to languages: the sets will appear sorted in the pull-down menu, making it easier to use.)
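As a rough illustration of what sorting the set list involves, the sketch below parses an OAI-PMH ListSets response and sorts the set specs. The sample XML is made up for the example (only `user-couperin` comes from this PR), `sorted_set_specs` is a hypothetical helper, and sorting case-insensitively by `setSpec` is an assumption about the ordering.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up sample response; a real server may return hundreds of sets.
SAMPLE_LIST_SETS = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>user-zeta-example</setSpec><setName>Zeta</setName></set>
    <set><setSpec>user-couperin</setSpec><setName>Couperin</setName></set>
    <set><setSpec>user-alpha-example</setSpec><setName>Alpha</setName></set>
  </ListSets>
</OAI-PMH>
"""


def sorted_set_specs(xml_text):
    """Return the setSpec values from a ListSets response, sorted case-insensitively."""
    root = ET.fromstring(xml_text.strip())
    specs = [
        element.text
        for element in root.findall("oai:ListSets/oai:set/oai:setSpec", OAI_NS)
    ]
    return sorted(specs, key=str.lower)


print(sorted_set_specs(SAMPLE_LIST_SETS))
# ['user-alpha-example', 'user-couperin', 'user-zeta-example']
```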

The harvest should be able to import all 21 records in the set, including the 16 that have these language codes in the metadata and were failing previously.
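In OAI-PMH terms, the harvest parameters above correspond to a request along these lines. This only shows how the three values map onto standard OAI-PMH query arguments; `build_list_records_url` is a hypothetical helper, and the sequence of requests the Dataverse harvester actually issues may differ.

```python
from urllib.parse import urlencode


def build_list_records_url(harvest_url, metadata_format, set_spec):
    """Build an OAI-PMH ListRecords URL from the harvesting client parameters."""
    query = urlencode(
        {
            "verb": "ListRecords",
            "metadataPrefix": metadata_format,
            "set": set_spec,
        }
    )
    return f"{harvest_url}?{query}"


url = build_list_records_url(
    "https://www.zenodo.org/oai2d", "oai_dc", "user-couperin"
)
print(url)
# https://www.zenodo.org/oai2d?verb=ListRecords&metadataPrefix=oai_dc&set=user-couperin
```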

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:


sekmiller

Looks good

kcondon · 2021-03-22T14:47:02Z

Followed the instructions: deployed, applied citation.tsv, and created a harvest client. It does harvest 99 datasets (there was no subset available), but it failed with an unknown error and nothing in the logs. I then attempted a basic harvest against Harvard and it failed outright with no indication of why, just a GET request failure. I had been using oai_dc and the generic OAI client for the initial test, but switched to dataverse_4+ against Harvard once generic failed, and had no luck.

landreev · 2021-03-22T14:54:33Z

It sounds like there's something wrong with this system that's unrelated to this PR, if it can't harvest anything. You definitely want to harvest this particular set; if it's not seeing any sets, it means something is wrong. Let's take a closer look.

kcondon · 2021-03-22T14:55:45Z

As discussed, I can try testing against the develop branch. Harvesting worked partially against the test case.
OK, there were a few things that made it confusing:

Zenodo has a ton of sets, so it appears to take a while to populate the set list. Leonid waited for the sets and selected the right one. It worked.
Zenodo has a lot of studies outside of the specified set that likely contain data we don't like, hence the failure.
I have a browser autocomplete that types http rather than https when I type the Harvard OAI server; that can contact the server but not complete transactions. Changing to https works.

Thanks for the assistance @landreev

landreev added 5 commits March 11, 2021 16:10

Adds the ISO 639-3 language letter codes to the list of accepted substitute values in the citation metadata block, where an exact match was available (140 languages total; #7638)

2e999c3

a release note for the updated citation metadata block. (#7638)

a0b0e8d

a quick fix that makes the harvesting clients page more useable w/ remote servers offering long lists of OAI sets. (#7638)

63bb986

Changes to make dataset block imports update controlled vocab. alternate values. (#7638)

284cc6e

Merge branch 'develop' into 7638-iso-639-3-language-codes

7d2a76b

landreev assigned landreev and unassigned landreev Mar 16, 2021

removing an import that's not needed (#7638)

010962a

sekmiller self-assigned this Mar 17, 2021

sekmiller approved these changes Mar 17, 2021


sekmiller removed their assignment Mar 17, 2021

kcondon merged commit 7ba0fd7 into develop Mar 22, 2021

kcondon self-assigned this Mar 22, 2021

kcondon deleted the 7638-iso-639-3-language-codes branch March 22, 2021 15:55

djbrooke added this to the 5.4 milestone Mar 22, 2021

jeromeroucou mentioned this pull request Nov 16, 2021

Feature Request/Idea: Sanitize languages controlled vocabulary values #8243

Closed

pdurbin mentioned this pull request Mar 9, 2022

Can't harvest when Dublin core field language is set #8139

Closed

pdurbin added the Feature: Harvesting label Apr 13, 2022

pdurbin mentioned this pull request Apr 13, 2022

Spike: Inventory and prioritize all existing Harvesting related issues IQSS/dataverse-pm#24

Closed

3 tasks
