Investigate odd csv -> json-ld behaviour #368
Comments
The scope of this ticket is to do a little digging to see whether something is going on that we can or need to fix. After investigating, decide whether further escalation to FAIR Data is needed. |
The linked ticket starts with an issue with
My guess is that this is leading to a namespace clash on conversion to the JSON-LD. Next step is to investigate if |
This essentially becomes a clone/offshoot of our investigation into title-case/lower-case near duplicates and the actions needed in the backlog for #176 |
Note all the
|
We count the number of distinct entries for this from Google BigQuery
These are all first-letter uppercase, suggesting that the lowercase valid values in the data model have not been followed (maybe they were not implemented at the time?). The only weird thing is that |
The only occurrence of |
All the other actual values submitted for Note |
Next will look at |
In the CSV, "yes" is a valid value for "Treatment or Therapy" only, whereas title-case "Yes" is more frequently used. In the data model, we see that upper-case "Yes" is used as the valid value within the JSON-LD.
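To make the near-duplicate check concrete, here is a minimal sketch that groups values differing only by letter case. The function name `find_case_collisions` and the sample values are hypothetical, chosen for illustration; this is not schematic's own logic, and a real check would read the valid values from the data model CSV.

```python
from collections import defaultdict


def find_case_collisions(values):
    """Group values that differ only by letter case.

    Returns a dict mapping the case-folded form to the sorted list of
    distinct spellings, keeping only forms with more than one spelling.
    """
    groups = defaultdict(set)
    for value in values:
        groups[value.casefold()].add(value)
    return {
        folded: sorted(spellings)
        for folded, spellings in groups.items()
        if len(spellings) > 1
    }


# Hypothetical valid values, for illustration only.
valid_values = ["Yes", "yes", "No", "Not Reported", "unknown"]
print(find_case_collisions(valid_values))  # {'yes': ['Yes', 'yes']}
```

Any key that shows up in the result is a candidate for a clash if the converter normalizes labels by case.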
|
My hypothesis is that where there are case differences, the JSON-LD converter is now harmonizing based on the title-case version. I wonder if in the past it took both, or harmonized to the lower-case version. Action for next sprint: escalate to FAIR. Suggest @aditigopalan work with them to confirm this hypothesis or understand how cases for the JSON-LD |
Looking back to the Aug 2023 data model release, I don't see a change in behavior |
This is a problem we will need to engage with FAIR Data on in the future to figure out how to clean this up based on the latest expected behavior of schematic. Push this back to the backlog and mark for renewal. |
Ticket for us to look into potential causes related to a couple of issues that DCC members have seen where the case of certain valid values (e.g. "Yes" vs "yes") is throwing errors.
Recent example: https://sagebionetworks.jira.com/browse/HTAN-402
Alex also mentioned that he encountered issues with this when recently interacting with the Publications schema.
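As a rough illustration of the failure mode described above, the sketch below separates a pure case mismatch from a genuinely invalid value. It assumes a simple case-sensitive membership check; `check_value` is a hypothetical helper and does not reflect how schematic actually validates manifests.

```python
def check_value(submitted, valid_values):
    """Classify a submitted value against a list of valid values."""
    if submitted in valid_values:
        return "exact match"
    # Map each case-folded valid value to its canonical spelling.
    # If two valid values differ only by case, the later one wins here.
    folded = {value.casefold(): value for value in valid_values}
    canonical = folded.get(submitted.casefold())
    if canonical is not None:
        return f"case mismatch (expected {canonical!r})"
    return "not a valid value"


# Hypothetical valid values, for illustration only.
print(check_value("yes", ["Yes", "No"]))    # case mismatch (expected 'Yes')
print(check_value("Maybe", ["Yes", "No"]))  # not a valid value
```

Reporting the case mismatch separately would make it easier to tell data-entry problems apart from data model inconsistencies.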