[Harvesting] Various ODS/DKAN fixes and improvements #7059

Merged (4 commits) on May 9, 2023


tkohr
Contributor

@tkohr tkohr commented May 9, 2023

tkohr added 4 commits May 5, 2023 15:45
metadata:
creation = considered "modified" (the metadata record is created when the resource is loaded)
revision = "metadata processed"
publication = "modified"

resource:
creation = not available in ODS
revision = "data processed"
publication = "modified"
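As a reading aid, the date mapping in this commit message can be sketched as a lookup table. This is only an illustration in Python, not the harvester's XSLT. The field name metadata_processed appears in the diff; modified and data_processed are assumed spellings of the labels quoted above, and DATE_MAPPING and iso_dates are hypothetical names.

```python
# Which ODS "metas" field feeds each ISO date type, per the commit message.
# Only metadata_processed is confirmed by the diff; the other field names
# are assumed spellings of the labels in the commit message.
DATE_MAPPING = {
    "metadata": {
        "creation": "modified",
        "revision": "metadata_processed",
        "publication": "modified",
    },
    "resource": {
        # no "creation" entry: not available in ODS
        "revision": "data_processed",
        "publication": "modified",
    },
}


def iso_dates(metas, scope):
    """Return (date_type, value) pairs for a scope, skipping missing fields."""
    return [
        (date_type, metas[field])
        for date_type, field in DATE_MAPPING[scope].items()
        if metas.get(field)
    ]
```

Skipping missing fields means no date entry is produced when ODS does not supply the source value.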
This accepts "-" and "/" as date separators, and plain whitespace characters between the date and the time.
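A minimal sketch of this kind of lenient date parsing, written in Python for illustration rather than taken from the harvester. It assumes year-month-day order and also accepts a "T" between date and time; neither is stated in the commit message. The name normalize_datetime is hypothetical.

```python
import re

# Assumption: year-month-day order. The commit message only specifies the
# accepted separators, not the field order.
_DATE_TIME = re.compile(
    r"^\s*(\d{4})[-/](\d{1,2})[-/](\d{1,2})"              # date with '-' or '/'
    r"(?:(?:T|\s+)(\d{1,2}):(\d{2})(?::(\d{2}))?)?\s*$"   # optional time part
)


def normalize_datetime(value):
    """Return an ISO 8601 string, or None if the value does not match.

    Only the shape is checked; out-of-range values such as month 13
    are not rejected.
    """
    match = _DATE_TIME.match(value)
    if match is None:
        return None
    year, month, day, hour, minute, second = match.groups()
    date = f"{year}-{int(month):02d}-{int(day):02d}"
    if hour is None:
        return date
    return f"{date}T{int(hour):02d}:{minute}:{second or '00'}"
```

Returning None for unrecognized input lets the caller skip the date element instead of writing an invalid value.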
@tkohr tkohr changed the title [Harvesting] Various ODS/DKAN fixes [Harvesting] Various ODS/DKAN fixes and improvements May 9, 2023
@jahow
Contributor

jahow commented May 9, 2023

Looks like the test failure is unrelated:

Error: 6,250 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.481 s <<< FAILURE! - in org.fao.geonet.api.links.LinksApiTest
Error: 6,252 [ERROR] org.fao.geonet.api.links.LinksApiTest.getLinks  Time elapsed: 8.999 s  <<< FAILURE!
java.lang.AssertionError: Status expected:<201> but was:<400>
	at org.fao.geonet.api.links.LinksApiTest.getLinks(LinksApiTest.java:115)

@tkohr
Contributor Author

tkohr commented May 9, 2023

Thanks, @jahow !

@jahow jahow merged commit b0952d0 into geonetwork:main May 9, 2023
<mri:topicCategory>
<mri:MD_TopicCategoryCode></mri:MD_TopicCategoryCode>
</mri:topicCategory>
<!-- ODS themes copied as topicCategory -->
Member

I am not sure whether metas/theme corresponds to what ODS calls a "dataset theme". According to the ODS documentation on dataset themes (see https://userguide.opendatasoft.com/l/en/article/7ng5ysoinv-managing-the-dataset-themes), a theme is a list of free-text terms, with 12 default values that any ODS instance can customize.

So it does not sound appropriate to map this to an ISO topic category, which is an enumeration (see https://github.com/geonetwork/core-geonetwork/blob/main/schemas/iso19115-3.2018/src/main/plugin/iso19115-3.2018/schema/standards.iso.org/19115/-3/mri/1.0/identification.xsd#L356-L463), i.e. a list of values that you cannot extend unless you create a profile.

Putting free-text values into an enumeration will bring a few issues:

  • All records produced by that harvester will be invalid

Pasted image 1

  • Facets will be a mix of keys and free-text values, possibly in a mix of languages. Even values with the same meaning will not be grouped together, e.g. Santé and Health.

Pasted image

  • Topic category icons will not be available for the free text values
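To illustrate why free-text themes fail validation, here is a small Python sketch that checks values against the topic category enumeration. The value list is written from memory of ISO 19115-1:2014 and should be checked against the XSD linked above; invalid_topic_categories is a hypothetical helper, not part of GeoNetwork.

```python
# MD_TopicCategoryCode values as recalled from ISO 19115-1:2014.
# Treat this list as an assumption and verify it against the linked XSD.
ISO_TOPIC_CATEGORIES = frozenset({
    "farming", "biota", "boundaries", "climatologyMeteorologyAtmosphere",
    "economy", "elevation", "environment", "geoscientificInformation",
    "health", "imageryBaseMapsEarthCover", "intelligenceMilitary",
    "inlandWaters", "location", "oceans", "planningCadastre", "society",
    "structure", "transportation", "utilitiesCommunication",
    "extraTerrestrial", "disaster",
})


def invalid_topic_categories(values):
    """Return the values that are not members of the enumeration.

    The comparison is exact and case-sensitive, as in an XSD enumeration.
    """
    return [value for value in values if value not in ISO_TOPIC_CATEGORIES]
```

With the example from the comment, invalid_topic_categories(["health", "Santé", "Health"]) keeps "Santé" and "Health" as invalid, since only the key "health" is an enumeration member.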

If you really want to put the ODS dataset theme in the ISO topic category, you should provide a mapping and set the proper ISO keys. However, as it is closer to a category or thesaurus in GeoNetwork, I would rather revert the change and keep ODS dataset themes in a dedicated keyword block, so that users can easily create a facet or search on this ODS vocabulary.

Contributor

The rationale was that records harvested from ODS would be filterable by theme just like any other record. The record will obviously be invalid, but the search experience is better. Since themes in ODS are pretty much free-form, it is probably impossible to map them to an ISO topic.

Feel free to revert this change if you think it's inappropriate.

Member

Why not simply map the default ODS themes to valid ISO topics, e.g. metas/theme=Santé becomes <mri:MD_TopicCategoryCode>health</mri:MD_TopicCategoryCode>, and keep custom and unknown ODS themes in a dedicated keyword group? This would conform to ISO and would not mess up the GeoNetwork default filter on ISO topics.

If building a new UI with a concept of theme that does not correspond to ISO topics, the ODS keyword group can be used and/or combined with ISO topics for filtering.
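The proposal above could look roughly like the following Python sketch. It is an illustration of the logic only, not the XSLT that was later committed. The mapping table is hypothetical: only the Santé/Health to health pair comes from this discussion, and the other default ODS themes would have to be added from the ODS documentation.

```python
# Hypothetical mapping table keyed by lower-cased theme label.
# Only the Santé/Health -> health pair is taken from the discussion.
ODS_THEME_TO_ISO_TOPIC = {
    "santé": "health",
    "health": "health",
}


def split_themes(themes):
    """Split ODS themes into (ISO topic categories, free-text keywords).

    Known default themes become ISO topic keys, without duplicates.
    Custom or unknown themes are kept unchanged as keywords.
    """
    topics, keywords = [], []
    for theme in themes:
        topic = ODS_THEME_TO_ISO_TOPIC.get(theme.strip().lower())
        if topic is None:
            keywords.append(theme)
        elif topic not in topics:
            topics.append(topic)
    return topics, keywords
```

For example, split_themes(["Santé", "Health", "Mobilité douce"]) yields a single health topic and keeps "Mobilité douce" for the keyword group.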

Other alternatives:

  • Other harvesters that do not provide such information have extra parameters to set the language or topic category, e.g. OGC WxS

image

  • Batch editing can also be used to extend harvested records with local rules (eg. to add keywords from an internal classification system).

<mdb:dateInfo>
<cit:CI_Date>
<cit:date>
<gco:DateTime><xsl:value-of select="metas/metadata_processed"/></gco:DateTime>
Member

As this XSLT conversion provides the mapping for both V1 and V2 of the ODS API, it would be good to handle both mappings, or at least not to create empty elements when the mapping is only improved for one version of the ODS API.
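One way to picture this request, sketched in Python rather than XSLT: look the value up in each API version's layout and create the element only when something was found. The key paths below are assumptions about how V1 and V2 responses nest their metadata and must be checked against real ODS responses; first_value and append_datetime are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

GCO = "http://standards.iso.org/iso/19115/-3/gco/1.0"

# Assumed key paths for the two API versions (not confirmed by the PR).
METADATA_PROCESSED_PATHS = [
    ("metas", "metadata_processed"),                        # assumed V1 layout
    ("dataset", "metas", "default", "metadata_processed"),  # assumed V2 layout
]


def first_value(record, paths):
    """Return the first non-empty string found under any of the key paths."""
    for path in paths:
        node = record
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, str) and node.strip():
            return node
    return None


def append_datetime(parent, record):
    """Append a gco:DateTime child only when a value exists; else do nothing."""
    value = first_value(record, METADATA_PROCESSED_PATHS)
    if value is None:
        return None
    element = ET.SubElement(parent, f"{{{GCO}}}DateTime")
    element.text = value
    return element
```

In XSLT the same effect is usually obtained by wrapping the output element in a test on the source value, so that a record from the other API version does not receive an empty element.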

@jahow
Contributor

jahow commented May 25, 2023

Looks like we underestimated the impact of putting unrecognized values in the topicCategory field; I personally had not anticipated this to be such a problem. Until we have the budget to implement a proper mapping to ISO values, I feel the best approach would be to revert this PR. @fxprunayre, let me know if I can move forward.

@fxprunayre
Member

I don't think that many people are using it for now? (I know of one, but for now it is only for testing purposes.) So we can probably keep it in if it does not disturb your users.

@jahow
Contributor

jahow commented May 25, 2023

I just checked and a record produced by this harvester will have other validation issues apart from the topic:
image

(most of them are likely quite easy to fix)

I guess it would make sense to address these more globally. Also, we never took special care to produce valid ISO results for other open data harvesters, AFAIK.

@fxprunayre
Member

The point is not really to produce a valid record, which will always depend on the harvester's input. The point here is that topic category is an enumeration and the ODS theme will probably never match any value of the enumeration, so it will always produce an error on this element (and will mix free text with codelist values, leave those values without icons, ...).

fxprunayre added a commit that referenced this pull request Jul 5, 2023
Follow up of #7059

* Add elements taking into account API version 1 or 2
* Do not put free text in an ISO field which is an enumeration (which avoids mixing facet icons and translations for topic category)
  * Provide a mapping based on the default ODS values for French and English
  * Add a dedicated keyword block with the free-text values
fxprunayre added a commit that referenced this pull request Jul 6, 2023
fxprunayre added a commit to SPW-DIG/metawal-core-geonetwork that referenced this pull request Sep 9, 2024
fxprunayre added a commit that referenced this pull request Sep 9, 2024