JSON Harvester / Add XSL transformation for harvesting DKAN catalogs #6240
Thanks @jahow. I noticed some weird behavior though:
{
"id": "379ac99b-9864-4269-8f1b-5ab6a4a198d0",
"revision_id": "",
"url": "\u003Cdiv class=\u0022field field-name-field-link-remote-file field-type-file field-label-hidden\u0022\u003E\u003Cdiv class=\u0022field-items\u0022\u003E\u003Cdiv class=\u0022field-item even\u0022\u003Ehttps:\/\/sig.hautsdefrance.fr\/ext\/opendata\/Sraddet2020\/cer_reservoir_s_fr32.csv\u003C\/div\u003E\u003C\/div\u003E\u003C\/div\u003E",
"description": "\u003Cp\u003EDonn\u00e9es brutes au format Csv (r\u00e9servoirs de la biodiversit\u00e9- Trame verte)\u003C\/p\u003E\n",
"format": "csv",
"state": "Active",
"revision_timestamp": "lun, 06\/12\/2021 - 03:00",
"name": "Tableau de donn\u00e9es (r\u00e9servoirs de la biodiversit\u00e9)",
"mimetype": "csv",
"size": "",
"created": "jeu, 03\/06\/2021 - 03:00",
"resource_group_id": "b72cd25d-1cec-49f6-8c71-297bd373fa01",
"last_modified": "Date changed lun, 06\/12\/2021 - 03:00"
},
which ends up in the metadata XML as
<cit:linkage>
<gco:CharacterString xmlns:gco="http://standards.iso.org/iso/19115/-3/gco/1.0">
<div class="field field-name-field-link-remote-file field-type-file field-label-hidden">
<div class="field-items">
<div class="field-item even">
https://sig.hautsdefrance.fr/ext/opendata/Sraddet2020/cer_reservoir_s_fr32.csv
</div>
</div>
</div>
</gco:CharacterString>
</cit:linkage>
Is there an XSL utility that could help remove all HTML tags from a text element? Thanks
I've added a commit to remove the HTML tags from the URLs with a regex. It works quite well in the datahub (you can see that the data preview is functional now).
There has been work recently by @fxprunayre to handle HTML content in metadata records, but I think it was intended more to convert HTML to Markdown than to strip HTML tags completely.
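For readers who want to see the idea outside XSLT, here is a minimal Python sketch of regex-based tag stripping applied to a shortened copy of the `url` field quoted above. It is not the code from the commit: the function name `strip_html_tags` and the exact pattern are assumptions made for illustration, and a regex like this is only reasonable for simple, well-formed markup such as the DKAN field wrappers.

```python
import html
import json
import re

# Shortened copy of the DKAN resource quoted above (only "url" is kept,
# with a single wrapping <div> instead of three).
RESOURCE_JSON = r'''
{
  "url": "\u003Cdiv class=\u0022field-item even\u0022\u003Ehttps:\/\/sig.hautsdefrance.fr\/ext\/opendata\/Sraddet2020\/cer_reservoir_s_fr32.csv\u003C\/div\u003E"
}
'''

# Assumed pattern: anything between "<" and the next ">" is treated as a tag.
TAG_RE = re.compile(r"<[^>]+>")


def strip_html_tags(value: str) -> str:
    """Remove HTML tags, decode entities, and trim surrounding whitespace."""
    without_tags = TAG_RE.sub("", value)
    return html.unescape(without_tags).strip()


resource = json.loads(RESOURCE_JSON)
clean_url = strip_html_tags(resource["url"])
print(clean_url)
```

The same approach would also clean the `description` field, which is wrapped in `<p>` tags, although converting that field to Markdown may be preferable to discarding its markup.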
Updated with the
My bad, thanks @fxprunayre. I was very surprised to find XSLUtil almost empty, but I confused it with the one in the 19115-3 schema; I didn't see that there were two different ones :/
@jahow did you check metadata 7698d9ab-3e4f-497c-9332-87413deb24f2? I remember that there were also weird characters in the keywords (the encoding of the
No, I haven't handled that yet.
@fgravin with the latest commit this is good to go.
Yes, looks good, thanks @jahow
@jahow Hello, has the feature for harvesting with DKAN-to-ISO19115-3-2018 disappeared in GeoNetwork 4.4.1? I see there is a new field, "XSL transformation to apply", with the value "schema:iso19115-3.2018:convert/fromJsonDkan", but it doesn't seem to work.
Can be tested using the following parameters:
/result/0
id
DKAN-to-ISO19115-3-2018
Leave the other parameters empty.
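As a rough illustration of what the first two parameters might do, the Python sketch below treats `/result/0` as a JSON Pointer (RFC 6901) selecting the list of records to loop over, and `id` as the property used as each record's identifier. Both readings are assumptions, since the parameter labels are not shown here. The sample response is hypothetical and heavily trimmed, and `resolve_pointer` is an illustrative helper, not a GeoNetwork function.

```python
def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer such as "/result/0" against parsed JSON."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"JSON Pointer must start with '/': {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" means "/", "~0" means "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Hypothetical response shape: "result" holds a list whose first item
# is the list of dataset records.
SAMPLE_RESPONSE = {
    "result": [
        [
            {"id": "aaaa-1", "title": "First dataset"},
            {"id": "bbbb-2", "title": "Second dataset"},
        ]
    ]
}

records = resolve_pointer(SAMPLE_RESPONSE, "/result/0")
record_ids = [record["id"] for record in records]
print(record_ids)
```

Under these assumptions, each selected record would then be passed to the DKAN-to-ISO19115-3-2018 conversion named in the third parameter.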