Scheming support #281
Conversation
This allows checking whether a field should be stored as a custom field or an extra
…ext-dcat into 56-add-schema-file-dcat-ap-2.1
ckanext/dcat/profiles.py
@@ -2003,3 +2093,122 @@ def _distribution_url_graph(self, distribution, resource_dict):
    def _distribution_numbers_graph(self, distribution, resource_dict):
        if resource_dict.get('size'):
            self.g.add((distribution, SCHEMA.contentSize, Literal(resource_dict['size'])))

    # TODO: split all these classes in different files
+1
Done in #282
@amercader I've been working on DCAT this week, including adding spec-compliant HVD 2.2.0 output and scheming portions to the current 1.7 version (somewhat split across dcat and our schema extension at the moment). A couple of things have come up for making the output compliant with the HVD SHACL files (https://semiceu.github.io/DCAT-AP/releases/2.2.0-hvd/#validation):
gives us something like this:
The codelists get rendered like this in the .ttl:
@EricSoroos thanks for the feedback:
Great to see you are working on this same area. If you have any feedback on the general approach followed for scheming support it would be great to hear it.
@seitenbau-govdata @bellisk would love to know your take on this, and see if this approach would play well with how you are using ckanext-dcat
In terms of HVD support, the current EU DCAT 2 implementation is close; at least, it has all of the fields. This commit (derilinx/ckanext-dcat@d5ef9f4) is the difference, and it's only the codelist and two types that were required. There are some other compliance issues, like the requirement that one of license or rights needs to be available, and that applicable_legislation has to have at least one specific value. I'm looking at validation-level stuff for those (legislation already done, license/rights not).

I'm at the point of thinking that these things are more general, so I'm not clear that we'd necessarily want to add a separate profile for this: inheritance is really tricky when you're blatting items into a graph and may need to override just one piece of it. From what I can tell, the extra profiles tend to be aggregative and compatible, so realistically there are potentially a few extra fields per entity and/or additional codes/required fields. Also, I think that the changes here are more of the form of "potentially backwards-incompatible fixes to the implementation" rather than actually adding support for the profile. FWIW, I think this has been the general take previously, e.g. the geo fields are added from GeoDCAT.

For the codelists (at least on the scheming side), I've got something like this in my schema:
And then the choices_helper is this:
The codelist directory has the
Right now, this is spread over my schema plugin and the dcat plugin, but the next iteration is going to need to pull the codelists into dcat so that I can kick out the prefLabels there.
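For reference, a rough sketch of what a codelist-backed choices_helper of the kind described above can look like (the file layout, JSON format and names here are assumptions, not the actual code from that setup):

```python
# Hypothetical sketch of a codelist-backed choices_helper for ckanext-scheming.
import json
import os

CODELIST_DIR = os.path.join(os.path.dirname(__file__), "codelists")


def codelist_choices(field):
    """Return scheming choices loaded from a codelist file.

    Assumes the schema field carries a ``codelist`` key naming the file,
    e.g. ``codelist: frequency`` -> ``codelists/frequency.json`` with
    entries like ``{"uri": "...", "label": "..."}``.
    """
    codelist = field.get("codelist")
    if not codelist:
        return []
    path = os.path.join(CODELIST_DIR, f"{codelist}.json")
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return [
        {"value": entry["uri"], "label": entry.get("label", entry["uri"])}
        for entry in entries
    ]
```

Such a helper would be registered via ITemplateHelpers and referenced from the schema field with a `choices_helper` entry.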
ckanext/dcat/plugins/__init__.py
for index, item in enumerate(dataset_dict[field['field_name']]):
    for key in item:
        # Index a flattened version
        new_key = f'{field["field_name"]}_{index}_{key}'
IIUC this would let us ask solr for things like 'datasets with the 6th name field in contacts equal to "frank"', but not 'datasets containing any contact named "frank"'. If we combined the same keys from all subfields into a single field like
new_key = f'{field["field_name"]}__{key}'
then we could, right? We're not doing dynamic solr schemas so we'd have to combine everything as text fields but it should work for text searches.
Also nothing here is specific to dcat so shouldn't it be a scheming PR?
Right, given this:
With your suggestion you need to run a query like q=contact__name:*Racoon* to get a hit; with the original it would be impossible to get this hit without knowing the index, so that's obviously better.
But I think users would expect to find these results in a free text search as well. For this, the subfields need to be indexed in an extras_* field, as these are copied to the catch-all text field. So this would allow just doing q=Racoon and getting a result back, so maybe indexing new_key = f'extras_{field["field_name"]}__{key}' with the combined key values is the better approach.
Also nothing here is specific to dcat so shouldn't it be a scheming PR?
Sure, I was just getting things working in this extension. Do you want to replace the logic in SchemingNerfIndexPlugin with this one or create a separate plugin? I'll send a PR
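A minimal sketch of the combined-key indexing idea discussed in this thread, assuming it runs from an indexing hook such as IPackageController.before_dataset_index and that the scheming schema dict is available (the function name and schema access are illustrative, not the code from the PR):

```python
import json


def flatten_repeating_subfields(dataset_dict, schema):
    """Flatten repeating subfields into combined extras_* keys for Solr.

    Using an ``extras_`` prefix means the values get copied into the
    catch-all ``text`` field, so ``q=Racoon`` matches any contact name.
    """
    for field in schema.get("dataset_fields", []):
        if "repeating_subfields" not in field:
            continue
        items = dataset_dict.get(field["field_name"]) or []
        if isinstance(items, str):
            items = json.loads(items)
        combined = {}
        for item in items:
            for key, value in item.items():
                if value:
                    combined.setdefault(key, []).append(str(value))
        for key, values in combined.items():
            # e.g. extras_contact__name = "Racoon Jones Jane Doe"
            dataset_dict[f'extras_{field["field_name"]}__{key}'] = " ".join(values)
        # Keep the original value as a JSON string, which Solr can store as text.
        dataset_dict[field["field_name"]] = json.dumps(items)
    return dataset_dict
```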
- field_name: endpoint_url
  label: Endpoint URL
  preset: multiple_text
WDYT about tighter validation on this schema, e.g. BCP47 values in language, valid emails, URIs and URLs?
I think it's a great idea, but I want to do it in a second stage once all the fields are defined in the schema and the general approach is validated. I'll start to compile a list of possible validations; these are all great candidates.
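As a rough illustration of what some of those candidate validators could look like (illustrative only, not part of this PR; the BCP 47 check is deliberately loose):

```python
import re
from urllib.parse import urlparse

from ckan.plugins.toolkit import Invalid

# Loose BCP 47 shape: primary language subtag plus optional subtags,
# e.g. "en", "en-GB", "es-419". Not a full RFC 5646 grammar.
BCP47_PATTERN = re.compile(r"^[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*$")


def url_validator(value):
    """Reject values that are not http(s) URLs."""
    if not value:
        return value
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise Invalid("Value must be a valid http(s) URL")
    return value


def bcp47_validator(value):
    """Reject values that don't look like a BCP 47 language tag."""
    if value and not BCP47_PATTERN.match(value):
        raise Invalid("Value must be a BCP 47 language tag, e.g. 'en' or 'en-GB'")
    return value
```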
Mostly taken from the DCAT-AP 2.1 spec doc, adapted for CKAN
I think this is now ready to go; any further work should be done in separate PRs as this has grown quite a lot. Highlights are:
This looks good. I would be tempted to put more of the logic in the schemas, but this extension needs to maintain backwards compatibility and ckanext-scheming-less operation, so your approach makes sense.
Co-authored-by: Ian Ward <[email protected]>
As this is a `text` field that allows free text search
Scheming adds a dict with empty keys when empty repeating subfields are submitted from the form. Check that there's an actual value before creating the triples when serializing
except Invalid:
    raise Invalid(
        _(
            "Date format incorrect. Supported formats are YYYY, YYYY-MM, YYYY-MM-DD and YYYY-MM-DDTHH:MM:SS"
This is overly restrictive. The first datetime I attempted to parse with the dcat harvester was of the form YYYY-MM-DDTHH:MM:SS.000Z, which is permitted by ISO 8601 (https://en.wikipedia.org/wiki/ISO_8601) and the xsd:dateTime spec (https://www.w3.org/TR/xmlschema11-2/#dateTime).

It appears that this goes back to the CKAN helper date_str_to_datetime, which "Converts an ISO like date to a datetime" using a 12-year-old, regex-based, timezone-ignoring date parser. This predates Python having sensible timezone-aware datetimes and the datetime.datetime.fromisoformat function.

The core helper probably should be fixed, with its potential for knock-on changes, but in the meantime, since this is new code, perhaps we should just use datetime.datetime.fromisoformat directly here (Python 3.7 minimum for that). We'd still need to support YYYY and YYYY-MM specially in the code, because those aren't covered.

There's also a version of this in scheming (https://github.com/ckan/ckanext-scheming/blob/master/ckanext/scheming/helpers.py#L299) which also does manual date parsing, but again looks replaceable with fromisoformat.
datetime.datetime.fromisoformat() is quite limited up until Python 3.11:

"Changed in version 3.11: Previously, this method only supported formats that could be emitted by date.isoformat() or datetime.isoformat()."

So on Python 3.10 and lower these dates are not parsed:
Python 3.10.10 (main, Mar 14 2023, 15:55:23) [GCC 9.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.13.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import datetime
In [2]: datetime.datetime.fromisoformat("2011-11-04T00:05:23Z")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[2], line 1
----> 1 datetime.datetime.fromisoformat("2011-11-04T00:05:23Z")
ValueError: Invalid isoformat string: '2011-11-04T00:05:23Z'
What do you think of the approach I followed in c7b8c02? If something is not an xsd:gYear, an xsd:gYearMonth or an xsd:date, we just let dateutil parse it (which will accept the timezone values you suggested). If that parses, we serve it as an xsd:dateTime.

The tests check that 1) the input value is unchanged and 2) it's served with the correct xsd data type.
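A simplified sketch of that approach (the actual code lives in c7b8c02 and may differ in details): keep the original value, pick the most specific xsd datatype by pattern, and otherwise let dateutil confirm the value parses before serving it as xsd:dateTime.

```python
import re

from dateutil.parser import parse as parse_date
from rdflib import Literal
from rdflib.namespace import XSD

GYEAR = re.compile(r"^\d{4}$")
GYEAR_MONTH = re.compile(r"^\d{4}-\d{2}$")
DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def date_literal(value):
    """Return an rdflib Literal for ``value`` with the most specific xsd type."""
    value = value.strip()
    if GYEAR.match(value):
        return Literal(value, datatype=XSD.gYear)
    if GYEAR_MONTH.match(value):
        return Literal(value, datatype=XSD.gYearMonth)
    if DATE.match(value):
        return Literal(value, datatype=XSD.date)
    # Anything else: let dateutil check it parses (it accepts timezone suffixes
    # like "2011-11-04T00:05:23.000Z") and serve the original string as dateTime.
    parse_date(value)  # raises a ValueError subclass if not a recognisable date
    return Literal(value, datatype=XSD.dateTime)
```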
Looks better than doing it ourselves. You're not testing for invalid dates, or conditionally valid dates like M/D/Y, but overall I'm definitely happier with getting a builtin date parser that handles the in-the-wild formats that I've seen.
Added tests in 39b4d91
Merging this big chunk of changes; any followup like #288 we can do in smaller PRs.
This PR adds initial support for seamless integration between ckanext-dcat and ckanext-scheming, providing a custom profile that modifies the dataset dicts generated and consumed by the existing profiles so they play well with the defined scheming presets.
Summary of changes
- A DCAT-AP 2.1 schema file (ckanext/dcat/schemas/dcat_ap_2.1.yaml)
- Helper methods in the RDFProfile class to access schema field definitions from datasets and resources
- A new euro_dcat_ap_scheming profile that adds support for the field serializations supported by the ckanext-scheming presets. The existing profiles (euro_dcat_ap and euro_dcat_ap_2) remain unchanged (except for some very minor backward-compatible changes regarding the handling of access services in distributions/resources). This means that existing sites will keep working as they do now, but maintainers can choose to enable scheming support if they decide to migrate to that approach. Upcoming DCAT 3 based profiles will be scheming based (in a new ckanext-dcat version).
Compatibility and release plan
Extra care has been taken to not break any existing systems. Sites using the existing euro_dcat_ap and euro_dcat_ap_2 profiles should not see any change in their current parsing and serialization functionality, and these profiles will never change their outputs. Sites willing to migrate to a scheming-based profile can do so by adding the new euro_dcat_ap_scheming profile at the end of their profile chain (the value of the ckanext.dcat.rdf.profiles config option, e.g. ckanext.dcat.rdf.profiles = euro_dcat_ap_2 euro_dcat_ap_scheming), which will modify the existing profile outputs to the format expected by the scheming validators. Note that the scheming profile will only affect fields defined in the schema definition file, so sites can start gradually migrating different metadata fields.

This compatibility profile will be released in the next ckanext-dcat version (1.8.0). The upcoming DCAT v3 based profiles for DCAT-AP 3 and DCAT-US 3 will be scheming based and will incorporate the mapping changes described below.
Mapping changes
The main changes between the old processors (parsers and serializers) and the new scheming-based ones are:
Root level fields
Custom DCAT fields that didn't link directly to standard CKAN fields were stored as extras (see all the ones marked extra: here), so the DCAT version_notes field would end up in the extras list. In the scheming-based profile, if the field is defined in the scheming schema, it will get stored as a root level field, like all custom dataset properties.
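For illustration only (the dataset name and field value are made up), the difference in the dataset dict looks roughly like this:

```python
# Legacy behaviour: non-core DCAT fields end up in CKAN's extras list.
legacy_dataset = {
    "name": "my-dataset",
    "extras": [
        {"key": "version_notes", "value": "Some version notes"},
    ],
}

# Scheming-based profile: the field defined in the schema is a root level key.
scheming_dataset = {
    "name": "my-dataset",
    "version_notes": "Some version notes",
}
```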
List fields
The old profiles stored lists as JSON strings. By using the multiple_text preset, lists are now automatically handled as proper lists, and the form snippets UI allows providing multiple values.
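A sketch of the difference, using language as an example field (field name and values are illustrative):

```python
# Old profiles: the list is a JSON-encoded string stored in an extra.
legacy_dataset = {
    "extras": [
        {"key": "language", "value": '["en", "ca", "es"]'},
    ],
}

# With the multiple_text preset: a native list at the root of the dataset dict.
scheming_dataset = {
    "language": ["en", "ca", "es"],
}
```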
Repeating subfields
Mapping complex entities like dcat:contactPoint or dct:publisher was very limited, storing a subset of the properties of just one linked entity as prefixed extras. By using the repeating_subfields preset we can consume and present these as proper objects, and store multiple entities for those properties that have 0..n cardinality (see comment in "Issues").

Repeating subfields are also supported in resources/distributions. In this case complex objects like dcat:accessService were stored as JSON strings; they now appear as proper objects. Again, these can be easily managed via the UI thanks to the scheming form snippets.
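A rough illustration with dcat:contactPoint (keys and values are examples, not the exact schema in this PR):

```python
# Old profiles: a flat subset of one contact point stored as prefixed extras.
legacy_dataset = {
    "extras": [
        {"key": "contact_name", "value": "Point of Contact"},
        {"key": "contact_email", "value": "contact@example.org"},
    ],
}

# repeating_subfields preset: a list of objects, allowing multiple entities
# where the property has 0..n cardinality.
scheming_dataset = {
    "contact": [
        {"name": "Point of Contact", "email": "contact@example.org"},
    ],
}
```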
Issues
- For properties like dct:publisher that have 0..1 cardinality: I don't think CKAN supports "non-repeating" subfields, so it makes sense to use the repeating_subfields preset for now and create a new one in the future.
- Scheming provides date and datetime presets with nice UI form snippets, so it's tempting to use them for properties like issued and modified, but these properties support other formats like xsd:gYear or xsd:gYearMonth which will fail with these presets, so we can consider creating a new preset that extends the existing ones to support these formats.