TG2-VALIDATION_TAXONRANK_STANDARD #162
Added guid. |
Discussion in call: Parameters: bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml). When a default is not yet available at a stable location, this could be either: Parameters = bdq:sourceAuthority, or Parameters = bdq:sourceAuthority (default = [GBIF rank vocabulary]). In the first case, the fact that there is a vocabulary to use, but that it is not yet available at a stable IRI, would go into the notes. In the second case this is implicit in the square brackets. |
And, when the vocabulary is at a stable location (e.g. at a DOI), use: Parameter = bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml) |
Discussion, suggestion from @tucotuco: use Parameter = bdq:sourceAuthority, move defaults into notes (non-normative, so that we can keep them in one place to work on them), and add an example table with a suite of default parameters, e.g. VALIDATION_TAXON_RANK_NOTSTANDARD, bdq:sourceAuthority default = http://rs.gbif.org/vocabulary/gbif/rank.xml. Test definitions remain simple, normative, encapsulated. Different implementors can easily use a set of default parameters as a separate document. |
@tucotuco suggests: TG2 Parameter Default Value Recommendation document, distinct from the tests. |
Suggested name of the separate document is "Test and Assertion Parameters" |
Or "Test Parameters" |
I think we need affiliation as in "TDWG DQ Test Parameters". I don't think it applies to the assertions as such? |
I'd suggest shorter: | Definition | Validation that a provided value of Taxon Rank in a single record conforms with a specified controlled vocabulary | |
I would also try to stick with “bdq:sourceAuthority” everywhere. In this case a) it is Parameterized and b) is a vocabulary |
Following comments in #163 and #112, I suggest: | Description | A test that checks if the value of dwc:taxonRank unambiguously conforms to the corresponding value provided from a specified bdq:sourceAuthority. | I agree with @tucotuco that for these tests we do not need to add "single record"; that can be covered in the introductory text, as all the validations and amendments cover only single records. If there are any that don't, then that is the time to mention "multiple records". |
Curious if you plan to provide any concrete examples? (in Example Mechanisms?). I'm thinking like:
OR
OR
|
@ArthurChapman @tucotuco I will reiterate, single record is an essential part of these descriptions. We cannot omit it. We cannot include it in introductory text. Remember that for every validation that is single record, we must also generate a (trivially generated, which is why we haven't talked about them for some years, but they are required) measure that is multi record that allows users doing quality assurance to assert what constitutes quality in a multi record. Users are also free to assert their own tests, and we must provide a sound model for them to inform users of the meaning of test results. The thing this description goes to has single record or multi record as one of its three non-optional components, we can't leave that out of the description. |
@chicoreus I respectfully disagree that this is a requirement, and for reasons expressed here, believe it makes the tests less broadly usable, for example when we move into worlds (e.g., RDF) where "record" is impossible to define a priori. I would rather let user assert single record tests and multi record tests from tests that are more fundamental than to be painted into a corner where all this hard work is not as applicable as it could be. |
One way of composing the tests is to run them in a pre-amendment phase, an amendment phase, and a post-amendment phase: all the measures and validations are run on the data as presented, then the amendments are run, then all the measures and validations are run again on the data with the proposed amendments applied. In this case, VALIDATION_TAXONRANK_NOTSTANDARD and AMENDMENT_TAXONRANK_STANDARDIZED are paired. If the data as presented for a single record contain a value that is non-standard but correctable to the controlled vocabulary, then the first run of VALIDATION_TAXONRANK_NOTSTANDARD will return a Response.result of NOT_COMPLIANT, AMENDMENT_TAXONRANK_STANDARDIZED would propose the correction to a value in the controlled vocabulary, and the post-amendment phase run of VALIDATION_TAXONRANK_NOTSTANDARD would return a Response.result of COMPLIANT. If the tests are being run by a data curator doing quality control, that data curator could change their data, or change how they are mapping their data to Darwin Core. The mapping case does need to go into explanatory text, as the tests will pick up mapping problems and assert the results as pertaining to the single records they are defined for. We didn't define a test that validates all the unique values of dwc:taxonRank in a multi record and, on seeing a small set of values all of which could be mapped onto the expected vocabulary but aren't in it, asserts either wholesale changes to the data or a change of the mappings of that data onto Darwin Core. Such a test is supported by the framework, but is not a form of test we saw as fitting into the core needs defined by TG3. For any test, EXTERNAL_PREREQUISITES_NOT_MET means "try again later: internet connectivity or a remote service was down, and if you run this test again when the external service is available, you will get a different result".
For any test, INTERNAL_PREREQUISITES_NOT_MET means that running the test again on the same data without changing it will result in the same inability to run the test. But for some validations (not VALIDATION_TAXONRANK_NOTSTANDARD, as filling in the component parts/atomic terms that are assembled into dwc:scientificName wasn't seen as a CORE need), there are amendments that may fill in values, so that after running the validation in a pre-amendment phase, running the amendments, and then running the validation again in a post-amendment phase, a data curator will be presented with a proposed amendment that they could accept as a change to their data. In a quality control setting, a consumer of data is likely to want to filter a multi record to a set for which all the validation response.status values are RUN_HAS_RESULT and all the validation response.result values are COMPLIANT. For CORE uses, this is all the validations in the CORE set; for other uses it might be a different set. Such a user might wish to include amendments, or not, and might just run a single validation phase, or an amendment phase followed by a post-amendment phase followed by filtering to data with quality for their needs. Much depends on the setting and how the tests are composed. The tests have been deliberately defined as independently as possible so that they can be composed in different ways for different settings. Yes, presentation to consumers of data quality reports is very important. |
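The phased composition and the filtering step described in the comment above can be illustrated with a minimal Python sketch. It is not the project's implementation. The vocabulary subset, the correction map, the `Response` class, the `AMENDED`/`NOT_AMENDED` status names, the handling of empty input, and all function names are assumptions made for illustration. Only the pre-amendment, amendment, post-amendment ordering and the filter on RUN_HAS_RESULT plus COMPLIANT come from the discussion. The validation is named after the issue's current title.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical subset of a rank vocabulary, for illustration only.
VOCABULARY = {"kingdom", "family", "genus", "species", "subspecies"}
# Hypothetical corrections that an amendment might propose.
CORRECTIONS = {"SPECIES": "species", "Genus": "genus"}


@dataclass
class Response:
    status: str
    result: Optional[object] = None


def validation_taxonrank_standard(record: dict) -> Response:
    value = record.get("dwc:taxonRank")
    if not value:
        # Assumed handling of empty input: rerunning on unchanged data cannot help.
        return Response("INTERNAL_PREREQUISITES_NOT_MET")
    verdict = "COMPLIANT" if value in VOCABULARY else "NOT_COMPLIANT"
    return Response("RUN_HAS_RESULT", verdict)


def amendment_taxonrank_standardized(record: dict) -> Response:
    proposed = CORRECTIONS.get(record.get("dwc:taxonRank") or "")
    if proposed is None:
        return Response("NOT_AMENDED")  # assumed status name
    return Response("AMENDED", {"dwc:taxonRank": proposed})  # assumed status name


def run_phases(record: dict):
    """Validate, propose an amendment, then validate the amended copy."""
    pre = validation_taxonrank_standard(record)
    amendment = amendment_taxonrank_standardized(record)
    amended = dict(record)  # the record as presented is never modified
    if amendment.status == "AMENDED":
        amended.update(amendment.result)
    post = validation_taxonrank_standard(amended)
    return pre, amendment, post, amended


def filter_compliant(records: list) -> list:
    """Keep amended records whose post-amendment validation ran and is COMPLIANT."""
    kept = []
    for record in records:
        _, _, post, amended = run_phases(record)
        if post.status == "RUN_HAS_RESULT" and post.result == "COMPLIANT":
            kept.append(amended)
    return kept
```

In this sketch a consumer who does not want amendments applied would filter on the first element returned by `run_phases` instead of the third. Because the tests are independent functions, they can be composed either way.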
Should this now be "VALIDATION_TAXONRANK_STANDARD" or "VALIDATION_TAXONRANK_ISSTANDARD"? I think the first is better. |
Agree to former |
Likewise. STANDARD. Somewhere in the chain of emails I'd sent out a set of suggested changes. |
Not seeing it, but
should cover most of them. |
more... AMBIGUOUS to UNAMBIGUOUS |
…TANDARD DESCRIPTION: Minimal implementation using hardcoded copy of current values. Needs lookup and cache implementation. Working minimal implementation including unit test.
Added to Notes: "This test will fail if there are leading or trailing white space or non-printing characters." |
…K_STANDARD tdwg/bdq#162 this now passes all of the validation test rows for that test, including taxonRank=especies, one of the alternative forms in the GBIF vocabulary.
Updated "Source Authority" and "References" in accord with @chicoreus comment on #163. @Tasilee to check |
Thanks @ArthurChapman. Checked. |
…tdwg/bdq specifications. Updated metadata (added ProvidesVersion and Specification) for tdwg/bdq#162 VALIDATION_TAXONRANK_STANDARD. Updated metadata, no changes to specification. removing reviewed stub method.
Post Zoom 11/7/2023, I have aligned the Source Authority with the suggested syntax, changing it from bdq:sourceAuthority default = "GBIF Vocabulary: Taxonomic Rank" [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts] to bdq:sourceAuthority default = "Darwin Core" {https://dwc.tdwg.org} {dwc:taxonRank [https://dwc.tdwg.org/list/#dwc_taxonRank]} {GBIF Vocabulary: Taxonomic Rank [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]} |
On Mon, 10 Jul 2023 18:20:29 -0700 Lee Belbin ***@***.***> wrote:
Post Zoom 11/7/2023, I have aligned the Source Authority with the
suggested syntax:
bdq:sourceAuthority default = "GBIF Vocabulary: Taxonomic Rank"
[https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]
This is probably a case where we do want to assert that the GBIF Vocabulary is the source authority, as it provides a controlled vocabulary, while Darwin Core does not.
|
From @chicoreus's comment (#162 (comment)), changed Source Authority from bdq:sourceAuthority default = "Darwin Core" {https://dwc.tdwg.org} {dwc:taxonRank [https://dwc.tdwg.org/list/#dwc_taxonRank]} {GBIF Vocabulary: Taxonomic Rank [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]} to bdq:sourceAuthority default = "GBIF Vocabulary: Taxonomic Rank" {[https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]} {dwc:taxonRank [https://dwc.tdwg.org/list/#dwc_taxonRank]} |
…VALIDATION_TAXONRANK_STANDARD allowing empty values as well as null to use default parameter value. Interpreting expected string value of bdq:sourceAuthority from specification.
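The behaviour described in the commit message above, where an empty parameter value falls back to the default just as a null does, might look roughly like the following. This is a sketch under stated assumptions, not the committed code: the function name and constant name are hypothetical, and the default string is the Source Authority label quoted earlier in this thread, which the implementation may interpret differently.

```python
from typing import Optional

# Label taken from the Source Authority quoted in this thread; treating it
# as the literal parameter value is an assumption of this sketch.
DEFAULT_SOURCE_AUTHORITY = "GBIF Vocabulary: Taxonomic Rank"


def resolve_source_authority(value: Optional[str]) -> str:
    """Return the caller's bdq:sourceAuthority, or the default if none was given."""
    if value is None or value == "":
        # Null and empty are treated the same way: use the default.
        return DEFAULT_SOURCE_AUTHORITY
    return value
```

Whether a whitespace-only value should also fall back to the default is not stated in the thread, so the sketch leaves such a value unchanged.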
Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted". Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated" |
Edited notes, removed duplicated "fail" text, replaced with more explicit: "This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters." |
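One way to satisfy the note above is to compare the value exactly against the vocabulary without trimming, so that any stray leading or trailing character makes the lookup fail. The Python sketch below illustrates this. The vocabulary subset and both function names are hypothetical. The helper `has_edge_noise` is not required for the test itself and is included only to show how such values could be detected for reporting.

```python
# Hypothetical subset of a rank vocabulary, for illustration only.
VOCABULARY = frozenset({"family", "genus", "species"})


def is_compliant(value: str) -> bool:
    # Exact comparison with no trimming: edge whitespace or non-printing
    # characters make the value differ from every vocabulary entry.
    return value in VOCABULARY


def has_edge_noise(value: str) -> bool:
    """True if the first or last character is whitespace or non-printing."""
    if not value:
        return False
    edges = (value[0], value[-1])
    return any(ch.isspace() or not ch.isprintable() for ch in edges)
```

With this approach `" species"`, `"species\t"` and a value prefixed with a zero-width space (`"\u200bspecies"`) are all rejected by `is_compliant`, while `"species"` is accepted.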
Changed Source Authority from bdq:sourceAuthority default = "GBIF Vocabulary: Taxonomic Rank" {[https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]} {dwc:taxonRank [https://dwc.tdwg.org/list/#dwc_taxonRank]} to bdq:sourceAuthority default = "GBIF TaxonRank Vocabulary" {[https://api.gbif.org/v1/vocabularies/TaxonRank]} {"dwc:taxonRank vocabulary API" [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]} |