TG2-VALIDATION_TAXONRANK_STANDARD #162

Open
ArthurChapman opened this issue Sep 3, 2018 · 35 comments
Labels
Conformance CORE TG2 CORE tests NAME Parameterized Test requires a parameter Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT TG2 Validation VOCABULARY

Comments

@ArthurChapman
Collaborator

ArthurChapman commented Sep 3, 2018

| TestField | Value |
|---|---|
| GUID | 7bdb13a4-8a51-4ee5-be7f-20693fdb183e |
| Label | VALIDATION_TAXONRANK_STANDARD |
| Description | Does the value of dwc:taxonRank occur in the bdq:sourceAuthority? |
| TestType | Validation |
| Darwin Core Class | dwc:Taxon |
| Information Elements ActedUpon | dwc:taxonRank |
| Information Elements Consulted | |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonRank is bdq:Empty; COMPLIANT if the value of dwc:taxonRank is in the bdq:sourceAuthority; otherwise NOT_COMPLIANT. |
| Data Quality Dimension | Conformance |
| Term-Actions | TAXONRANK_STANDARD |
| Parameter(s) | bdq:sourceAuthority |
| Source Authority | bdq:sourceAuthority default = "GBIF TaxonRank Vocabulary" {[https://api.gbif.org/v1/vocabularies/TaxonRank]} {"dwc:taxonRank vocabulary API" [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]} |
| Specification Last Updated | 2023-09-18 |
| Examples | [dwc:taxonRank="kingdom": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:taxonRank has an equivalent in the bdq:sourceAuthority"] [dwc:taxonRank="sp.": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:taxonRank does not have an equivalent in the bdq:sourceAuthority"] |
| Source | TDWG2018 |
| References | |
| Example Implementations (Mechanisms) | Kurator/FilteredPush sci_name_qc Library |
| Link to Specification Source Code | https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L2165 |
| Notes | This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters. |

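To make the Expected Response and the Notes concrete, here is a minimal sketch in Python of the decision logic as specified. It is not the sci_name_qc implementation linked above. Assumptions: the source authority is modelled as an in-memory set (`None` meaning "not available"), `SAMPLE_RANKS` is a hypothetical stand-in for the GBIF TaxonRank vocabulary, matching is exact and case-sensitive, and a `None` or whitespace-only value is treated as bdq:Empty.

```python
from enum import Enum
from typing import AbstractSet, NamedTuple, Optional


class Status(Enum):
    RUN_HAS_RESULT = "RUN_HAS_RESULT"
    INTERNAL_PREREQUISITES_NOT_MET = "INTERNAL_PREREQUISITES_NOT_MET"
    EXTERNAL_PREREQUISITES_NOT_MET = "EXTERNAL_PREREQUISITES_NOT_MET"


class Result(Enum):
    COMPLIANT = "COMPLIANT"
    NOT_COMPLIANT = "NOT_COMPLIANT"


class Response(NamedTuple):
    status: Status
    result: Optional[Result]
    comment: str


# Hypothetical stand-in for bdq:sourceAuthority. A real implementation would
# load the concepts (including alternative labels) from the GBIF TaxonRank
# vocabulary API instead of hard-coding a few values.
SAMPLE_RANKS = frozenset({"kingdom", "phylum", "class", "order",
                          "family", "genus", "species", "subspecies"})


def validation_taxonrank_standard(
    taxon_rank: Optional[str], vocabulary: Optional[AbstractSet[str]]
) -> Response:
    # None models an unavailable source authority (e.g. the API is down).
    if vocabulary is None:
        return Response(Status.EXTERNAL_PREREQUISITES_NOT_MET, None,
                        "bdq:sourceAuthority is not available")
    # Assumption: None or a whitespace-only string counts as bdq:Empty.
    if taxon_rank is None or taxon_rank.strip() == "":
        return Response(Status.INTERNAL_PREREQUISITES_NOT_MET, None,
                        "dwc:taxonRank is bdq:Empty")
    # Exact match with no trimming: any leading or trailing whitespace or
    # non-printing character prevents a match, so the value is NOT_COMPLIANT.
    if taxon_rank in vocabulary:
        return Response(Status.RUN_HAS_RESULT, Result.COMPLIANT,
                        "dwc:taxonRank has an equivalent in the bdq:sourceAuthority")
    return Response(Status.RUN_HAS_RESULT, Result.NOT_COMPLIANT,
                    "dwc:taxonRank does not have an equivalent in the bdq:sourceAuthority")
```

With this stand-in vocabulary the sketch reproduces both Examples: "kingdom" is COMPLIANT and "sp." is NOT_COMPLIANT.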
@Tasilee Tasilee changed the title TG2-VALIDATION_TAXONRANK_NOTSTANDARD TG2 - VALIDATION_TAXONRANK_NOTSTANDARD Sep 4, 2018
@Tasilee Tasilee mentioned this issue Sep 5, 2018
@chicoreus
Collaborator

Added guid.

@Tasilee Tasilee added Test Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT and removed Test labels Sep 25, 2018
@tucotuco tucotuco added the Parameterized Test requires a parameter label Nov 5, 2018
@Tasilee Tasilee changed the title TG2 - VALIDATION_TAXONRANK_NOTSTANDARD TG2-VALIDATION_TAXONRANK_NOTSTANDARD Jun 5, 2019
@chicoreus
Collaborator

Discussion in call: Parameters: bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml). When a default is not yet available at a stable location, this could be written either as:

Parameters = bdq:sourceAuthority

or

Parameters = bdq:sourceAuthority (default=[GBIF rank vocabulary])

In the first case, discussion of the vocabulary to use, which is not yet available at a stable IRI, would go into the notes. In the second case, this is implicit in the square brackets.

@chicoreus
Collaborator

And, when the vocabulary is at a stable location (e.g. at a DOI), use:

Parameter = bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml)

@chicoreus
Collaborator

Discussion, suggestion from @tucotuco: use Parameter = bdq:sourceAuthority, move defaults into notes (non-normative, so that we keep them in one place to work on them), and add an example table of a suite of default parameters:

VALIDATION_TAXONRANK_NOTSTANDARD, bdq:sourceAuthority default = http://rs.gbif.org/vocabulary/gbif/rank.xml.

Test definitions remain simple, normative, encapsulated.

Different implementers can easily use a set of default parameters provided as a separate document.
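A minimal sketch, in Python, of the separation proposed here: the normative test only names bdq:sourceAuthority, while defaults live in a separate lookup. The dictionary and the `resolve_source_authority` helper are hypothetical. The sketch uses the current default from the specification at the top of this issue rather than the rank.xml URL above, and treats both null and empty parameter values as "use the default", which matches what a later sci_name_qc commit in this thread describes.

```python
from typing import Dict, Optional

# Hypothetical "TDWG DQ Test Parameters" document, kept separate from the
# normative test definitions: test label -> default bdq:sourceAuthority.
DEFAULT_PARAMETERS: Dict[str, str] = {
    "VALIDATION_TAXONRANK_STANDARD": "https://api.gbif.org/v1/vocabularies/TaxonRank",
}


def resolve_source_authority(test_label: str, supplied: Optional[str]) -> str:
    """Return the caller's bdq:sourceAuthority, or the documented default.

    Both None and a blank string fall back to the default; a KeyError means
    no default has been recorded for that test.
    """
    if supplied is not None and supplied.strip():
        return supplied
    return DEFAULT_PARAMETERS[test_label]
```

Keeping the defaults in data rather than in each test definition lets an implementer swap in a different parameter document without touching the tests.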


@chicoreus
Collaborator

@tucotuco suggests: TG2 Parameter Default Value Recommendation document, distinct from the tests.

@ArthurChapman
Collaborator Author

ArthurChapman commented Mar 31, 2020

Suggested name of the separate document is "Test and Assertion Parameters"

@ArthurChapman
Collaborator Author

Or "Test Parameters"

@Tasilee
Collaborator

Tasilee commented Mar 31, 2020

I think we need affiliation as in "TDWG DQ Test Parameters". I don't think it applies to the assertions as such?

ArthurChapman added a commit that referenced this issue Oct 6, 2020
In accord with #189 added test data file for TAXONRANK_NOTSTANDARD #162
@chicoreus
Collaborator

I'd suggest something shorter:

| Definition | Validation that a provided value of Taxon Rank in a single record conforms with a specified controlled vocabulary |

@Tasilee
Collaborator

Tasilee commented Mar 21, 2022

I would also try to stick with “bdq:sourceAuthority” everywhere. In this case, a) it is parameterized and b) it is a vocabulary.

@ArthurChapman
Collaborator Author

Following comments in #163 and #112, I suggest:

| Description | A test that checks if the value of dwc:taxonRank unambiguously conforms to the corresponding value provided from a specified bdq:sourceAuthority. |

I agree with @tucotuco that for these tests we do not need to add "single record": that can be covered in the introductory text, as all the validations and amendments cover only single records. If there are any that don't, then that is the time to mention "multiple records".

@debpaul

debpaul commented Mar 21, 2022

Curious whether you plan to provide any concrete examples (in Example Mechanisms?). I'm thinking of something like:

  1. As a curator, I sent this taxonomic name with dwc:taxonRank == [given rank empty]
  2. I get a response of INTERNAL_PREREQUISITES_NOT_MET since dwc:taxonRank is EMPTY;
  3. Now what do I do?

OR

  1. In the dwc:taxonRank I used a "non-standard" value like {sp., spp., subsp., fam, etc}.
  2. The test returns
     • EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available;
     • otherwise NOT_COMPLIANT.
  3. Now what do I do?

OR

  1. I provide a valid value for dwc:taxonRank and get
  2. COMPLIANT if the value of dwc:taxonRank is in the bdq:sourceAuthority;
  3. in which case as a data provider I get a gold star?

@chicoreus
Collaborator

@ArthurChapman @tucotuco I will reiterate: single record is an essential part of these descriptions. We cannot omit it, and we cannot move it into introductory text instead. Remember that for every single-record validation, we must also generate a multi-record measure (trivially generated, which is why we haven't talked about them for some years, but they are required) that allows users doing quality assurance to assert what constitutes quality in a multi record. Users are also free to assert their own tests, and we must provide a sound model for them to inform users of the meaning of test results. The concept this description maps to has single record or multi record as one of its three non-optional components, so we can't leave that out of the description.
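To illustrate why such multi-record measures can be "trivially generated", here is a hypothetical Python sketch that derives one from the single-record Response.result values of a validation such as this one. The function name and the (count, all-compliant) return shape are assumptions for illustration only, not a measure defined in this thread.

```python
from typing import Iterable, Optional, Tuple


def multirecord_measure(results: Iterable[Optional[str]]) -> Tuple[int, bool]:
    """Summarise single-record Response.result values across a multi record.

    Returns (count of COMPLIANT records, whether every record is COMPLIANT).
    A record whose test did not run has result None and counts as not COMPLIANT.
    An empty multi record is vacuously (0, True).
    """
    values = list(results)
    compliant = sum(1 for value in values if value == "COMPLIANT")
    return compliant, compliant == len(values)
```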

@tucotuco
Member

@chicoreus I respectfully disagree that this is a requirement and, for the reasons expressed here, believe it makes the tests less broadly usable, for example when we move into worlds (e.g., RDF) where "record" is impossible to define a priori. I would rather let users assert single-record and multi-record tests from tests that are more fundamental than be painted into a corner where all this hard work is not as applicable as it could be.

@chicoreus
Collaborator

@debpaul

If the tests are run in pre-amendment, amendment, and post-amendment phases (which is one way of composing them), all the measures and validations are first run on the data as presented, then the amendments are run, and then all the measures and validations are run again on the data with the proposed amendments applied. In this case, VALIDATION_TAXONRANK_NOTSTANDARD and AMENDMENT_TAXONRANK_STANDARDIZED are paired. If the data as presented for a single record contain a non-standard value that can be corrected to the controlled vocabulary, then the first run of VALIDATION_TAXONRANK_NOTSTANDARD will return a Response.result of NOT_COMPLIANT, AMENDMENT_TAXONRANK_STANDARDIZED will propose the correction to a value in the controlled vocabulary, and the post-amendment run of VALIDATION_TAXONRANK_NOTSTANDARD will return a Response.result of COMPLIANT. If the tests are being run by a data curator doing quality control, that curator could then change their data, or change how they map their data to Darwin Core. (That point does need to go into explanatory text: the tests will pick up mapping problems and assert the results as pertaining to the single records they are defined for. We didn't define a test that looks at all the unique values of dwc:taxonRank in a multi record and, on seeing a small set of values that could all be mapped onto the expected vocabulary but aren't in it, asserts either wholesale changes to the data or a change in the mapping of that data onto Darwin Core. Such a test is supported by the framework, but it is not a form of test we saw as fitting the core needs defined by TG3.)
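The pre-amendment, amendment, and post-amendment composition described above can be sketched as follows in Python. Everything here is hypothetical: the tiny vocabulary, the correction table (in particular, this thread does not say that AMENDMENT_TAXONRANK_STANDARDIZED would map "sp." to "species"), and the simplified validation, which ignores the prerequisite statuses.

```python
from typing import Dict, Optional, Tuple

# Hypothetical controlled vocabulary and correction table. A real
# AMENDMENT_TAXONRANK_STANDARDIZED would derive its corrections from the
# bdq:sourceAuthority; these entries are illustrative only.
VOCABULARY = frozenset({"genus", "species", "subspecies"})
CORRECTIONS: Dict[str, str] = {"sp.": "species", "subsp.": "subspecies"}


def validate(rank: str) -> str:
    # Simplified validation: prerequisite statuses are ignored here.
    return "COMPLIANT" if rank in VOCABULARY else "NOT_COMPLIANT"


def amend(rank: str) -> Optional[str]:
    # Proposed standardized value, or None when no correction is known.
    return CORRECTIONS.get(rank)


def run_phases(rank: str) -> Tuple[str, Optional[str], str]:
    """Pre-amendment validation, amendment, post-amendment validation."""
    pre = validate(rank)
    proposal = amend(rank)
    # The post-amendment phase validates the data with the proposal applied.
    post = validate(proposal if proposal is not None else rank)
    return pre, proposal, post
```

A value with a known correction goes from NOT_COMPLIANT to COMPLIANT across the phases, while a value with no proposed amendment stays NOT_COMPLIANT and is left for the curator to resolve, for example by fixing the mapping.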

For any test, EXTERNAL_PREREQUISITES_NOT_MET means "try again later": internet connectivity or a remote service was down, and if you run this test again when the external service is available, you will get a different result.

For any test, INTERNAL_PREREQUISITES_NOT_MET means that running the test again on the same data without changing it will result in the same inability to run the test. But for some validations (not VALIDATION_TAXONRANK_NOTSTANDARD, as filling in the component parts/atomic terms that are assembled into dwc:scientificName wasn't seen as a CORE need), there are amendments that may fill in values. By running the validation in a pre-amendment phase, then running the amendments, and then running the validation again in a post-amendment phase, a data curator will be presented with a proposed amendment that they could accept as a change to their data.
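The practical difference between the two prerequisite statuses described in the two paragraphs above can be captured in a small helper for a caller deciding whether to schedule a retry. This is a hypothetical Python sketch; the function name is an assumption and not part of any bdq or sci_name_qc API.

```python
def rerun_may_change_result(status: str) -> bool:
    """Could re-running a test on the same, unchanged data give a new result?"""
    if status == "EXTERNAL_PREREQUISITES_NOT_MET":
        # A remote service or connectivity problem; try again later.
        return True
    if status == "INTERNAL_PREREQUISITES_NOT_MET":
        # The data themselves block the test; only changing the data
        # (for example by accepting a proposed amendment) can help.
        return False
    # RUN_HAS_RESULT: a result was already produced from the data as presented.
    return False
```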

In a quality assurance setting, a consumer of data is likely to want to filter a multi record to a set for which all the validation Response.status values are RUN_HAS_RESULT and all the validation Response.result values are COMPLIANT. For CORE uses, this is all the validations in the CORE set; for other uses, it might be a different set. Such a user might wish to include amendments or not, and might run just a single validation phase, or an amendment phase followed by a post-amendment phase, followed by filtering to data with quality suited to their needs.
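The filtering step described above might look like this, assuming Python and a hypothetical record representation in which each record carries the (Response.status, Response.result) pair for every validation run on it. The `required` list stands in for "all the validations in the CORE set" or whatever other set a user chooses.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record shape: validation label -> (Response.status, Response.result).
Record = Dict[str, Tuple[str, Optional[str]]]


def filter_fit_for_use(records: Sequence[Record], required: Sequence[str]) -> List[Record]:
    """Keep only records where every required validation ran and was COMPLIANT."""
    def passes(record: Record) -> bool:
        # A missing validation counts as failing, like any other outcome.
        return all(record.get(label) == ("RUN_HAS_RESULT", "COMPLIANT")
                   for label in required)
    return [record for record in records if passes(record)]
```

Records whose validation could not run (either prerequisite status) are excluded, which is the conservative choice for a consumer who needs every required test to have a COMPLIANT result.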

Much depends on the setting and how the tests are composed. The tests have been deliberately defined as independently as possible so that they can be composed in different ways for different settings.

Yes, presentation to consumers of data quality reports is very important.

@Tasilee
Collaborator

Tasilee commented Mar 22, 2022

Should this now be "VALIDATION_TAXONRANK_STANDARD" or "VALIDATION_TAXONRANK_ISSTANDARD"?

I think the first is better.

@ArthurChapman
Collaborator Author

Agree with the former.

@chicoreus
Collaborator

Likewise. STANDARD.

Somewhere in the chain of emails I'd sent out a set of suggested changes.

@chicoreus
Collaborator

Not seeing it, but

  • NOTSTANDARD to STANDARD
  • NOTFOUND to FOUND
  • EMPTY to NOTEMPTY
  • OUTOFRANGE to INRANGE
  • INCONSISTENT to CONSISTENT

should cover most of them.

@Tasilee Tasilee changed the title TG2-VALIDATION_TAXONRANK_NOTSTANDARD TG2-VALIDATION_TAXONRANK_STANDARD Mar 22, 2022
@Tasilee
Collaborator

Tasilee commented Mar 22, 2022

more...

AMBIGUOUS to UNAMBIGUOUS
GREATERTHAN to LESSTHAN
ZERO to NOTZERO
TERRESTRIALMARINE same
INCOMPLETE to COMPLETE

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Aug 18, 2022
…TANDARD DESCRIPTION: Minimal implementation using hardcoded copy of current values. Needs lookup and cache implementation. Working minimal implementation including unit test.
@Tasilee
Collaborator

Tasilee commented Sep 12, 2022

Added to Notes: "This test will fail if there are leading or trailing white space or non-printing characters."

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Sep 16, 2022
…K_STANDARD tdwg/bdq#162 this now passes all of the validation test rows for that test, including taxonRank=especies, one of the alternative forms in the GBIF vocabulary.
@ArthurChapman
Collaborator Author

Updated "Source Authority" and "References" in accord with @chicoreus comment on #163. @Tasilee to check

@Tasilee
Collaborator

Tasilee commented Feb 27, 2023

Thanks @ArthurChapman. Checked.

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jun 26, 2023
…tdwg/bdq specifications. Updated metadata (added ProvidesVersion and Specification) for tdwg/bdq#162 VALIDATION_TAXONRANK_STANDARD.  Updated metadata, no changes to specification. removing reviewed stub method.
@Tasilee
Collaborator

Tasilee commented Jul 11, 2023

Post Zoom 11/7/2023, I have aligned the Source Authority with the suggested syntax:

bdq:sourceAuthority default = "GBIF Vocabulary: Taxonomic Rank" [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]

to

bdq:sourceAuthority default = "Darwin Core" {https://dwc.tdwg.org} {dwc:taxonRank [https://dwc.tdwg.org/list/#dwc_taxonRank]} {GBIF Vocabulary: Taxonomic Rank [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]}

@chicoreus
Collaborator

chicoreus commented Jul 11, 2023 via email

@Tasilee
Collaborator

Tasilee commented Jul 11, 2023

From @chicoreus's comment (#162 (comment)), changed Source Authority from

bdq:sourceAuthority default = "Darwin Core" {https://dwc.tdwg.org} {dwc:taxonRank [https://dwc.tdwg.org/list/#dwc_taxonRank]} {GBIF Vocabulary: Taxonomic Rank [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]}

to

bdq:sourceAuthority default = "GBIF Vocabulary: Taxonomic Rank" {[https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]} {dwc:taxonRank [https://dwc.tdwg.org/list/#dwc_taxonRank]}

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 16, 2023
…VALIDATION_TAXONRANK_STANDARD allowing empty values as well as null to use default parameter value. Interpreting expected string value of bdq:sourceAuthority from specification.
@Tasilee
Collaborator

Tasilee commented Sep 18, 2023

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

@chicoreus chicoreus added the CORE TG2 CORE tests label Sep 18, 2023
@chicoreus
Collaborator

Edited notes, removed duplicated "fail" text, replaced with more explicit: "This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters. "

@Tasilee
Collaborator

Tasilee commented Apr 16, 2024

Changed Source Authority from

bdq:sourceAuthority default = "GBIF Vocabulary: Taxonomic Rank" {[https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]} {dwc:taxonRank [https://dwc.tdwg.org/list/#dwc_taxonRank]}

to

bdq:sourceAuthority default = "GBIF TaxonRank Vocabulary" {[https://api.gbif.org/v1/vocabularies/TaxonRank]} {"dwc:taxonRank vocabulary API" [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]}
