TG2-VALIDATION_TAXONID_COMPLETE #121
Comments
Agreed at TDWG 2018 DQIG meeting that any mention of uniqueness is redundant with the resolvability requirement, hence references to uniqueness were dropped. |
We currently have in the notes: "Note that the cause of failure may be due to a service failure. Implementations of this test should account for this type of failure and not necessarily report a failure." Should this then be covered by adding an EXTERNAL_PREREQUISITES_NOT_MET? |
This would apply to any external lookup. One presumes any system failure would generate a specific response like "FAILED_LOOKUP"? |
@ArthurChapman yes, EXTERNAL_PREREQUISITES_NOT_MET would cover reporting some sort of transient system failure where asking the same question later might get an answer. @Tasilee Failed_Lookup has ambiguity to it - it carries the potential implication that a lookup was run (and failed), and that something was looked up. EXTERNAL_PREREQUISITES_NOT_MET covers the more general case of some external resource (lookup, calculation, or otherwise) was not available, try again later. |
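As a rough illustration of the distinction drawn above, the sketch below reports EXTERNAL_PREREQUISITES_NOT_MET when an external lookup is unavailable, rather than treating the outage as a failure of the data. The `resolver` callable and the function name are hypothetical, introduced only for this example; a real implementation would decide which exceptions count as a transient outage.

```python
from typing import Callable

# Response labels taken from the discussion in this thread.
EXTERNAL_PREREQUISITES_NOT_MET = "EXTERNAL_PREREQUISITES_NOT_MET"
COMPLIANT = "COMPLIANT"
NOT_COMPLIANT = "NOT_COMPLIANT"


def run_external_lookup_test(taxon_id: str, resolver: Callable[[str], bool]) -> str:
    """Run a lookup-based validation without blaming the data for an outage.

    `resolver` is a hypothetical callable: it returns True if the identifier
    was found, False if it was not, and raises ConnectionError or
    TimeoutError when the external service cannot be reached.
    """
    try:
        found = resolver(taxon_id)
    except (ConnectionError, TimeoutError):
        # Transient problem: asking the same question later might get an answer.
        return EXTERNAL_PREREQUISITES_NOT_MET
    return COMPLIANT if found else NOT_COMPLIANT
```

The point of the design is that NOT_COMPLIANT is only ever returned after the external resource has actually answered.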
@Tasilee and I have a problem with this one. How do we resolve the taxonID? The examples given in Darwin Core include a GUID and just a number ("32567"), which is similar to our example of a failure. How is it possible for us to validate, unless it references an authority, which according to Darwin Core is not the case? I don't see how this can work. @tucotuco, @chicoreus is this possible to do? Is it a valuable test? |
The best practices for identifiers says they should be globally unique for the instance of the Class they represent, persistent, and resolvable. That is an applicability statement apart from Darwin Core. In Darwin Core, or in a Darwin Core Archive, there are no such restrictions. This shouldn't be too disturbing, as Darwin Core does not implement restrictions in and of itself, it merely provides definitions and other guiding information. So, the problem, if it were one, would not be unique to the dwc:taxonID term. What does seem to be a problem is that, if the taxonID does not contain the information to resolve it (the authority), that is an internal prerequisite that isn't met - there is a problem with the data rather than a problem with a service. That is not captured in the Expected Response. |
Are we saying that the Expected response should be "EXTERNAL_PREREQUISITES_NOT_MET if resolving service was unavailable; INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is either not present or is EMPTY or is not resolvable; COMPLIANT if the value of the field dwc:taxonID is resolvable; otherwise NOT_COMPLIANT" given @chicoreus comment on EXTERNAL and @tucotuco on INTERNAL? |
I wouldn't think so - as if it is non-resolvable it is NOT_COMPLIANT. What John is saying is that it requires somewhere in the record a reference to what the resolving authority is. I think we are saying |
I agree with @ArthurChapman that the response should be NOT_COMPLIANT if the taxonId is not resolvable, but I would not expect the authority information to be anywhere else in the record than in taxonId. It would be resolvable if it was possible to directly (full URI) or indirectly (unambiguous namespace from which full URI could be constructed) resolve the taxonId. |
OK, so is the Expected Response now ok? |
I would be specific: INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is either not present or is EMPTY or does not include the resolving authority. |
Thanks @ArthurChapman and @tucotuco - done. |
@tucotuco how about a taxonID in the form urn:uuid:e34fda24-f53e-4627-b591-b6c6ca349293, which should be an unambiguous unique taxonID with a known urn scheme, just not resolvable? Or e34fda24-f53e-4627-b591-b6c6ca349293? I'd tend to think that this test is for uniqueness, not necessarily resolvability. Would the requirement be any urn:uuid, urn:catalog, lsid:, http:, https: identifier? |
@chicoreus That may be a GUID. It is in the form of a GUID. But no one can resolve it to know for sure. If it resolves in addition, you can be sure it is a GUID. But these are just my perspective. Darwin Core doesn't require anything in particular, so it comes down to what we want the test to do everywhere. |
Not sure of the wording here. "... INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY or does not include the resolving authority ..." The example has just "dwc:taxonID=54367", i.e. it is just a number and does NOT include a resolving authority, so as written it would be (at least to me) INTERNAL_PREREQUISITES_NOT_MET. Also, none of the examples in the test dataset include "the resolving authority". With all these we need to either 1) delete the words "or does not include the resolving authority" or 2) modify all our examples. |
@ArthurChapman I'd agree. I'd concur with deleting the phrase "or does not include the resolving authority" from the specification. But there is likely more work required. urn:lsid:marinespecies.org:taxname:406150 is a likely, unique, valid, non-ambiguous value for taxonID. Given the specification of "GBIF backbone taxonomy service", there isn't actually a way of querying that service for a taxonID, e.g. https://api.gbif.org/v1/species/search?taxonID=urn:lsid:marinespecies.org:taxname:406150&datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c ignores the invalid term taxonID= and just returns everything in the backbone taxonomy. Taxon records in the backbone taxonomy do include taxonID, so an implementation which works off of a download from GBIF would work, but I'd be hard pressed to implement this as defined, unless we assert that the only non-ambiguous taxonID values are identifiers of records in GBIF's backbone taxonomy. In that case https://api.gbif.org/v1/species/2435099 and 2435099 would both be compliant, but the quite unambiguous urn:lsid:marinespecies.org:taxname:406150, as we can't find it through the GBIF service, would be ambiguous. Note that https://api.gbif.org/v1/species/54367 currently does not return any results, suggesting that either GBIF deleted the record, or 54367 is ambiguous as we don't know which dataset it belongs to. |
Discussion by the TG2 team on 7th March 2022 suggested that this test was too complex to implement with due utility. Consequently, it was suggested that we rename it as an 'INCOMPLETE' type test of dwc:taxonID, with compliance only if both a URI and suffix (? a better term?) were present. |
@tucotuco's "namespace indicator" to replace "suffix" seems good to me. |
Are we all happy with this test as it stands now? |
On trying to implement this, I am finding the specification wanting. Currently: "Description: Does the value of dwc:taxonID contain both a URI and namespace indicator?" Propose the following specification: Here the semantics of LSID are valuable, for to be validly formed, an LSID must specify the authority, namespace, and objectID, which is really what we want to know in this test: can we tell what the taxonID reference is and what it is referring to? For http/https URIs, the path can contain the equivalent of the LSID namespace and the LSID objectID, as in https://www.gbif.org/species/2529789, whereas https://www.gbif.org/species/ is a validly formed URI that needs special case handling to tell that it doesn't actually contain a reference to a particular taxon. The specification could include additional common special cases (e.g. URIs with a path containing aphia.php and query containing id=), or not. |
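To make the LSID point concrete, here is a minimal sketch of splitting an LSID into authority, namespace, and objectID (plus the optional revision). The class and function names are illustrative only and are not taken from the implementation referenced later in this thread; the pattern is a simplification that treats each component as a run of characters without colons or whitespace.

```python
import re
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


# urn:lsid:<authority>:<namespace>:<objectID>[:<revision>]
# The "urn:lsid:" prefix is matched case-insensitively.
_LSID_PATTERN = re.compile(
    r"urn:lsid:([^:\s]+):([^:\s]+):([^:\s]+)(?::([^:\s]+))?",
    re.IGNORECASE,
)


def parse_lsid(value: str) -> Optional[Lsid]:
    """Return the LSID components, or None if any required part is missing."""
    match = _LSID_PATTERN.fullmatch(value.strip())
    if match is None:
        return None
    return Lsid(*match.groups())
```

With this sketch, `parse_lsid("urn:lsid:marinespecies.org:taxname:406150")` yields authority `marinespecies.org`, namespace `taxname`, and objectID `406150`, while a value lacking the objectID returns `None`.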
…cifications. DESCRIPTION: Updating tdwg/bdq#120 VALIDATION_TAXONID_NOTEMPTY to current specification. Adding an implementation of tdwg/bdq#121 VALIDATION_TAXONID_COMPLETE with notes about needing to update the specification. Adding supporting RFC8141URN and LSID classes to help in identifying and parsing URNs and LSIDs to support tdwg/bdq#121. Initial work in progress on implementation of AMENDMENT_SCIENTIFICNAME_FROM_TAXONID.
…ntation of tdwg/bdq#101 VALIDATION_POLYNOMIAL_CONSISTENT, also adding test cases from current validation data csv file that were failing, along with commented out case that may be in error in the validation data. Conforming implementation of tdwg/bdq#121 VALIDATION_TAXONID_COMPLETE to current specification for handling empty taxonID, also adding test cases from current validation data csv file that were failing. Fixing methods that should be static but aren't.
An informative comment from @timrobertson100 19th September 2022: "When it comes to occurrence record processing, the GBIF occurrence systems currently pass this value on, only making use on the literal values (e.g. scientificName) so it’s not something we’d have a very strong an opinion on, in e.g. a spreadsheet. My gut feeling is a “scope:value” format (e.g. gbif:1234) is better than a URL, for the reason that URLs are generally less stable over time. As an example. “species” in that URL is already questionable and a future GBIF API would be better using e.g. “../taxon/..” and concept based identification of organisms". |
How about (taking in Tim and Markus's comments on scope:value): COMPLIANT if (1) taxonID is a validly formed LSID, or (2) taxonID is a validly formed URN with at least NID and NSS present, or (3) taxonID is in the form scope:value, or (4) taxonID is a validly formed URI with host and path where path consists of more than just "/"; otherwise NOT_COMPLIANT |
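The four cases proposed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the reference implementation: the regular expressions are simplified, the alphanumeric-only reading of scope:value is one possible interpretation, and returning INTERNAL_PREREQUISITES_NOT_MET for an empty value is carried over from the earlier comments rather than from this proposal.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# (1) LSID: urn:lsid:authority:namespace:objectID[:revision] (simplified)
LSID = re.compile(r"urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(?::[^:\s]+)?", re.IGNORECASE)
# (2) URN with NID (2 to 32 characters) and a non-empty NSS (simplified)
URN = re.compile(r"urn:[a-z0-9][a-z0-9-]{0,30}[a-z0-9]:\S+", re.IGNORECASE)
# (3) scope:value, assumed here to be alphanumeric on both sides
SCOPE_VALUE = re.compile(r"[A-Za-z0-9]+:[A-Za-z0-9]+")


def _is_uri_with_host_and_path(value: str) -> bool:
    """(4) URI with a host and a path that is more than just "/"."""
    try:
        parts = urlsplit(value)
    except ValueError:
        return False
    return bool(parts.scheme and parts.hostname and parts.path not in ("", "/"))


def validation_taxonid_complete(taxon_id: Optional[str]) -> str:
    """Classify a dwc:taxonID value according to the four proposed cases."""
    if taxon_id is None or not taxon_id.strip():
        # Assumption: empty values are a prerequisite problem, not a result.
        return "INTERNAL_PREREQUISITES_NOT_MET"
    value = taxon_id.strip()
    if (
        LSID.fullmatch(value)
        or URN.fullmatch(value)
        or SCOPE_VALUE.fullmatch(value)
        or _is_uri_with_host_and_path(value)
    ):
        return "COMPLIANT"
    return "NOT_COMPLIANT"
```

Under this reading, a bare number such as 54367 is NOT_COMPLIANT, while urn:uuid:e34fda24-f53e-4627-b591-b6c6ca349293 passes as a URN. Case (4) as worded also accepts https://www.gbif.org/species/, so the special-case handling discussed earlier would need to be added separately.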
@chicoreus - see comment and question under #71 |
@ArthurChapman and I have re-read the Expected Response and we realize that we will need to better handle the terms LSID, URN, NID and NSS and possibly URI. Do we expand it in the test? |
I wouldn't add them to the Vocabulary as they are standard terms and this is the only test that uses them. I would add a reference(s) in the References and perhaps add a note if that can define them simply. |
OK, I've added a few Wikipedia references that seem easy to understand. In doing this, I note that we use a different format for References (dot points) compared with Information Elements and Examples (new table lines). I've used the latter here for illustration compared with, for example #102. Does it matter? I think consistency is warranted. |
Personally, I prefer the dot points |
I have added to the Notes to be consistent with #71: "When referencing a GBIF taxon by GBIF's identifier for that taxon, use the pseudo-namespace "gbif:" and the form "gbif:{integer}" as the value for dwc:taxonID." |
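For illustration, the "gbif:{integer}" convention in that note could be produced and recognised with two small helpers. Both function names are hypothetical and exist only for this sketch.

```python
import re
from typing import Optional

# "gbif:" pseudo-namespace followed by an integer, per the note above.
_GBIF_TAXON_ID = re.compile(r"gbif:([0-9]+)")


def to_gbif_taxon_id(gbif_key: int) -> str:
    """Format a GBIF taxon key as a dwc:taxonID value."""
    if gbif_key < 0:
        raise ValueError("GBIF taxon keys are expected to be non-negative")
    return f"gbif:{gbif_key}"


def from_gbif_taxon_id(taxon_id: str) -> Optional[int]:
    """Return the integer key if the value uses the gbif: form, else None."""
    match = _GBIF_TAXON_ID.fullmatch(taxon_id.strip())
    return int(match.group(1)) if match else None
```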
Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted". Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated" |
We have made this test SUPPLEMENTARY and Closed the issue - which means that while we do not believe this test should be CORE, there may be circumstances where some communities may want it as CORE. This change derived from a discussion at TDWG 2023 on the use of dwc:taxonID and dwc:scientificNameID. It was concluded that a test VALIDATION_SCIENTIFICNAMEID was justified as CORE while this test should be SUPPLEMENTARY |
…date with current specification, the scope:value case is vague, adding support for alphanumeric string scope:value pairs.