TG2 - Structuring Test Descriptions #197

Closed
ArthurChapman opened this issue Mar 23, 2022 · 27 comments

@ArthurChapman
Collaborator

We have been discussing Test Descriptions under a number of other Issues (#112, #163 and #162) and I thought it best to create an Issue to consolidate the discussion:

In summary, it appears that we have agreed on a few points for Test Descriptions:

  1. That we use a standard format for all descriptions with 4 (or 3) elements viz.: Validation, information element, resource type, and criterion - but see discussion on "single record" below
  2. That we use the Information Element rather than a plain English equivalent (i.e. dwc.taxonRank rather than "taxon rank")
  3. That we say "A validation test ..." for Validations and "A test that proposes to amend ..." for Amendments
  4. That we use bdq:sourceAuthority rather than "source authority"

There are three issues that we still need to resolve:

  1. Use of "single record" in the description. @chicoreus says it is required for the Framework. @Tasilee, @tucotuco and @ArthurChapman are not convinced that it is needed within the description. As all 99 tests are "single record" tests, it is suggested that it is redundant in the description, and if it is needed in the RDF for the Standard, it could be added at the RDF creation stage.
  2. It has been suggested by @chicoreus that we need a second field called "Brief" where an even briefer description is included. @Tasilee, @tucotuco and @ArthurChapman are not yet convinced of the need for two Description fields but are prepared to go along with it if there is no alternative.
  3. Not fully discussed, but it has been suggested for Parameterized tests that the wording include "specified" in front of bdq:sourceAuthority (see example below).

Examples that cover these issues (contentious issues in bold):

| Field | Value |
|---|---|
| Description | A test that proposes to amend the value of dwc:taxonRank in a **single record** to unambiguously conform to the corresponding value provided from a **specified** bdq:sourceAuthority. |
| **Brief** | Amendment proposed for dwc:taxonRank to standard value |

@ArthurChapman
Collaborator Author

I previously asked "What are we trying to do with the description?", to which @chicoreus responded:

" I see that we need (1) an rdfs:label that can be applied to a ContextualizedCriterion, and (2) this label can be presented to human consumers of descriptions the validation in a description of the data quality needs met by the core tests, or as metadata about the validation in a data quality report from a mechanism that ran the core tests to fit a data quality need.

The Specification (the "Expected Response" in the markdown tables in the github issues), is the description of the validation for implementors, the Description is the parallel metadata about the ContextualizedCriterion+Specification+Validation for end users."

@Tasilee
Collaborator

Tasilee commented Mar 23, 2022

My preferences have been captured on (1) and (2) above. In regard to “Parameterized”, I’d prefer to keep it to statements like “….was amended from values in bdq:sourceAuthority..”. The use of “specified” is redundant and will be confusing.

We have bdq:sourceAuthority covered under the Parameter(s), and Notes for defaults - which I am now thinking would make more sense on the Parameter(s) line :|. We should use the Notes only where clarification is required.

@tucotuco
Member

I agree with @Tasilee on all counts.

@ArthurChapman
Collaborator Author

Following examples set by @Tasilee for the OTHER tests, and my doing the NAME tests, we have struck upon a formula, viz.:

  • STANDARD Does the value of xxx occur in bdq:sourceAuthority?
  • STANDARDIZED Standardize the value of xxx using bdq:sourceAuthority
  • NOTEMPTY Is there a value in xxx?
  • FOUND Does the value of xxx occur at rank of xxx in bdq:sourceAuthority?
  • UNAMBIGUOUS e.g. Can the combination of higher classification taxonomic terms be unambiguously resolved using bdq:sourceAuthority? (TG2-VALIDATION_CLASSIFICATION_CONSISTENT #123)
  • COMPLETE e.g. Does the value of dwc:taxonID contain both a URI and namespace indicator? (TG2-VALIDATION_TAXONID_COMPLETE #121)
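The formula in the list above is regular enough to be written as fill-in templates. The following is a minimal Python sketch, assuming `{term}` stands for the information element (the `xxx` in the list) and a hypothetical `rank` argument fills the second `xxx` in FOUND; `DESCRIPTION_TEMPLATES` and `describe` are illustrative names, not part of any project tooling. UNAMBIGUOUS and COMPLETE are only given as worked examples, so they are left out.

```python
# Hypothetical templates for the description formula; "{term}" replaces the
# information element (the "xxx" in the list), "{rank}" the second "xxx" in FOUND.
DESCRIPTION_TEMPLATES = {
    "STANDARD": "Does the value of {term} occur in bdq:sourceAuthority?",
    "STANDARDIZED": "Standardize the value of {term} using bdq:sourceAuthority",
    "NOTEMPTY": "Is there a value in {term}?",
    "FOUND": "Does the value of {term} occur at rank of {rank} in bdq:sourceAuthority?",
}


def describe(test_type: str, term: str, rank: str = "") -> str:
    """Return the description for a test type; unknown types raise KeyError."""
    # str.format ignores keyword arguments a template does not use,
    # so passing rank to the non-FOUND templates is harmless.
    return DESCRIPTION_TEMPLATES[test_type].format(term=term, rank=rank)


if __name__ == "__main__":
    print(describe("NOTEMPTY", "dwc:taxonRank"))
    print(describe("FOUND", "dwc:genus", rank="genus"))
```

Keeping the wording in one table rather than in each test makes it easier to apply a later change (such as adding a closing period or question mark) to every description at once.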

@Tasilee
Collaborator

Tasilee commented Mar 24, 2022

Thanks @ArthurChapman. It seems the two of us have taken pity on @tucotuco's busy life, so I have just done the Descriptions for SPACE with NOTEMPTY. Easy.

I'll do SPACE and STANDARD and call it a day...

I was having so much fun....I've also done SPACE and STANDARDIZED

@ArthurChapman
Collaborator Author

I have added descriptions to all the SPACE tests (except those done by @Tasilee). Some were quite difficult and someone should check them all.

As you can see from the many issues I raised, a number of Expected Responses may need to change.

@ArthurChapman
Collaborator Author

I have just done a full run-through of all the test descriptions and made some minor edits, spelling corrections, etc. for consistency.

While doing that, I added descriptions for all the TIME tests, and added either a period or a ? at the end of each description.

@Tasilee
Collaborator

Tasilee commented Mar 27, 2022

Great job @ArthurChapman! I’ll do a check through tomorrow.

@Tasilee
Collaborator

Tasilee commented Mar 27, 2022

This is starting to look like an ok template for one class of AMENDMENTS (#115):

"...AMENDED the value of dwc:occurrenceStatus if the value can be matched to a standard value in bdq:sourceAuthority..."

Happy?

@ArthurChapman
Collaborator Author

Are you talking about Descriptions or Expected Responses?

For the Description I have used "Proposed to amend" as suggested by @chicoreus though AMENDED fits with the Expected Response.

I like it.

Could shorten to "...AMENDED the value of dwc:occurrenceStatus if it can be matched to a standard value in bdq:sourceAuthority..."

Don't need the second "the value" as you already have "value" as the subject of the sentence.

@Tasilee
Collaborator

Tasilee commented Mar 27, 2022

Yes, agreed. This is what I have also concluded in working through the ERs for the AMENDMENTs.

@Tasilee
Collaborator

Tasilee commented Mar 27, 2022

I have just finished going through all the Expected Responses of the AMENDMENTs, conforming them to the template:

AMENDED the value of xxx if ....;

I think this has made all of the ERs clearer, but we all need to check the logic and the phrasing. There have been so many changes in the last week that I'll need to do a new dump of the Test specs.

@ArthurChapman
Collaborator Author

Just a thought - @Tasilee has

"...AMENDED the value of dwc:occurrenceStatus if it can be matched to a standard value in bdq:sourceAuthority..."

Suggest

"...AMENDED the value of dwc:occurrenceStatus if it conforms to a standard value in bdq:sourceAuthority..."

@ArthurChapman
Collaborator Author

Following discussion with @Tasilee, we have settled on the formula (I have fixed some tense issues):

"...AMENDED the value of dwc:occurrenceStatus if it could be matched to a value in bdq:sourceAuthority..."

@ArthurChapman
Collaborator Author

ArthurChapman commented Mar 27, 2022

Changed my mind again

"...AMENDED the value of dwc:occurrenceStatus if it was matched to a value in bdq:sourceAuthority..."

@ArthurChapman
Collaborator Author

I have been through all the AMENDMENT Expected Responses and standardized them to the formula:

"...AMENDED the value of dwc:occurrenceStatus if it matched a value in bdq:sourceAuthority..."

@chicoreus
Collaborator

chicoreus commented Mar 28, 2022 via email

@Tasilee
Collaborator

Tasilee commented Mar 28, 2022

I raised this point with @ArthurChapman this morning. My feeling was that we needed a way of phrasing either the ‘interpretation’ of the supplied value by code, or by the reference (bdq:sourceAuthority, ISO, parameters, vocabulary…) with ‘synonymy support’ (e.g., sp. -> species), or both. In many cases, we have not been explicit about this.

@ArthurChapman
Collaborator Author

@chicoreus has a good point when he says (in an email):

"If it matched, it doesn't need amending. The language here probably needs to parallel numeric values, "if it could be unambiguously interpreted as a value in", or "if it could be unambiguously conformed to a value in"."

I suggest we go with the wording recommended there for the AMENDMENTS: "if it could be unambiguously interpreted as a value in".

@Tasilee
Collaborator

Tasilee commented Mar 29, 2022

I've started to look at the Descriptions for the AMENDMENTs. There are a few different structures (surprise!)

  1. Propose/d amendment to the value of ...
  2. Standardize the value of ...

I am presuming that (1) using "Propose amendment to the value of ..." would be preferable?

@ArthurChapman
Collaborator Author

Interesting. Propose seems good, but then we say AMENDED in the Expected Responses.

@chicoreus
Collaborator

And in the definition for AMENDED, we should say that the change to the data is proposed, and a data curator may choose to apply that proposal to the database of record, or not. A clear finding from both the Kurator and FilteredPush projects is that it is very important to use language for data curators that doesn't imply automated changes to their data outside their control. Thus "proposed" is a good word to use in the Description, which is targeted at consumers of data quality reports, unlike the Expected Response (Specification), which is targeted at implementors.

@Tasilee
Collaborator

Tasilee commented Mar 30, 2022

We certainly need to figure out how far we align the Descriptions (a generic explanation) and Expected Responses (implementation detail).

(I just noted @chicoreus's comments, and I agree that the definition of AMENDED should reflect Proposed.)

I think it is fair to use "Propose" in the Descriptions and "Amended" in the ERs.

@Tasilee
Collaborator

Tasilee commented Mar 30, 2022

Here is what I am proposing, using #63 as an example:

Description: "Propose amendment to the value of dwc:basisOfRecord using bdq:sourceAuthority."

ER: "AMENDED the value of dwc:basisOfRecord if it could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED"
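To make "unambiguously interpreted" concrete, the Description/ER pair above can be sketched in Python. This is a minimal sketch under stated assumptions: it is applied to dwc:taxonRank rather than dwc:basisOfRecord so that the ‘synonymy support’ example (sp. -> species) fits; the in-memory `VOCABULARY` and `SYNONYMS` tables are hypothetical stand-ins for bdq:sourceAuthority; and the `status`/`proposal` result keys are hypothetical. In line with the "proposed" wording, it returns a proposal and leaves the input record untouched, and it returns NOT_AMENDED when the value already matches, matches nothing, or could be read as more than one term.

```python
# Hypothetical stand-in for bdq:sourceAuthority: a controlled vocabulary
# for dwc:taxonRank plus a synonym table. These entries are illustrative only.
VOCABULARY = {"genus", "species", "subspecies", "variety"}
SYNONYMS = {
    "sp.": {"species"},
    "ssp.": {"subspecies"},
    "subsp.": {"subspecies"},
    "var.": {"variety"},
    "s.": {"species", "subspecies"},  # deliberately ambiguous (hypothetical)
}


def candidates(value: str) -> set:
    """Return every vocabulary value that `value` could be interpreted as."""
    key = value.strip().lower()
    if key in VOCABULARY:
        return {key}
    return SYNONYMS.get(key, set())


def amend(record: dict, term: str = "dwc:taxonRank") -> dict:
    """Propose, but never apply, an amendment to `term` in `record`."""
    original = record.get(term) or ""
    matches = candidates(original)
    # Amend only on a single, unambiguous interpretation that differs
    # from the supplied value ("if it matched, it doesn't need amending").
    if len(matches) == 1:
        (interpreted,) = matches
        if interpreted != original:
            return {"status": "AMENDED", "proposal": {term: interpreted}}
    return {"status": "NOT_AMENDED", "proposal": {}}


if __name__ == "__main__":
    print(amend({"dwc:taxonRank": "sp."}))      # proposes "species"
    print(amend({"dwc:taxonRank": "species"}))  # already standard
    print(amend({"dwc:taxonRank": "s."}))       # ambiguous
```

Returning a separate proposal, rather than editing the record in place, mirrors the point that a data curator decides whether to apply the change to the database of record.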

@ArthurChapman
Collaborator Author

Looks good

@Tasilee
Collaborator

Tasilee commented Mar 30, 2022

I've just finished a pass through all the AMENDMENTs tweaking the Descriptions and Expected Response to conform to that template.

@chicoreus
Collaborator

The structure of the markdown tables has been more closely aligned to the bdqffdq Framework terms.

A new issue template has been created; see #298.

See the Rationale management documentation in the supplementary section of the standard document for details of the markdown table.

The code in https://github.com/kurator-org/bdq_issue_to_csv can translate the markdown tables in the issues into the CSV representation of the tests in https://github.com/tdwg/bdq/blob/master/tg2/core/TG2_tests.csv, which is copied into the draft standard submission at: https://github.com/tdwg/bdq/blob/master/tg2/_review/vocabulary/bdqcore_terms.csv

The columns in bdqcore_terms.csv map to the bdqffdq ontology and onto the TestField column in the markdown table in the github issues.
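The actual conversion is done by the bdq_issue_to_csv code linked above; what follows is only a hypothetical Python sketch of the general idea, not that project's implementation. It assumes each issue body holds a two-column `| Field | Value |` markdown table, collects the rows into a dict keyed by field name, and writes one CSV row per test with the standard-library `csv` module. The function names, the `ISSUE_BODY` sample and the column list are illustrative assumptions; the real column set is defined by bdqcore_terms.csv and the bdqffdq ontology.

```python
import csv
import io


def parse_issue_table(markdown: str) -> dict:
    """Collect the "| Field | Value |" rows of an issue body into a dict."""
    fields = {}
    for raw in markdown.splitlines():
        line = raw.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        parts = line[1:-1].split("|", 1)
        if len(parts) != 2:
            continue
        name, value = parts[0].strip(), parts[1].strip()
        # Skip the header row and the |---|---| separator row.
        if name == "Field" or set(name) <= set("-: "):
            continue
        fields[name] = value
    return fields


def to_csv(tests: list, columns: list) -> str:
    """Write one CSV row per parsed test; missing fields become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for test in tests:
        writer.writerow(test)
    return buf.getvalue()


# Hypothetical issue body reusing the #63 Description and ER quoted above.
ISSUE_BODY = """
| Field | Value |
|---|---|
| Description | Propose amendment to the value of dwc:basisOfRecord using bdq:sourceAuthority. |
| Expected Response | AMENDED the value of dwc:basisOfRecord if it could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED |
"""

if __name__ == "__main__":
    print(to_csv([parse_issue_table(ISSUE_BODY)], ["Description", "Expected Response"]))
```

Keying rows by field name means the markdown table and the CSV columns only need to agree on names, which is the kind of mapping the TestField column provides.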
