TG2-INTERNAL PREREQUISITES NOT MET vs NOT CHANGED #180

Open
Tasilee opened this issue Sep 2, 2019 · 6 comments


@Tasilee
Collaborator

Tasilee commented Sep 2, 2019

Email September 1 from @chicoreus

Thanks again. A good pickup. I would readily agree with you that we would only overwrite an existing value for "_STANDARDIZED" or "_CONVERTED".

I'll comment inline below in BOLD against each issue.

On Sun, Sep 1, 2019 at 12:52 AM Paul J. Morris wrote:
We have 16 amendments which modify one (sometimes more) term on the
basis of values found in other terms. These are currently specified
for the most part to return Internal Prerequisites Not Met when the
term for which a change could be proposed contains a value, with the
exception of some amendments that are clearly intended to change
existing values (e.g. _STANDARDIZED, _CONVERTED).

I agree with amendments that use data from one term to change another
term generally not overwriting existing values, just filling in
blanks, except in the standardization cases.

However, I think we will confuse users if we respond with
INTERNAL_PREREQUISITES_NOT_MET instead of NOT_CHANGED in the case of
not changing an existing value. We should restrict
INTERNAL_PREREQUISITES_NOT_MET to cases where we don't have the
information to propose a change, and use NOT_CHANGED when we aren't
changing an existing value because it exists.
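A minimal sketch of that distinction for a fill-in amendment, in Python. The names `Status`, `is_empty` and `fill_in_status` are hypothetical, treating whitespace-only values as EMPTY is an assumption, and so is the order of the two checks; none of this is taken from an existing implementation.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    INTERNAL_PREREQUISITES_NOT_MET = "INTERNAL_PREREQUISITES_NOT_MET"
    AMENDED = "AMENDED"
    NOT_CHANGED = "NOT_CHANGED"


def is_empty(value: Optional[str]) -> bool:
    """Treat None and whitespace-only strings as EMPTY (an assumption)."""
    return value is None or value.strip() == ""


def fill_in_status(proposed: Optional[str], target: Optional[str]) -> Status:
    """Status for an amendment that only fills in blanks.

    proposed: value derived from the source terms, or None when the
              source terms were EMPTY or not interpretable.
    target:   current value of the term that would be amended.
    """
    if proposed is None:
        # No information from which to propose a change.
        return Status.INTERNAL_PREREQUISITES_NOT_MET
    if not is_empty(target):
        # A change could be proposed, but an existing value is left alone.
        return Status.NOT_CHANGED
    return Status.AMENDED
```

With this ordering a populated target only yields NOT_CHANGED when a value could actually have been proposed; checking the target first would be an equally defensible reading.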

These 16 amendments listed below, with comments (first 12 look like
they should have their behavior changed).

I've noted a few other points here that need comments and discussion in
their issues, but wanted to put the larger picture in one place for
consideration.

-Paul

#57
AMENDMENT_TAXONID_FROM_TAXON
EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority
service was not available; INTERNAL_PREREQUISITES_NOT_MET if all of
dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family, dwc:genus,
and dwc:scientificName are EMPTY; AMENDED if a value for dwc:taxonID is
unique and resolvable on the basis of the value of the lowest ranking
NOT_EMPTY taxon classification terms dwc:scientificName,
dwc:scientificNameAuthorship, dwc:kingdom, dwc:phylum, dwc:class, etc.;
otherwise NOT_CHANGED

As defined: an existing dwc:taxonID will be overwritten.

Is this desired? If not, this should probably be NOT_CHANGED.

So, would this be what you are suggesting-

EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority service was not available; INTERNAL_PREREQUISITES_NOT_MET if all of dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family, dwc:genus and dwc:scientificName are EMPTY; AMENDED if a value for dwc:taxonID is unique and resolvable on the basis of the value of the lowest ranking NOT_EMPTY taxon classification terms dwc:scientificName, dwc:scientificNameAuthorship, dwc:kingdom, dwc:phylum, dwc:class, etc., and dwc:taxonID is EMPTY; otherwise NOT_CHANGED ??

#32
AMENDMENT_COORDINATES_FROM_VERBATIM
INTERNAL_PREREQUISITES_NOT_MET if Verbatim coordinates (either
dwc:verbatimLatitude and dwc:verbatimLongitude or
dwc:verbatimCoordinates) were not interpretable into coordinates as
decimal degrees or either dwc:decimalLatitude or dwc:decimalLongitude
was not EMPTY; AMENDED if dwc:decimalLatitude and dwc:decimalLongitude
were populated from information in verbatim coordinate information
(dwc:verbatimCoordinates or dwc:verbatimLatitude and
dwc:verbatimLongitude, plus dwc:verbatimCoordinateSystem and
dwc:verbatimSRS); otherwise NOT_CHANGED

As defined, existing values for decimalLat/Long will result in Internal
Prerequisites Not Met. This should probably be NOT_CHANGED.

How about

INTERNAL_PREREQUISITES_NOT_MET if Verbatim coordinates (either dwc:verbatimLatitude and dwc:verbatimLongitude or dwc:verbatimCoordinates) were not interpretable into coordinates as decimal degrees; AMENDED if dwc:decimalLatitude and dwc:decimalLongitude were populated from information in verbatim coordinate information (dwc:verbatimCoordinates or dwc:verbatimLatitude and dwc:verbatimLongitude, plus dwc:verbatimCoordinateSystem and dwc:verbatimSRS) and both dwc:decimalLatitude and dwc:decimalLongitude were EMPTY; otherwise NOT_CHANGED ??

#73
AMENDMENT_COUNTRYCODE_FROM_COORDINATES
EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority
service was not available; INTERNAL_PREREQUISITES_NOT_MET if the fields
dwc:decimalLatitude, dwc:decimalLongitude are EMPTY or the
dwc:decimalLatitude and dwc:decimalLongitude passed to the
dwc:countryCode determination service are not in the same spatial
reference system as that of the service; AMENDED if the value of
dwc:countryCode was unambiguously inferred from supplied
dwc:decimalLatitude, dwc:decimalLongitude and dwc:coordinatePrecision
falling within the boundary defined by the combination of terrestrial
and exclusive economic zone; otherwise NOT_CHANGED

As defined, existing values for country code will be overwritten. Is
this desired? If not, this should probably be NOT_CHANGED.

EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority service was not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude, dwc:decimalLongitude are EMPTY or dwc:decimalLatitude and dwc:decimalLongitude passed to the dwc:countryCode determination service are not in the same spatial reference system as that of the service; AMENDED if the value of dwc:countryCode was unambiguously inferred from supplied dwc:decimalLatitude, dwc:decimalLongitude and dwc:coordinatePrecision falling within the boundary defined by the combination of terrestrial and exclusive economic zone and dwc:countryCode is EMPTY; otherwise NOT_CHANGED ??

#106
AMENDMENT_IDENTIFICATIONQUALIFIER_FROM_TAXON
EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority
service was not available; INTERNAL_PREREQUISITES_NOT_MET if all of
the taxon name fields were EMPTY or the field
dwc:identificationQualifier was not EMPTY; AMENDED if the field
dwc:identificationQualifier was FILLED_IN from any of the fields
dwc:scientificName, dwc:specificEpithet or dwc:infraspecificEpithet;
otherwise NOT_CHANGED

As defined, identificationQualifier will not be changed if it contains
an existing value and INTERNAL_PREREQUISITES_NOT_MET will be returned.
This should probably be NOT_CHANGED.

EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority service was not available; INTERNAL_PREREQUISITES_NOT_MET if all of the taxon name fields were EMPTY; AMENDED if dwc:identificationQualifier was FILLED_IN from any of dwc:scientificName, dwc:specificEpithet or dwc:infraspecificEpithet, and dwc:identificationQualifier was EMPTY; otherwise NOT_CHANGED ??

NOTE: This raises another inconsistency...the use of "the field/s dwc:xxx". I'd prefer to remove such phrases and simply use "dwc:xxx" ?? This is reflected in my version of the Expected Responses

#68
AMENDMENT_MINELEVATION-MAXELEVATION_FROM_VERBATIM
INTERNAL_PREREQUISITES_NOT_MET if the field dwc:verbatimElevation is
EMPTY or not unambiguously interpretable or
dwc:minimumElevationInMeters and/or dwc:maximumElevationInMeters are
not EMPTY; AMENDED if the fields dwc:minimumElevationInMeters and/or
dwc:maximumElevationInMeters were unambiguously interpreted from
dwc:verbatimElevation; otherwise NOT_CHANGED

As defined, elevation terms will not be filled in if populated (but
and/or is ambiguous for implementors if one is populated and the other
not) and INTERNAL_PREREQUISITES_NOT_MET will be returned. This should
probably be NOT_CHANGED, and expectations when one term is populated
and the other not clarified. Compare with #55 below, where both terms
must be populated to avoid amendment, but one empty term can be amended.

INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimElevation is EMPTY or not unambiguously interpretable; AMENDED if dwc:minimumElevationInMeters and/or dwc:maximumElevationInMeters were unambiguously interpreted from dwc:verbatimElevation and dwc:minimumElevationInMeters and dwc:maximumElevationInMeters are EMPTY; otherwise NOT_CHANGED ??

#52
AMENDMENT_EVENT_FROM_EVENTDATE
INTERNAL_PREREQUISITES_NOT_MET if
the field dwc:eventDate is EMPTY or does not contain a valid ISO
8601-1:2019 date; AMENDED if one or more EMPTY terms of the dwc:Event
class (dwc:year, dwc:month, dwc:day, dwc:startDayOfYear,
dwc:endDayOfYear) have been filled in from a valid unambiguously
interpretable value in dwc:eventDate; otherwise NOT_CHANGED

As defined, only empty terms are populated, which would result in NOT_CHANGED
if all Event terms contain existing values. This is probably the
desired behavior.

Agreed, but the phrasing is a tad unusual.

#55
AMENDMENT_MINDEPTH-MAXDEPTH_FROM_VERBATIM
INTERNAL_PREREQUISITES_NOT_MET if the field dwc:verbatimDepth is EMPTY
or not unambiguously interpretable or dwc:minimumDepthInMeters and
dwc:maximumDepthInMeters are not EMPTY; AMENDED if the fields
dwc:minimumDepthInMeters and/or dwc:maximumDepthInMeters were
unambiguously determined from dwc:verbatimDepth; otherwise NOT_CHANGED

As defined, returns INTERNAL_PREREQUISITES_NOT_MET if both depth terms
are populated. This should probably be NOT_CHANGED.

INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimDepth is EMPTY or not unambiguously interpretable; AMENDED if dwc:minimumDepthInMeters and/or dwc:maximumDepthInMeters were unambiguously determined from dwc:verbatimDepth and dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are EMPTY; otherwise NOT_CHANGED ??

#132
AMENDMENT_EVENTDATE_FROM_YEARSTARTDAYOFYEARENDDAYOFYEAR
INTERNAL_PREREQUISITES_NOT_MET if the field dwc:eventDate was not EMPTY
or dwc:year was EMPTY or both dwc:startDayOfYear and dwc:endDayOfYear
were EMPTY or not interpretable; AMENDED if the value of dwc:eventDate
was FILLED_IN from the values in dwc:year, dwc:startDayOfYear and
dwc:endDayOfYear; otherwise NOT_CHANGED

As defined, returns INTERNAL_PREREQUISITES_NOT_MET if eventDate is
populated. This should probably be NOT_CHANGED.

INTERNAL_PREREQUISITES_NOT_MET if dwc:year was EMPTY or both dwc:startDayOfYear and dwc:endDayOfYear were EMPTY or not interpretable; AMENDED if the value of dwc:eventDate was FILLED_IN from the values in dwc:year, dwc:startDayOfYear and dwc:endDayOfYear, and dwc:eventDate was EMPTY; otherwise NOT_CHANGED ??
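As a rough illustration of this wording, here is a Python sketch that derives dwc:eventDate from dwc:year plus the day-of-year terms. The function names are hypothetical. Several choices are assumptions rather than anything agreed in this thread: the ISO interval form `start/end`, returning NOT_CHANGED when the end day precedes the start day, and accepting a single usable day-of-year.

```python
import calendar
from datetime import date, timedelta
from typing import Optional, Tuple


def _to_int(value: Optional[str]) -> Optional[int]:
    """Return an int for an ASCII digit string, else None."""
    text = (value or "").strip()
    return int(text) if text.isascii() and text.isdigit() else None


def _doy_to_date(year: int, doy: Optional[int]) -> Optional[date]:
    """Convert a 1-based day of year to a date, or None if out of range."""
    days_in_year = 366 if calendar.isleap(year) else 365
    if doy is None or not 1 <= doy <= days_in_year:
        return None
    return date(year, 1, 1) + timedelta(days=doy - 1)


def eventdate_from_year_doy(
    event_date: Optional[str],
    year: Optional[str],
    start_doy: Optional[str],
    end_doy: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return (status, proposed dwc:eventDate or None)."""
    y = _to_int(year)
    if y is None or not 1 <= y <= 9999:
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    start = _doy_to_date(y, _to_int(start_doy))
    end = _doy_to_date(y, _to_int(end_doy))
    if start is None and end is None:
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    if (event_date or "").strip():
        # Existing dwc:eventDate is left alone.
        return ("NOT_CHANGED", None)
    if start is not None and end is not None and start != end:
        if end < start:
            # Assumption: no wrap into the following year.
            return ("NOT_CHANGED", None)
        return ("AMENDED", f"{start.isoformat()}/{end.isoformat()}")
    single = start if start is not None else end
    return ("AMENDED", single.isoformat())
```

For example, `eventdate_from_year_doy("", "2019", "1", "32")` proposes the interval from 1 January to 1 February 2019, while the same call with a populated dwc:eventDate returns NOT_CHANGED.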

#102
AMENDMENT_GEODETICDATUM_ASSUMEDDEFAULT
EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority
service was not available; INTERNAL_PREREQUISITES_NOT_MET if the field
dwc:geodeticDatum was an interpretable value or the predefined
parameter is not set; AMENDED if the field dwc:geodeticDatum was EMPTY
or was uninterpretable, the value of dwc:geodeticDatum was set to a
predefined default value; otherwise NOT_CHANGED

Phrasing for AMENDED is unclear, the comma probably needs to be removed.

As defined, will result in NOT_CHANGED if geodeticDatum is populated
with an uninterpretable value, but INTERNAL_PREREQUISITES_NOT_MET if
geodeticDatum contains an interpretable value. This sounds like it
will confuse consumers of the results.

EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority service was not available; INTERNAL_PREREQUISITES_NOT_MET if the predefined parameter is not set; AMENDED if the field dwc:geodeticDatum was EMPTY; otherwise NOT_CHANGED ??

#93
AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY
INTERNAL_PREREQUISITES_NOT_MET if dwc:year is EMPTY or is uninterpretable as a valid year; AMENDED if
the value of dwc:eventDate was interpreted from the values in dwc:year,
dwc:month and dwc:day; otherwise NOT_CHANGED

As defined, returns INTERNAL_PREREQUISITES_NOT_MET if eventDate is
populated. This should probably be NOT_CHANGED.

INTERNAL_PREREQUISITES_NOT_MET if dwc:year is EMPTY or is uninterpretable as a valid year; AMENDED if the value of dwc:eventDate was interpreted from the values in dwc:year, dwc:month and dwc:day and dwc:eventDate is EMPTY; otherwise NOT_CHANGED ??
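A possible reading of this wording, sketched in Python. The function names are hypothetical. Two behaviours are assumptions and not part of the proposed text: falling back to `YYYY` or `YYYY-MM` when dwc:month or dwc:day is blank, and returning NOT_CHANGED when a non-blank month or day is invalid.

```python
from datetime import date
from typing import Optional, Tuple


def _blank(value: Optional[str]) -> bool:
    """Treat None and whitespace-only strings as EMPTY (an assumption)."""
    return value is None or value.strip() == ""


def _to_int(value: Optional[str]) -> Optional[int]:
    """Return an int for an ASCII digit string, else None."""
    text = (value or "").strip()
    return int(text) if text.isascii() and text.isdigit() else None


def eventdate_from_ymd(
    event_date: Optional[str],
    year: Optional[str],
    month: Optional[str],
    day: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return (status, proposed dwc:eventDate or None)."""
    y = _to_int(year)
    if y is None or not 1 <= y <= 9999:
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    if not _blank(event_date):
        # Existing dwc:eventDate is left alone.
        return ("NOT_CHANGED", None)
    if _blank(month):
        return ("AMENDED", f"{y:04d}")
    m = _to_int(month)
    if m is None or not 1 <= m <= 12:
        return ("NOT_CHANGED", None)
    if _blank(day):
        return ("AMENDED", f"{y:04d}-{m:02d}")
    d = _to_int(day)
    if d is None:
        return ("NOT_CHANGED", None)
    try:
        return ("AMENDED", date(y, m, d).isoformat())
    except ValueError:
        # For example 30 February: no valid date can be proposed.
        return ("NOT_CHANGED", None)
```

The check on dwc:eventDate comes after the prerequisite check on dwc:year, which mirrors the order of the clauses in the proposed text.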

#86
AMENDMENT_EVENTDATE_FROM_VERBATIM
INTERNAL_PREREQUISITES_NOT_MET if the field dwc:eventDate is not EMPTY
or the field dwc:verbatimEventDate is EMPTY or not unambiguously
interpretable as an ISO 8601-1:2019 date; AMENDED if the value of
dwc:eventDate was unambiguously interpreted from dwc:verbatimEventDate;
otherwise NOT_CHANGED

As defined, returns INTERNAL_PREREQUISITES_NOT_MET if eventDate is
populated. This should probably be NOT_CHANGED.

INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimEventDate is EMPTY or not unambiguously interpretable as an ISO 8601-1:2019 date; AMENDED if the value of dwc:eventDate was unambiguously interpreted from dwc:verbatimEventDate and dwc:eventDate is EMPTY; otherwise NOT_CHANGED ??

#71
AMENDMENT_SCIENTIFICNAME_FROM_TAXONID
EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority
service was not available; INTERNAL_PREREQUISITES_NOT_MET if the field
dwc:taxonID is EMPTY or the value of dwc:taxonID is ambiguous; AMENDED
if the field dwc:scientificName was added by the specified source
authority service resolving the dwc:taxonID value; otherwise NOT_CHANGED

As defined, likely to be implemented as changing existing values of
scientificName, but could also be implemented as only filling in when
scientificName is empty returning NOT_CHANGED if populated. Needs
clearer specification than "scientificName was added", where "added"
creates ambiguity about intention.

EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority service was not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY or the value of dwc:taxonID is ambiguous; AMENDED if dwc:taxonID was resolvable and dwc:scientificName is EMPTY; otherwise NOT_CHANGED ??


Tests that follow appear to change existing values by design.

I have not removed instances of " the field/s..." below

#43
AMENDMENT_COORDINATES_CONVERTED
INTERNAL_PREREQUISITES_NOT_MET if the fields dwc:decimalLatitude and
dwc:decimalLongitude were EMPTY or the field dwc:geodeticDatum, if it
exists in the record, was not interpretable; AMENDED if
dwc:decimalLatitude, dwc:decimalLongitude, and dwc:geodeticDatum were
changed based on a conversion between spatial reference systems;
otherwise NOT_CHANGED

As defined, existing values will be changed, this appears to be desired
behavior.

Agreed.

#45
AMENDMENT_POLYNOMIAL_STANDARDIZED
EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority
service was not available; INTERNAL_PREREQUISITES_NOT_MET if the field
dwc:scientificName is EMPTY; AMENDED if nomenclatural errors
(typographical errors and misspellings) represented in
dwc:scientificName have been unambiguously interpreted given the
specified source authority service; otherwise NOT_CHANGED

As defined, existing values will be changed, this appears to be desired
behavior.

Agreed

#118
AMENDMENT_GEOGRAPHY_STANDARDIZED
EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority
service was not available or if the combination of administrative
geography terms could not be unambiguously resolved from the specified
source authority service; AMENDED if one or more of the administrative
geographic terms (dwc:continent, dwc:country, dwc:countryCode,
dwc:stateProvince, dwc:county, dwc:municipality) was changed to comply
with standard values from the specified source authority service;
otherwise NOT_CHANGED

As defined, existing values for country code will be overwritten. This
appears to be desired behavior.

Agreed.

#54
AMENDMENT_COORDINATES_TRANSPOSED
INTERNAL_PREREQUISITES_NOT_MET
if the fields dwc:decimalLatitude and dwc:decimalLongitude are EMPTY or
dwc:decimalLatitude and dwc:decimalLongitude values passed to the
dwc:countryCode specified source authority service were not in the same
spatial reference system as that of the service; AMENDED if the
supplied geographic coordinates were transposed or one or more of the
signs were reversed (negated) to place the record in the region defined
by the supplied dwc:countryCode; otherwise NOT_CHANGED

As defined, will change existing decimalLat/Long values. This appears
to be the desired behavior.

Agreed.

@Tasilee changed the title from "INTERNAL PREREQUISITES NOT MET vs NOT CHANGED" to "TG2-INTERNAL PREREQUISITES NOT MET vs NOT CHANGED" on Sep 2, 2019

@ArthurChapman
Collaborator

It will take me some time to consider all these. One thing I have noted, however, is that our tenses change: for example, we say "... were ... changed" and "... are empty ...". Surely we would be better saying they were empty and were thus changed. I also think we should put the wording around the other way, in #68 for example:

From
INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimElevation is EMPTY or not unambiguously interpretable; AMENDED if dwc:minimumElevationInMeters and/or dwc:maximumElevationInMeters were unambiguously interpreted from dwc:verbatimElevation and dwc:minimumElevationInMeters and dwc:maximumElevationInMeters are EMPTY; otherwise NOT_CHANGED ??

To
INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimElevation is EMPTY or not unambiguously interpretable; AMENDED if dwc:minimumElevationInMeters and dwc:maximumElevationInMeters were EMPTY, and dwc:minimumElevationInMeters and/or dwc:maximumElevationInMeters were unambiguously interpretable from dwc:verbatimElevation; otherwise NOT_CHANGED ??

@ArthurChapman
Collaborator

  1. In a lot of these, I would prefer to see the "are EMPTY" etc. mentioned at the start of the AMENDED rather than at the end. My reasoning is that that is the first thing to be checked in the programming - if it is NOT_EMPTY, we do not AMEND - if it is EMPTY, then we move to the next stage. I won't comment on each individually - just a general policy.

  2. We need to get the tenses consistent (are/is/was/were etc.)

  3. I agree with dropping "terms" "fields" in the dwc:xxxx examples.

  4. TG2-AMENDMENT_EVENT_FROM_EVENTDATE #52

From the tests we have run so far, there is often an increase in nonStandard values in Day, Year, etc. If the eventDate is a valid ISO date, these should provide valid numbers for Year, Month, Day. I guess errors could come in in relation to earliest and latest dates if we have them parameterized.

  5. TG2-AMENDMENT_EVENTDATE_FROM_YEARSTARTDAYOFYEARENDDAYOFYEAR #132
    So far the testing has produced very few changes (1 in 20 Million). Thus this test is probably not Core - however, we should continue to monitor as a different data set may produce different results.

  6. TG2-AMENDMENT_GEODETICDATUM_ASSUMEDDEFAULT #102
    As mentioned under TG2-AMENDMENT_GEODETICDATUM_ASSUMEDDEFAULT #102, an earlier version included "therefore" following the comma.

  7. Agree with changing from PREREQUISITES_NOT_MET to NOT_CHANGED as suggested (e.g. TG2-AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY #93, TG2-AMENDMENT_EVENTDATE_FROM_VERBATIM #86, etc.)

  8. We should become consistent in our use of "ADDED" and "FILLED_IN" (e.g. TG2-AMENDMENT_SCIENTIFICNAME_FROM_SCIENTIFICNAMEID #71 and others)

  9. Just a principle on TG2-AMENDMENT_COORDINATES_CONVERTED #43, TG2-AMENDMENT_POLYNOMIAL_STANDARDIZED #45, etc. We do CHANGE a value - but NB at the same time the old value should be retained in a xxxxxx_OLD or something similar. I guess we are assuming where we say CHANGED that we are doing this under the principle that nothing is overwritten - but we don't state that with each of those tests?????
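The principle in the last point, that nothing is overwritten, could look something like the following Python sketch. `AmendmentResponse`, `apply_amendment` and the `original_` key prefix are all hypothetical names chosen for illustration; the thread itself only mentions "xxxxxx_OLD or something similar".

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class AmendmentResponse:
    """Hypothetical response: a status plus proposed term values."""
    status: str
    amended: Dict[str, str] = field(default_factory=dict)
    comment: str = ""


def apply_amendment(
    record: Dict[str, str],
    response: AmendmentResponse,
    keep_prefix: str = "original_",
) -> Dict[str, str]:
    """Return a copy of the record with proposed values applied.

    The input record is never modified, and any value that gets
    replaced is kept under a prefixed key.
    """
    updated = dict(record)
    if response.status != "AMENDED":
        return updated
    for term, new_value in response.amended.items():
        if term in record:
            updated[keep_prefix + term] = record[term]
        updated[term] = new_value
    return updated
```

Applied to a record with dwc:scientificName "Eucalyptus globulis" and a proposed value "Eucalyptus globulus", the result carries the new value under dwc:scientificName and the old one under original_dwc:scientificName, and the input record is unchanged.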

@Tasilee
Collaborator Author

Tasilee commented Sep 4, 2019

Thanks @ArthurChapman. Good work.

  1. I agree with the reference to EMPTY early.

  2. Fully agree about tense. I picked up on that as well going through @chicoreus's email post on this. I'd suggest "is" and "are" (as present tense at the time of the test)

  3. Remove references to "fields ..." - yep

  4. TG2-AMENDMENT_EVENT_FROM_EVENTDATE #52 - Can we see some examples where this occurs?

  5. TG2-AMENDMENT_EVENTDATE_FROM_YEARSTARTDAYOFYEARENDDAYOFYEAR #132 - candidate for not CORE but agree, another dataset/sample would be good

  6. TG2-AMENDMENT_GEODETICDATUM_ASSUMEDDEFAULT #102 - Edited as noted earlier

  7. Yep

  8. I agree. I'd prefer "FILLEDIN" or "FILLED-IN"

  9. Yes, the principle of not overwriting values presumes in cases where we do, that the original value is retained. I think the ALA uses "RAW-"

I agree. We will need another full pass over all the Expected Responses

@tucotuco
Member

tucotuco commented Sep 5, 2019

I concur with all 9 conclusions.

@chicoreus
Collaborator

Discussion in call, 2022 Feb 27: for consistency, standardize on AMENDED/NOT_AMENDED, using NOT_AMENDED instead of NO_CHANGE or NOT_CHANGED. The conclusion over time has also been to use AMENDED instead of the various subtypes (FILLED_IN, TRANSPOSED), and NOT_AMENDED instead of AMBIGUOUS (this one in particular being somewhat orthogonal, as an amendment could be proposed in the face of ambiguity or not). The general conclusion is that these distinctions are best left to extensions of a data quality response, or to the response.comment.
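One way to picture that consolidation is a small lookup from the older labels to the two agreed ones, with the finer distinction pushed into the comment. This is only a sketch in Python: `LEGACY_TO_CURRENT` and `normalize_status` are hypothetical names, and the comment format is an assumption.

```python
from typing import Tuple

# Older labels mentioned in this thread, mapped to the agreed pair.
LEGACY_TO_CURRENT = {
    "FILLED_IN": "AMENDED",
    "TRANSPOSED": "AMENDED",
    "NOT_CHANGED": "NOT_AMENDED",
    "NO_CHANGE": "NOT_AMENDED",
    "AMBIGUOUS": "NOT_AMENDED",
}


def normalize_status(label: str, comment: str = "") -> Tuple[str, str]:
    """Map a legacy status label to AMENDED/NOT_AMENDED.

    Labels that are not in the table (for example
    INTERNAL_PREREQUISITES_NOT_MET) pass through unchanged. The legacy
    label is appended to the comment so the detail is not lost.
    """
    current = LEGACY_TO_CURRENT.get(label, label)
    if current != label:
        note = f"legacy status {label}"
        comment = f"{comment}; {note}" if comment else note
    return current, comment
```

For instance, `normalize_status("FILLED_IN")` gives `("AMENDED", "legacy status FILLED_IN")`, and a prerequisite status is returned as-is with an empty comment.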

@Tasilee
Collaborator Author

Tasilee commented Feb 28, 2022

I'll start working through the changes.
