TG2 - Test Data Framework #189

Tasilee · 2020-09-30T20:47:44Z

A Zoom discussion on September 29/30 recommended that we develop unit tests for each of the VALIDATIONs. The main justifications (thanks to @tucotuco) were extensibility and minimal maintenance, considering the evolution of the Darwin Core standard on which the TG2 tests are based.

We have 65 VALIDATIONs and would value any assistance in the creation of the unit tests, based on what @chicoreus has proposed with the following template, using #187 as an example.

Test VALIDATION_MAXDEPTH_OUTOFRANGE
GUID 3f1db29a-bfa5-40db-9fd1-fde020d81939
Column 1 is the INPUT (one column for each InformationElement in the test)
Columns 2-3 are the parameter values (one column for each Parameter in the test)
Columns 4-6 are the expected output; values in columns 4 and 5 must match exactly (column 6 is a free-text, human readable explanation).
Column 7 is a remark on the row in this table, not part of the expected output.
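A minimal sketch, in Python, of how a test harness might apply the matching rule above: only Response.Status and Response.Result are compared exactly, while the comment and remark are carried along for human readers. ExpectedRow and response_matches are hypothetical names for illustration, not part of any TG2 code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ExpectedRow:
    """One row of a test data file laid out as in the template (hypothetical type)."""
    inputs: Dict[str, str]        # InformationElement name -> raw string value
    parameters: Dict[str, str]    # Parameter name -> raw string value ("" = default)
    status: str                   # Response.Status: must match exactly
    result: str                   # Response.Result: must match exactly ("" if none)
    comment: str = ""             # Response.Comment: free text, not compared
    remark: str = ""              # Remark on the row, not part of the output


def response_matches(row: ExpectedRow, status: str, result: Optional[str]) -> bool:
    """True when an implementation's Status and Result equal the expected values.

    The human readable Response.Comment is deliberately not compared.
    A missing result (None) is treated as the empty cell used in the table.
    """
    return row.status == status and row.result == (result or "")
```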

See https://github.com/tdwg/bdq/blob/master/tg2/core/testdata/testdata_VALIDATION_MAXDEPTH_OUTOFRANGE_%23187.csv for the latest version of this file.

dwc:maximumDepthInMeters	bdq:minimumValidDepthInMeters	bdq:maximumValidDepthInMeters	Response.Status	Response.Result	Response.Comment	Remark
100	0	11000	RUN_HAS_RESULT	COMPLIANT	[any human readable explanation, e.g. 100 is in the range 0 to 11000]
100			RUN_HAS_RESULT	COMPLIANT	[any human readable explanation, e.g. 100 is in the default range 0 to 11000]
200000			RUN_HAS_RESULT	NOT_COMPLIANT	[any human readable explanation, e.g. 200000 is outside the range 0 to 11000]
0.4			RUN_HAS_RESULT	COMPLIANT	[any human readable explanation, e.g. 0.4 is in the range 0 to 11000]
0			RUN_HAS_RESULT	COMPLIANT	[any human readable explanation, e.g. 0 is in the range 0 to 11000]
11000			RUN_HAS_RESULT	COMPLIANT	[any human readable explanation, e.g. 11000 is in the range 0 to 11000]
thirty			INTERNAL_PREREQUISITES_NOT_MET		[any human readable explanation, e.g. provided value must be a number to be validated]
			INTERNAL_PREREQUISITES_NOT_MET		[any human readable explanation, e.g. a value must be provided to be validated]
null			INTERNAL_PREREQUISITES_NOT_MET		[any human readable explanation, e.g. provided value must be a number to be validated]
-145.3			RUN_HAS_RESULT	NOT_COMPLIANT	[any human readable explanation, e.g. provided value is outside the range 0 to 11000]
1000	10	100	RUN_HAS_RESULT	NOT_COMPLIANT	[any human readable explanation, e.g. 1000 is outside the provided parameter range 10 to 100]	[Note, non-default parameters should carry through to the Response.Comment]
[no depth specified]			INTERNAL_PREREQUISITES_NOT_MET		[any human readable explanation, e.g. provided value must be a number to be validated]
115,2			RUN_HAS_RESULT	COMPLIANT	[any human readable explanation, e.g. 115,2 is in the range 0 to 11000 where both . and , are recognized as decimal separators]
115.2			RUN_HAS_RESULT	COMPLIANT	[any human readable explanation, e.g. 115.2 is in the range 0 to 11000 where both . and , are recognized as decimal separators]
1,828.8			INTERNAL_PREREQUISITES_NOT_MET		[any human readable explanation, e.g. comma not recognized as a place separator, provided value must be a number]	[This case needs discussion, it is a plausible input value ]
1 828.8			RUN_HAS_RESULT	COMPLIANT	[any human readable explanation, e.g. space recognized as a place separator, provided value is in range 0 to 11000]	[This case needs discussion, this is an implausible value but fits SI expectations]
`1,828.8`			RUN_HAS_RESULT	COMPLIANT	[any human readable explanation, e.g. leading and trailing spaces should be trimmed, provided value is in range 0 to 11000]	[Note: The input is the string " 1,828.8 " with leading and trailing spaces, but without the quotation marks]
`1828.8`			RUN_HAS_RESULT	COMPLIANT	[any human readable explanation, e.g. leading and trailing spaces should be trimmed, provided value is in range 0 to 11000]	[Note: The input is the string " 1828.8 " with leading and trailing spaces, but without the quotation marks]
-354			RUN_HAS_RESULT	NOT_COMPLIANT	[any human readable explanation, e.g., the value is a negative number and is therefore outside the permissible range]
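The rows above can be read as a specification. Below is a minimal sketch of VALIDATION_MAXDEPTH_OUTOFRANGE under stated assumptions: inputs are trimmed strings, "." or "," is accepted as the decimal separator, digit-grouping separators are rejected, and blank parameters fall back to the defaults of 0 and 11000. Under these assumptions it returns INTERNAL_PREREQUISITES_NOT_MET for "1,828.8", "1 828.8" and the space-padded " 1,828.8 ", so it differs from the table for the last two, and it returns NOT_COMPLIANT for all negative depths. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Defaults from the table above (meters).
DEFAULT_MIN_DEPTH = 0.0
DEFAULT_MAX_DEPTH = 11000.0

# Assumption: optional sign, ASCII digits, and an optional fraction introduced
# by "." or ","; digit-grouping separators ("1,828.8", "1 828.8") do not match.
_NUMBER = re.compile(r"[+-]?[0-9]+(?:[.,][0-9]+)?")


def parse_number(raw: Optional[str]) -> Optional[float]:
    """Trim surrounding whitespace and parse a decimal number, or return None."""
    if raw is None:
        return None
    text = raw.strip()
    if not _NUMBER.fullmatch(text):
        return None
    return float(text.replace(",", "."))


def maxdepth_outofrange(max_depth: Optional[str],
                        min_param: Optional[str] = None,
                        max_param: Optional[str] = None) -> Tuple[str, str]:
    """Return (Response.Status, Response.Result) for dwc:maximumDepthInMeters."""
    value = parse_number(max_depth)
    if value is None:
        return ("INTERNAL_PREREQUISITES_NOT_MET", "")
    low = parse_number(min_param)
    high = parse_number(max_param)
    # Blank (or unparseable) parameters fall back to the defaults.
    low = DEFAULT_MIN_DEPTH if low is None else low
    high = DEFAULT_MAX_DEPTH if high is None else high
    if low <= value <= high:
        return ("RUN_HAS_RESULT", "COMPLIANT")
    return ("RUN_HAS_RESULT", "NOT_COMPLIANT")
```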


Tasilee · 2020-09-30T21:09:49Z

From @chicoreus

All inputs are assumed to be of type string, and it is the responsibility of the test suite to convert them to appropriate other types when needed (integers, floating point values).
It is the responsibility of the test suite to trim leading and trailing whitespace from each input.

Questions

For non-integer numbers, do we specify, following SI, that either comma or period may be used as the decimal separator (thus 146.5 and 146,5 are treated as the same number)? (I think yes.)
For numbers, do we specify, following SI, that only a space may be used to separate every three digits in a number (e.g. treating "1,000.4" as not a number, "1 000.4" as the number 1000.4, and "1,000" as the number 1.000, i.e. one, not one thousand)? Or are we silent on this, leaving its handling to the implementation language's number parser (e.g. Java's Integer.parseInt(String s) or Float.parseFloat(String s))? (I'm not sure.)

ArthurChapman · 2020-09-30T22:15:19Z

Question 1 - definitely YES (Pity the world doesn't have one standard for this!)

Question 2 - is there an ISO standard or some other standard we can cite for this?

Tasilee · 2020-09-30T22:56:46Z

These are the VALIDATIONs ordered by Darwin Core term:

Link	Dimension	Term_Action	Lee	Arthur	Paul
#58	Other	BASISOFRECORD_EMPTY	X	X
#104	Other	BASISOFRECORD_NOTSTANDARD	X	X
#77	Name	CLASS_NOTFOUND	X	X
#123	Name	CLASSIFICATION_AMBIGUOUS	X	X
#50	Space	COORDINATES_COUNTRYCODE_INCONSISTENT
#56	Space	COORDINATES_STATE-PROVINCE_INCONSISTENT
#51	Space	COORDINATES_TERRESTRIALMARINE
#87	Space	COORDINATES_ZERO
#109	Space	COORDINATEUNCERTAINTY_OUTOFRANGE
#62	Space	COUNTRY_COUNTRYCODE_INCONSISTENT
#42	Space	COUNTRY_EMPTY	X	X
#21	Space	COUNTRY_NOTSTANDARD
#98	Space	COUNTRYCODE_EMPTY	X	X
#20	Space	COUNTRYCODE_NOTSTANDARD
#69	Time	DATEIDENTIFIED_NOTSTANDARD
#76	Time	DATEIDENTIFIED_OUTOFRANGE
#147	Time	DAY_NOTSTANDARD	X	X	X
#125	Time	DAY_OUTOFRANGE
#103	Other	DCTYPE_EMPTY	X	X
#91	Other	DCTYPE_NOTSTANDARD	X	X
#119	Space	DECIMALLATITUDE_EMPTY	X	X
#79	Space	DECIMALLATITUDE_OUTOFRANGE
#96	Space	DECIMALLONGITUDE_EMPTY	X	X
#30	Space	DECIMALLONGITUDE_OUTOFRANGE
#131	Time	ENDDAYOFYEAR_OUTOFRANGE
#88	Time	EVENT_TEMPORAL_EMPTY
#33	Time	EVENTDATE_EMPTY
#67	Time	EVENTDATE_INCONSISTENT
#66	Time	EVENTDATE_NOTSTANDARD
#36	Time	EVENTDATE_OUTOFRANGE
#28	Name	FAMILY_NOTFOUND	X	X
#122	Name	GENUS_NOTFOUND	X	X
#78	Space	GEODETICDATUM_EMPTY	X	X
#59	Space	GEODETICDATUM_NOTSTANDARD
#95	Space	GEOGRAPHY_AMBIGUOUS
#139	Space	GEOGRAPHY_NOTSTANDARD
#81	Name	KINGDOM_NOTFOUND	X	X
#99	Other	LICENSE_EMPTY	X	X
#38	Other	LICENSE_NOTSTANDARD	X	X
#40	Space	LOCATION_EMPTY	X	X
#187	Space	MAXDEPTH_OUTOFRANGE	X	X	X
#112	Space	MAXELEVATION_OUTOFRANGE
#24	Space	MINDEPTH_GREATERTHAN_MAXDEPTH
#107	Space	MINDEPTH_OUTOFRANGE	X	X
#108	Space	MINELEVATION_GREATERTHAN_MAXELEVATION
#39	Space	MINELEVATION_OUTOFRANGE
#126	Time	MONTH_NOTSTANDARD	X	X	X
#47	Other	OCCURRENCEID_EMPTY	X	X
#23	Other	OCCURRENCEID_NOTSTANDARD	X	X
#117	Other	OCCURRENCESTATUS_EMPTY	X	X
#116	Other	OCCURRENCESTATUS_NOTSTANDARD	X	X
#83	Name	ORDER_NOTFOUND	X	X
#22	Name	PHYLUM_NOTFOUND	X	X
#101	Name	POLYNOMIAL_INCONSISTENT	X	X
#82	Name	SCIENTIFICNAME_EMPTY	X	X
#46	Name	SCIENTIFICNAME_NOTFOUND	X	X
#130	Time	STARTDAYOFYEAR_OUTOFRANGE
#70	Name	TAXON_AMBIGUOUS	X	X
#105	Name	TAXON_EMPTY	X	X
#121	Name	TAXONID_AMBIGUOUS	X	X
#120	Name	TAXONID_EMPTY	X	X
#161	Name	TAXONRANK_EMPTY	X	X
#162	Name	TAXONRANK_NOTSTANDARD	X	X
#49	Time	YEAR_EMPTY	X	X	X
#84	Time	YEAR_OUTOFRANGE	X	X	X

#29	Other	ANNOTATION_NOTEMPTY	X	X
#72	All	DATAGENERALIZATIONS_NOTEMPTY	X	X
#94	Other	ESTABLISHMENTMEANS_NOTEMPTY	X	X

Can I suggest @tucotuco makes a start on the SPACE ones, @ArthurChapman on the NAME ones, @chicoreus on the TIME ones and @Tasilee on OTHER and NOTIFICATIONS? Hopefully a few others will offer some help, at least for checking.

chicoreus · 2020-09-30T23:49:49Z

I've updated the table slightly, changing 143.5 to a negative value so that the not-compliant result makes sense, adding a remarks column with notes about the tests, and making more explicit the two tests at the end which have leading and trailing space characters as part of the test value. I've also clarified the explanatory text at the top of the table and added examples of human readable explanations where they were absent.

chicoreus · 2020-09-30T23:58:26Z

@ArthurChapman, for (2) consider "1 828.8" (without the quotes): 1000 fathoms in meters, with a period as the decimal separator and a space separating every three digits. This is an expected SI format for publication, but a very unnatural form for electronic Darwin Core data. There, "1828.8" or "1828,8", serialized by software from some floating point representation into some form of data exchange document, would be the expected values: the localization of the software doing the serialization makes the choice between comma and period as the decimal separator, but most software does not add space separators every three digits in serialized data. My tendency would be to say that we can expect to see "1828.8" or "1828,8" in abundance in the wild, but not "1,828.8" or "1 828,8". We should either not specify how these cases should be handled, or say that both are expected to be internal prerequisites not met, as a general expectation for all Darwin Core data. For standards, we should probably look for RFCs on the serialization of numeric data, rather than ISO or SI representations, as the numeric (or date) values found in data sets will in large part be serializations, into uncontrolled string fields, of strongly typed database fields (or of less strongly typed and variously formatted spreadsheet columns).
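To illustrate the distinction between serialization and presentation, the Python sketch below formats the same value both ways. Plain conversion to a string yields no grouping separators; grouping only appears when explicitly requested for display. The comma-decimal form is simulated by a character replacement rather than produced by a real locale.

```python
depth = 1828.8  # 1000 fathoms, in meters

serialized = str(depth)                       # typical machine serialization
grouped = f"{depth:,}"                        # display formatting with a thousands separator
si_style = grouped.replace(",", " ")          # SI-style space grouping
comma_decimal = serialized.replace(".", ",")  # simulated comma-decimal locale

# prints '1828.8', '1828,8', '1,828.8', '1 828.8' on separate lines
for text in (serialized, comma_decimal, grouped, si_style):
    print(repr(text))
```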


ArthurChapman · 2020-10-01T06:17:27Z

Thanks a million Lee- give me the easy one :-)

ArthurChapman · 2020-10-01T06:38:06Z

Thanks @chicoreus, I agree with what you suggest, although, certainly in Australia, I think "1,828.8" would be common; but I'm happy to have it treated as you suggest.

BTW, what is the easiest way to open that file as an Excel file? Can you send it to me separately as just a CSV? Copy and pasting doesn't seem to work.

tucotuco · 2020-10-01T12:06:19Z

I accept working on the SPACE test data.


chicoreus · 2020-10-01T19:35:46Z

Have data for the time tests in progress, will accept working on the rest of the time test data.


chicoreus · 2020-10-01T20:00:17Z

@ArthurChapman the best way to obtain the CSV files is with the Raw link. For example, for https://github.com/tdwg/bdq/blob/master/tg2/core/testdata/testdata_VALIDATION_MAXDEPTH_OUTOFRANGE.csv, the buttons Raw and Blame are at the upper right of the table. Raw takes you to the raw CSV file https://raw.githubusercontent.com/tdwg/bdq/master/tg2/core/testdata/testdata_VALIDATION_MAXDEPTH_OUTOFRANGE.csv. This is important in these cases, as the data values may be numbers not in quotes, or numbers in quotes as strings with whitespace padding.
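A sketch of why the raw file matters: Python's standard csv module keeps whitespace that sits inside quotes, so a padded value survives parsing and trimming stays the job of the test suite. The two-row excerpt is made up for illustration and is not copied from the repository; a real harness would read the raw URL instead.

```python
import csv
import io

# Made-up excerpt in the style of a raw test data file: the first input is
# unquoted, the second is a quoted string with a leading space.
RAW = (
    "dwc:month,Response.Status,Response.Result\n"
    "1,RUN_HAS_RESULT,COMPLIANT\n"
    '" 1",RUN_HAS_RESULT,COMPLIANT\n'
)

rows = list(csv.DictReader(io.StringIO(RAW)))
values = [row["dwc:month"] for row in rows]

# The csv module preserves whitespace inside quotes, so values is ["1", " 1"];
# trimming is left to the test suite, as agreed earlier in this thread.
print(values)
```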

I've added the maximum 32-bit signed and 32-bit unsigned integer values, plus those values with 1 added and with 2 added, plus the name of the term under test (e.g. dwc:day="day"), to each of the three sets of test data I've got up so far. -1, 0, and the maximum integer values are good test values to add for any term that takes numeric data.
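The boundary values described above can be generated mechanically. A small sketch; numeric_edge_cases is a hypothetical helper name, and the list simply mirrors the values named in the comment.

```python
from typing import List

INT32_MAX = 2**31 - 1   # 2147483647, maximum 32-bit signed integer
UINT32_MAX = 2**32 - 1  # 4294967295, maximum 32-bit unsigned integer


def numeric_edge_cases(term_name: str) -> List[str]:
    """Candidate input strings for a Darwin Core term that takes numeric data."""
    cases = ["-1", "0"]
    for limit in (INT32_MAX, UINT32_MAX):
        # The limit itself, then the limit plus 1 and plus 2.
        cases += [str(limit), str(limit + 1), str(limit + 2)]
    cases.append(term_name)  # the term's own name, e.g. dwc:day="day"
    return cases
```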

ArthurChapman · 2020-10-01T22:27:24Z

@chicoreus You have an error in testdata_VALIDATION_MAXDEPTH_OUTOFRANGE.csv, in lines 19 and 20, for the default for bdq:minimumValidDepthInMeters. Depth can never be a negative number, so 18 has to be NOT_COMPLIANT.

Also lines 23 and 24 appear identical

ArthurChapman · 2020-10-01T22:34:46Z

@chicoreus in testdata_VALIDATION_DAY_NOTSTANDARD.csv Lines appear to be duplicates

ArthurChapman · 2020-10-01T22:38:00Z

@chicoreus in testdata_VALIDATION_MONTH_NOTSTANDARD.csv

Lines 4 and 5 appear to be duplicates

Lines 39, 40, 41 should be NOT_COMPLIANT

Should we include "01" etc.?

chicoreus · 2020-10-01T23:18:11Z

@ArthurChapman in testdata_VALIDATION_MAXDEPTH_OUTOFRANGE.csv, lines 19 and 20 are both correct. They test cases where the provided parameter values are outside the defaults, to check whether the test uses the provided parameters or treats the defaults as hard limits.

For testdata_VALIDATION_DAY_NOTSTANDARD.csv, check the raw csv file, the duplicated lines are probably cases where leading or trailing spaces are present in one line but not another.

For testdata_VALIDATION_MONTH_NOTSTANDARD.csv, lines 4 and 5 differ in whitespace in the input: line 4 has the string "1", line 5 the string " 1" with a leading space. Lines 39-41 are indeed in error.

Yes, leading zeros make sense to test. I have added.
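A minimal sketch of VALIDATION_MONTH_NOTSTANDARD consistent with the points settled in this exchange: whitespace is trimmed before testing, so "1" and " 1" give the same response, and 13 is NOT_COMPLIANT. Treating "01" as COMPLIANT and an empty value as INTERNAL_PREREQUISITES_NOT_MET are assumptions, not results taken from the test data file; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple


def month_notstandard(raw: Optional[str]) -> Tuple[str, str]:
    """Sketch: COMPLIANT when the trimmed dwc:month is an integer from 1 to 12."""
    text = "" if raw is None else raw.strip()   # the suite trims whitespace
    if text == "":
        # Assumption: an empty month cannot be evaluated.
        return ("INTERNAL_PREREQUISITES_NOT_MET", "")
    # Assumption: ASCII digits only, leading zeros allowed ("01" counts as 1).
    if re.fullmatch(r"[0-9]+", text) and 1 <= int(text) <= 12:
        return ("RUN_HAS_RESULT", "COMPLIANT")
    return ("RUN_HAS_RESULT", "NOT_COMPLIANT")
```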


ArthurChapman · 2020-10-01T23:55:48Z

Thanks @chicoreus. I still think it is misleading for the default depth to be a negative number as that is not allowed

From Georeferencing Best Practices

DEPTH "A measurement of the vertical distance below a vertical datum. In this document, we try to modify the term to signify the medium in which the measurement is made. Thus, "water depth" is the vertical distance below an air-water interface in a waterbody (ocean, lake, river, sinkhole, etc.). Compare distance above surface. Depth is always a non-negative number."

chicoreus · 2020-10-02T02:49:23Z

If depth is distance from a vertical datum, and depth represents a vertical distance below an air-water interface, then negative values of depth are possible. Consider a vertical datum of mean sea level, and a sample collected in the intertidal, below the surface of the water at high tide but above the mean sea level vertical datum. Such a sample would be both collected below the air-water interface and at a distance above the vertical datum (thus a negative depth). If, however, depth can never be a negative value, then we need to say so explicitly in the specification for VALIDATION_MAXDEPTH_OUTOFRANGE and the other depth-related tests: regardless of the parameterization, zero is the smallest allowed value for depth, and even if a negative value is provided as a parameter, the test must still return NOT_COMPLIANT for depths less than zero.
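One reading of the proposal above, sketched in Python: parameters still override the defaults, but zero is applied as a hard lower bound, so a negative bdq:minimumValidDepthInMeters cannot make a negative depth compliant. The function names are hypothetical, and the defaults of 0 and 11000 are taken from the MAXDEPTH test data.

```python
from typing import Optional, Tuple

DEFAULT_MIN_DEPTH = 0.0      # defaults used in the MAXDEPTH test data
DEFAULT_MAX_DEPTH = 11000.0


def effective_depth_range(min_param: Optional[float] = None,
                          max_param: Optional[float] = None) -> Tuple[float, float]:
    """Range applied by a depth test if zero is treated as a hard floor."""
    low = DEFAULT_MIN_DEPTH if min_param is None else min_param
    high = DEFAULT_MAX_DEPTH if max_param is None else max_param
    # Supplied parameters override the defaults, but never below zero.
    return (max(0.0, low), high)


def depth_is_compliant(depth: float,
                       min_param: Optional[float] = None,
                       max_param: Optional[float] = None) -> bool:
    """True if depth lies within the effective (floored) range."""
    low, high = effective_depth_range(min_param, max_param)
    return low <= depth <= high
```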

ArthurChapman · 2020-10-02T05:00:28Z

What you are describing, Paul, is

distance above surface
In addition to elevation and depth, a measurement of the vertical distance above a reference point, with a minimum and a maximum distance to cover a range. For surface terrestrial locations, the reference point should be the elevation at ground level. Over a body of water (ocean, sea, lake, river, glacier, etc.), the reference point for aerial locations should be the elevation of the air-water interface, while the reference point for sub-surface benthic locations should be the interface between the water and the substrate. Locations within a water body should use depth rather than a negative distance above surface. Distances above a reference point should be expressed as positive numbers, while those below should be negative. The maximum distance above a surface will always be a number greater than or equal to the minimum distance above the surface. Since distances below a surface are negative numbers, the maximum distance will always be a number less than or equal to the minimum distance. Compare altitude.


tucotuco · 2020-10-02T15:14:14Z

I think we should add Paul's example to the Best Practices and explain how it should be determined?

1 m below the surface of the ocean stuck to a rock at a 2 m high tide.
Elevation: 2 m
Vertical Datum: EGM1996
Depth: 1 m
Distance above surface: 0 m


…ble comment in one line and adding a file with examples of non-printing characters (unicode u0000,u0007,and u0020, for discussion of for definition of EMPTY() in #111 and #152.


chicoreus · 2020-10-02T16:09:57Z

@tucotuco I'm confused. If depth is defined as distance below a vertical datum, and the data as you specify are:

1 m below the surface of the ocean stuck to a rock at a 2 m high tide.
Elevation: 2 m
Vertical Datum: EGM1996
Depth: 1 m
Distance above surface: 0 m

Doesn't this mean that the vertical datum is the datum for both elevation and depth, and the point is both 2 meters above this datum and one meter below this datum, and at the water surface all at the same time?

Shouldn't the values be:
1 m below the surface of the ocean stuck to a rock at a 2 m high tide (2 meters above local Mean Sea Level).
Elevation: 1 m
Vertical Datum: MSL
Depth: null
Distance below surface: 1 m

This tells us that the sample was collected 1 meter above mean sea level for that location, and was 1 meter below the surface of the water at that time.

For nearshore and intertidal localities, particularly with historical data, vertical position is most likely known based on a local mean low tide, mean tide, or mean high tide datum, which may or may not be translatable from the provided data to a global vertical datum.

Tasilee · 2020-10-08T03:45:23Z

I had a chat with @ArthurChapman after discussing some of the issues that have arisen, and we figure there are at least the following issues to discuss once we have all finished our test data:

Response.comment: Do we use consistent phrasing, as for example in "[any human readable explanation, e.g. bdq:annotation is NOTEMPTY]"?
How do we include [any non-printing characters]?
Where do we need to include an explanation?
bdq:annotation or w3c:annotation or oa:annotation or ...? If we need our own definition, maybe a bdq: namespace / vocabulary entry could be useful?
Should we use "NOTIFY if ..." instead of "REPORT if ..." for the NOTIFICATIONs?
...? Please add issues when they arise.

ArthurChapman · 2020-10-08T05:37:12Z

@Tasilee added some columns to the table above for ticking off test data files that have been checked by each of us.

… changing data values to be consistent with basis of record, adding explicit alternative vocabularies, clarifying human readable messages, adding column to specify source authority, adding cases for all valid vocabulary values, adding a range of cases for problematic values.


Tasilee · 2021-01-05T23:26:22Z

I created an Excel file (emailed) with worksheets that support one or more test templates from the test datasets done so far (27 SPACE and TIME datasets are still missing). In doing so, as anticipated, a number of issues arose. Given the propensity of the 99 (plus some 'non-printing character' versions) to diverge from a standard template, can I suggest that we use the worksheets (as CSVs)? Currently there are 7, but a) we aren't done yet and b) there may be a way of combining some of the test datasets.

If we combine tests with the same template into a single worksheet, it is simple to edit. I have organized the data so that it can be sorted easily. The single worksheet makes it easier for me to understand the test data and the same will be true of all those who will be using them.
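A sketch of combining test data files that share a template into one sortable table, assuming (as a simplification) that two files share a template when their header rows are identical. Each merged row is prefixed with the name of its source file so it can still be traced after sorting. combine_by_template is a hypothetical name.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def combine_by_template(files: Dict[str, str]) -> Dict[Tuple[str, ...], List[List[str]]]:
    """Merge CSV test data files that share a header row into one table each.

    files maps a file name to its CSV text. Each output row is prefixed with
    its source file name so the combined table can be sorted and still traced.
    """
    sheets: Dict[Tuple[str, ...], List[List[str]]] = defaultdict(list)
    for name, text in files.items():
        reader = csv.reader(io.StringIO(text))
        header = tuple(next(reader, []))   # the header row identifies the template
        for row in reader:
            sheets[header].append([name] + row)
    return dict(sheets)
```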

Can we combine 'Response.comment' and 'Explanation'? There are not many 'Explanations'. We could use "|" as a delimiter after the response.
Some Response.comments seem out of context, e.g., for COUNTRYCODE_EMPTY, the Response.comment is "[any human readable explanation, e.g. dwc:taxonRANK is EMPTY]". Use of "dwc:taxonRank" has also universally been applied across the SPACE tests that Arthur has added. Related: Do we need "[any human readable explanation, e.g. dwc:taxonRank is not EMPTY]" to be the Response.comment against many value entries? This is a good example of where it would be very easy to edit all those responses in one place.
Do we use "[non-printing characters]" or the characters themselves as Paul has done in two separate test datasets or entries such as "â€¦"? For 'non-printing characters' can we use ISO-8859-1 codes in some form like ISO8859:20 or similar? Having separate non-printing character test sets seems to head away from what I am proposing. We need a standard strategy.
Do we detail all valid values where that is a small set as in DAY_NOTSTANDARD and MONTH_NOTSTANDARD as Paul has done? If so, then do we also do that for Darwin Core vocabs such as BASISOFRECORD_NOTSTANDARD?
Many datasets requiring "bdq:sourceAuthority" were missing this column. I have added these into the original files (and composite worksheets).
Some "bdq:sourceAuthority" entries were missing so I have added them into the original files.
What standard do we use for values of "bdq:sourceAuthority"? For example, should the default reference be added to all test data lines? Currently it is not. I have, for the moment, added a single reference to the first line of each relevant test dataset. There are also several entries "Parameterized Source Authority" that don't seem explicit enough to me.
Do we need an entry for each test data line for "bdq:sourceAuthority.response"? Currently more than 50% are missing any response.
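One possible standard strategy for non-printing characters, sketched in Python: store them in the CSV as Unicode code point escapes and decode them in the test harness. This swaps in Unicode escapes (\u0007 style) for the ISO-8859-1 codes suggested above; treating general categories Cc (control) and Cf (format) as "non-printing" is an assumption, and the space character (U+0020) is left as-is. The function names are hypothetical.

```python
import re
import unicodedata

# Assumption: "non-printing" means Unicode general category Cc (control)
# or Cf (format, e.g. U+200B zero width space).
NONPRINTING = ("Cc", "Cf")
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")


def escape_nonprinting(value: str) -> str:
    """Replace non-printing characters with backslash-u escapes."""
    out = []
    for ch in value:
        if unicodedata.category(ch) in NONPRINTING:
            code = ord(ch)
            out.append("\\u%04x" % code if code <= 0xFFFF else "\\U%08x" % code)
        else:
            out.append(ch)
    return "".join(out)


def unescape_nonprinting(text: str) -> str:
    """Inverse of escape_nonprinting (assumes no literal backslash-u in the data)."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)
```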

@Tasilee

…o convert the rows in @Tasilee's data sheet in the spreadsheet of tests into a csv file suitable for input into a test harness. Supporting tdwg/bdq#189 used to generate https://github.com/tdwg/bdq/blob/master/tg2/core/TG2_test_validation_data.csv

chicoreus · 2024-08-26T17:38:04Z

For the validation data, see:

https://github.com/tdwg/bdq/blob/master/tg2/core/TG2_test_validation_data.csv
and
https://github.com/tdwg/bdq/blob/master/tg2/core/TG2_test_validation_data_nonprintingchars.csv

These CSV files are generated from @Tasilee's spreadsheet, by code in https://github.com/FilteredPush/bdqtestrunner, into a form that is more readily consumed by a test validation framework.

These CSV files, and guidance for their use, are being assembled for TDWG standards track submission in:
https://github.com/tdwg/bdq/tree/master/tg2/_review/docs/implementers

Tasilee added help wanted TG2 labels Sep 30, 2020

chicoreus added a commit that referenced this issue Oct 1, 2020

Adding example test file to provide an example for #189 with test data for #187. Filename suggests pattern, testdata_{humanreadablenameoftest}.csv for such test data sets

8eab115

chicoreus added a commit that referenced this issue Oct 1, 2020

Moving example test data file to a testdata directory to hold other test data files for issue #189.

cb9aa5e

chicoreus added a commit that referenced this issue Oct 1, 2020

Per issue #189 adding test data for #126 and #147.

b9c3bb6

chicoreus added a commit that referenced this issue Oct 1, 2020

Updating test data and text comments for test data for example test data (#189) for test #187.

ce93b78

chicoreus added a commit that referenced this issue Oct 1, 2020

Renaming file for example test data (#189) for test #187 to make filename consistent with case of test label.

7c49bc0

chicoreus added a commit that referenced this issue Oct 1, 2020

Fixing issues noted by @ArthurChapman in comments on issue #189. Adding leading zeros to tests. Fixing NOT_COMPLIANT out of range month 13.

ef70066

chicoreus added a commit that referenced this issue Oct 2, 2020

Per issue #189 adding test data for #49. This needs discussion for definition of EMPTY() in #111 and #152.

3a6a69b

chicoreus added a commit that referenced this issue Oct 2, 2020

Per discussion in #189 regarding best practice use of depth and elevation, updating status and comment for negative values in test data for #187.

c9cddb9

ArthurChapman added a commit that referenced this issue Oct 6, 2020

Add files via upload

2e86365

In accord with #189 added test data file for TAXONRANK_EMPTY #161

ArthurChapman added a commit that referenced this issue Oct 6, 2020

Add files via upload

d1f0b77

In accord with #189 added test data file for TAXONRANK_NOTSTANDARD #162

Tasilee added a commit that referenced this issue Oct 6, 2020

Add files via upload

0ed10b1

In accordance with #189, added file testdata_NOTIFICATION_ANNOTATION_NOTEMPTY_#29.csv

Tasilee added a commit that referenced this issue Oct 6, 2020

Added test data for #72

a1d19ea

In accordance with #189, added file testdata_NOTIFICATION_DATAGENERALIZATIONS_NOTEMPTY_#72 for #72

Tasilee added a commit that referenced this issue Oct 7, 2020

Added test data for #94

f62a7c7

In accordance with #189, added testdata_NOTIFICATION_ESTABLISHMENTMEANS_NOTEMPTY_#94.csv for #94

Tasilee added a commit that referenced this issue Oct 7, 2020

Added test data for #58

589dffe

In accordance with #189, added test data testdata_VALIDATION_BASISOFRECORD_EMPTY_#58.csv for #58

Tasilee added a commit that referenced this issue Oct 7, 2020

Added test data for #104

1c35652

In accordance with #189, added testdata_VALIDATION_BASISOFRECORD_NOTSTANDARD_#104.csv for #104

Tasilee added a commit that referenced this issue Oct 7, 2020

Added test data for #103

d46a965

In accordance with #189, added testdata_VALIDATION_DCTYPE_EMPTY_#103.csv for #103

Tasilee added a commit that referenced this issue Oct 7, 2020

Added test data for #91

2d65c94

In accordance with #189, added testdata_VALIDATION_DCTYPE_NOTSTANDARD_#91.csv for #91

Tasilee added a commit that referenced this issue Oct 7, 2020

Added test data for #99

0d6f41a

In accordance with #189, added testdata_VALIDATION_LICENCE_EMPTY_#99.csv for #99

Tasilee added a commit that referenced this issue Oct 7, 2020

Added test data for #38

696d6a9

In accordance with #189, added testdata_VALIDATION_LICENSE_NOTSTANDARD_#38.csv for #38

Tasilee added a commit that referenced this issue Oct 7, 2020

Added test data for #47

57b1908

In accordance with #189, added testdata_VALIDATION_OCCURRENCEID_EMPTY_#47.csv for #47

Tasilee added a commit that referenced this issue Oct 7, 2020

Added test data for #23

c47bb16

In accordance with #189, added testdata_VALIDATION_OCCURRENCEID_NOTSTANDARD_#23.csv for #23

Tasilee added a commit that referenced this issue Oct 7, 2020

Added test data for #117

348f80d

In accordance with #189, added testdata_VALIDATION_OCCURRENCESTATUS_EMPTY_#117.csv for #117

Tasilee added a commit that referenced this issue Oct 7, 2020

Added test data for #116

47d7209

IN accordance with #189, added testdata_VALIDATION_OCCURRENCESTATUS_NOTSTANDARD_#116.csv for #116

ArthurChapman added a commit that referenced this issue Oct 8, 2020

Add files via upload

82aa036

In accord with #189 added test data file for #42

ArthurChapman added a commit that referenced this issue Oct 8, 2020

Add files via upload

47a49a0

In accord with #189 added test data file for #98

ArthurChapman added a commit that referenced this issue Oct 8, 2020

Add files via upload

f6546d1

In accord with #189 added test data file for #119

ArthurChapman added a commit that referenced this issue Oct 8, 2020

Add files via upload

8f0380f

In accord with #189 added test data file for test #119

ArthurChapman added a commit that referenced this issue Oct 9, 2020

Add files via upload

c5b16b8

In accord with #189 added tests data file for #78

ArthurChapman added a commit that referenced this issue Oct 9, 2020

Add files via upload

5cca12a

In accord with #189 added test data file for #40

ArthurChapman added a commit that referenced this issue Oct 9, 2020

Create testdata_VALIDATION_MAXDEPTH_OUTOFRANGE_#107.csv

cea03dd

In accord with #189 added test data file for #107

chicoreus added a commit that referenced this issue Oct 12, 2020

Splitting out test of non-printing characters for #104 from rest of tests for #104, fixing u0000 value in non-printing characters test for #49, all as per #189.

5139e51

ArthurChapman closed this as completed Aug 26, 2024
