---
title: "Ecological Trait-data Standard"
author: "Florian D. Schneider, Malte Jochum, Gaëtane LeProvost, Caterina Penone, Andreas Ostrowski, Nadja K. Simons"
date: "v0.10, released: 28 March 2019"
params:
  release: FALSE
---
`r if(params$release) '<!--'`
<div class="alert alert-warning">
**You are viewing the development branch of ETS!** Do not refer to these terms and this website when publishing standardised trait data! The release reference is at https://terminologies.gfbio.org/terms/ets/pages/.
</div>
`r if(params$release) '-->'`
# Vocabulary
This defined vocabulary aims at providing all essential terms to describe datasets of functional trait measurements and facts for ecological research.
The list of terms is divided into a **core section** with essential columns for trait data, **extensions** that allow for additional layers of information, and a vocabulary for **metadata** of particular importance for trait data. Another section provides defined **terms for trait definitions** to be included in the metadata or published along with the dataset.
We provide four extensions of the vocabulary that allow for additional information on the trait measurement.
- the `Taxon` extension provides further terms for specifying the taxonomic resolution of the observation.
- the `Occurrence` extension contains information on the level of individual specimens, such as date and location, or method of sampling and preservation, or physiological specifications of the phenotype, such as sex, life stage or age.
- the `MeasurementOrFact` extension contains information at the level of single measurements or reported values, such as the original literature from where the value is cited, the method of measurement or statistical method used for aggregation.
- the `BiodiversityExploratories` extension provides columns for linking trait data from the Biodiversity Exploratories project to the respective plots and regions (www.biodiversity-exploratories.de).
This glossary of terms is available as
- this human-readable reference (html website), including commentaries and further definitions,
- a CSV table file (the 'source' file, [ETS.csv](ETS.csv)),
- a machine-readable .owl ontology file (the release file, [ets.owl](ets.owl)), compliant with semantic web standards and accessible via an API (produced by and hosted on the GFBio Terminology Server: https://terminologies.gfbio.org/terminology/?ontology=ETS).
The rationale for developing the Ecological Trait-data Standard is laid out in a paper that is available as a preprint:
> Schneider, F.D., Jochum, M., Le Provost, G., Ostrowski, A., Penone, C., Fichtmüller, D., Gossner, M.M., Güntsch, A., König-Ries, B., Manning, P. and Simons, N.K. (2018) Towards an Ecological Trait-data Standard, biorxiv.org DOI: [10.1101/328302](https://doi.org/10.1101/328302)
The vocabulary builds on terms of the Darwin Core Standard and its extensions (DWC; these are referenced in the field 'Refines'; the full Darwin Core Standard can be found here: http://rs.tdwg.org/dwc/terms/index.htm) as well as on terms of the Dublin Core Metadata Initiative (http://www.dublincore.org/specifications/dublin-core/dcmi-terms/). Note that this documentation page resolves each deprecated term to the term that replaces it, unless the two differ in definition. The definition of every deprecated term remains accessible via the API and in the OWL and CSV source files.
## Table of contents
```{r, results = 'asis', echo = FALSE, warning = FALSE}
library(data.table)
library(knitr)

# stringsAsFactors = TRUE keeps the factor columns that levels() and
# as.integer() rely on below (no longer the read.csv default since R 4.0)
glossary <- read.csv("ETS.csv", fileEncoding = "UTF-8", stringsAsFactors = TRUE)
urlroot <- ""

# eliminate all deprecated terms for creation of documentation
# WARNING: if deprecated terms are kept with definition different from replacing terms, they should be listed! requires refinement of this part
# logical indexing instead of -which(): the latter drops every row if no term is deprecated
glossary <- glossary[!(glossary$Deprecated %in% TRUE), ]

columnNames <- glossary$columnName
sections <- glossary$Namespace
sectionNames <- levels(glossary$Namespace)
glossary <- as.data.table(glossary)

# split list of terms into list of tables per namespace
glossary <- setNames(split(glossary[, -c(1), with = FALSE], f = as.integer(glossary$Namespace)), sectionNames)
glossary <- lapply(glossary, function(x) setNames(split(x, f = seq_along(x$columnName)), x$columnName))

namespace <- c("Traitdata", "Metadata", "Traitlist", "Taxon", "MeasurementOrFact", "Occurrence", "BiodiversityExploratories")
j_verbose <- c("core-traitdata-terms", "metadata-vocabulary", "terms-for-trait-definitions", "extension-taxon", "extension-measurement-or-fact", "extension-occurrence", "extension-biodiversity-exploratories")

# print one linked heading per namespace, followed by links to its terms
for (j in namespace) {
  if (j %in% c("Taxon", "MeasurementOrFact", "Occurrence", "BiodiversityExploratories")) ext <- "Extension: " else ext <- ""
  cat("[**", ext, j, "**](", urlroot, "#", j_verbose[match(j, namespace)], " ) ", sep = "")
  cat("\n")
  for (i in columnNames[sections == j]) {
    dd <- as.data.frame(t(glossary[[j]][[as.character(i)]][, -1, with = FALSE]))
    colnames(dd) <- i
    cat("[", "`", i, "`", "](", urlroot, "#", tolower(i), ") | ", sep = "")
  }
  cat("\n")
}
```
---
# Core traitdata terms
For the essential primary data (1. trait value, 2. taxon assignment, 3. trait name), it is recommended to report standardised names and value units following the community consensus in the field.
By linking to (public) ontologies via the field `taxonID`, further taxonomic information can be extracted for analysis. Alternatively, `taxonID` may also link to an accompanying datasheet that contains information on the taxonomic resolution or specification of the observation.
Similarly, linking to published trait definitions in public thesauri or ontologies via the field `traitID` allows an unambiguous interpretation of the trait measurement. If no online ontology is available, an accompanying data table should specify the trait definitions by making use of terms provided in the section 'Traitlist' below.
Additionally, 'verbatim' duplicates of the core terms are provided to capture the original data provider's values and labels.
This ensures compatibility on the provider's side and transparency for data users about the reported measurements and facts, and enables checking for inconsistencies and misspellings in the complete dataset provided by the author. This practice of recording both author-side and standardised data has been successfully established for the TRY database on plant traits, for instance (Kattge et al. 2011. TRY – a global database of plant traits. Global Change Biology, 17, 2905–2935).
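As a minimal sketch of this double record, the following R snippet builds a two-row core table in which verbatim columns keep the provider's original labels next to the standardised ones. All values and URIs are invented for illustration. `taxonID` and `traitID` are the fields described above; the remaining column names (`scientificName`, `traitName`, `traitValue`, `traitUnit` and their `verbatim` counterparts) are assumptions that should be checked against the term list below.

```r
# Illustrative only: values and URIs are invented, most column names are assumptions.
traitdata <- data.frame(
  # original labels and values exactly as delivered by the data provider
  verbatimScientificName = c("A. exclamationis", "agrotis exclamationis"),
  verbatimTraitName      = c("body length", "Body_Length"),
  verbatimTraitValue     = c("1.8", "17"),
  verbatimTraitUnit      = c("cm", "mm"),
  # standardised counterparts used for analysis
  scientificName         = "Agrotis exclamationis",
  traitName              = "body_length",
  traitValue             = c(18, 17),
  traitUnit              = "mm",
  # links to external references (placeholder URIs)
  taxonID                = "https://example.org/taxon/123",
  traitID                = "https://example.org/trait/body_length",
  stringsAsFactors       = FALSE
)

# After standardisation, differently spelled provider entries collapse
# into a single combination of taxon, trait and unit.
standardised <- unique(traitdata[, c("scientificName", "traitName", "traitUnit")])
```

Keeping both sets of columns means a misspelling or a unit conversion can always be traced back to what the provider originally reported.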
```{r, results = 'asis', echo = FALSE}
# print one markdown section (heading plus attribute table) per term of a namespace
parseterms <- function(namespace) {
  # order in which the attributes of each term are listed
  sorting <- c("Definition", "Comment", "valueType", "Identifier", "DateIssued", "FirstIssuedIn", "DateModified", "Refines", "Replaces", "Deprecated", "ReplacedBy")
  for (i in columnNames[sections == namespace]) {
    dd <- t(glossary[[namespace]][[as.character(i)]][, -1, with = FALSE])
    dd <- dd[match(sorting, rownames(dd)), ]
    dd <- as.data.frame(dd)
    colnames(dd) <- i
    cat("## `", i, "` ", sep = "")
    cat("\n")
    cat("[go to top](", urlroot, ")", sep = "")
    #cat(" | [direct link](", urlroot, "#", tolower(i), ")", sep = "")
    print(kable(dd, format = "markdown"))
  }
}
parseterms("Traitdata")
```
---
# Metadata vocabulary
To describe trait datasets unambiguously, we suggest using the following terms in the metadata accompanying the publication of trait data.
Each dataset should have a self-assigned title provided in `datasetName` and a description in `datasetDescription`.
To retain the attribution to the original data contributor, use `author`. The data owner field `rightsHolder` states the person or organization that owns or manages the rights to the data; `bibliographicCitation` states a bibliographic reference which should be cited when the data are used; `rights` and `license` specify under which terms and conditions the data can be used, re-used and/or published.
Since trait data are of great use for synthesis studies, information about how the data may be distributed, re-used and attributed is of particular importance for trait datasets. Most researchers encourage re-use of their published datasets while making sure they are sufficiently credited. The use of permissive licenses for trait-data publications, such as Creative Commons Attribution or Creative Commons Zero/Public Domain release, has been established as the gold standard.
When combining datasets from different sources, `datasetID` may keep the reference to an external data sheet specifying the metadata. Alternatively, the terms of this section can be added as columns to the dataset to maintain important information about authorship and terms of use at the measurement level.
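The two options described above can be sketched in R as follows. The metadata sheet and the trait table are invented examples, `traitName` and `traitValue` are assumed names for the core columns, and the citation strings are placeholders rather than real references.

```r
# Illustrative only: invented metadata for two hypothetical source datasets.
metadata <- data.frame(
  datasetID             = c("ds1", "ds2"),
  datasetName           = c("Example dataset A", "Example dataset B"),
  author                = c("A. Author", "B. Author"),
  license               = c("CC BY 4.0", "CC0 1.0"),
  bibliographicCitation = c("citation for dataset A", "citation for dataset B"),
  stringsAsFactors      = FALSE
)

# Combined trait table; each row keeps a reference to its source dataset.
traits <- data.frame(
  datasetID        = c("ds1", "ds1", "ds2"),
  traitName        = c("body_length", "body_mass", "body_length"),
  traitValue       = c(18, 0.12, 17),
  stringsAsFactors = FALSE
)

# Option 1: leave the metadata in the external sheet and look it up via datasetID.
license_of_first_row <- metadata$license[match(traits$datasetID[1], metadata$datasetID)]

# Option 2: add the metadata terms as columns, so authorship and licence
# travel with every single measurement.
traits_with_meta <- merge(traits, metadata, by = "datasetID", all.x = TRUE)
```

The second option repeats the same strings on many rows, but it keeps the terms of use attached when rows from different sources are later filtered or recombined.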
```{r, results = 'asis', echo = FALSE}
parseterms("Metadata")
```
---
# Terms for trait definitions
To cope with the variety of data types, trait-data should make use of globally valid and unambiguous trait definitions from public ontologies or thesauri, by providing their URIs in the field `traitID`.
Using this standardized terminology will facilitate merging trait definitions from multiple sources.
If no published trait definition is available that can be referenced, trait-datasets should be accompanied by a dataset-specific glossary of traits that would provide a definition specific to the study context.
To be unambiguous, this list should define terms based on other well-defined terms from semantic ontologies, e.g. for units or higher hierarchical terms from trait ontologies (using `broaderTerm`).
Also refer to well-defined terms of other ontologies that describe standard units, morphological body parts, protein characteristics (Protein Ontology) or chemical terms (e.g. the ChEBI, http://www.obofoundry.org/ontology/chebi.html).
The following vocabulary helps to provide trait definitions as a separate data table or within the metadata object along with the publication of the trait dataset.
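A simple consistency check for this recommendation might look like the sketch below: every `traitID` used in the data should either be a URI pointing to a published definition or be defined in the accompanying trait list. The identifiers are invented, the `local:` prefix is a hypothetical convention, and `traitDescription` is an assumed column name to be checked against the term list below.

```r
# Illustrative only: a dataset-specific trait list with invented identifiers.
traitlist <- data.frame(
  traitID          = c("local:body_length", "local:body_size"),
  broaderTerm      = c("local:body_size", NA),
  traitDescription = c("length from head to abdomen tip", "any measure of body size"),
  stringsAsFactors = FALSE
)

# traitID values as they occur in a hypothetical trait dataset
used_ids <- c("https://example.org/trait/dispersal",
              "local:body_length",
              "local:wing_span")

# A traitID is acceptable if it is a URI or if the trait list defines it.
is_uri     <- grepl("^https?://", used_ids)
is_defined <- used_ids %in% traitlist$traitID
undefined  <- used_ids[!is_uri & !is_defined]
```

Here `undefined` collects the identifiers that still need either a published reference or an entry in the trait list.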
```{r, results = 'asis', echo = FALSE}
parseterms("Traitlist")
```
---
# Extension: Taxon
This section provides terms to specify taxonomic resolution and hierarchical depth of the observed measurement or fact.
Especially when only a `scientificName` is provided, an entry in `kingdom` is recommended to disambiguate homonyms. Additionally, `phylum`, `class`, `order`, `family` and `genus` can be used to filter the dataset. The entry `taxonRank` should name the hierarchical level to which the observation applies, e.g. if data were measured on a verified instance of the species "Agrotis exclamationis", `taxonRank` would be "species". If the reported fact was generali
If the dataset provides a URI within `taxonID` that maps to a taxonomic online reference or lookup table of an unambiguous and commonly accepted or verified species name, most of this information is redundant. It should be provided nonetheless to enable sorting and analysis by higher hierarchical levels.
```{r, results = 'asis', echo = FALSE}
parseterms("Taxon")
```
---
# Extension: Measurement or Fact
This section provides additional information about a reported measurement or fact and in most cases can easily be included as extra columns to the core dataset.
As a high-level distinction of the source of the measurement or fact, the Darwin Core term `basisOfRecord` records the type of trait data: whether the values were obtained by the authors' own measurement (distinguishing "LivingSpecimen", "PreservedSpecimen" and "FossilSpecimen"), taken from literature ("literatureData") or from an existing trait database ("traitDatabase"), or represent expert knowledge ("expertKnowledge"). It is highly recommended to provide further detail about the source in the column `basisOfRecordDescription`.
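The values named above can be used for a quick validation of the `basisOfRecord` column, sketched here in R. Treating this list as exhaustive is an assumption; the term definition below should be consulted for the authoritative set of values. The example entries are invented.

```r
# Values named in the text above; whether the list is exhaustive is an assumption.
allowed <- c("LivingSpecimen", "PreservedSpecimen", "FossilSpecimen",
             "literatureData", "traitDatabase", "expertKnowledge")

# Invented example column with one misspelt and one missing entry
basisOfRecord <- c("PreservedSpecimen", "literatureData", "literature", NA)

# Flag entries that are present but not part of the allowed set;
# missing values are reported separately.
invalid <- !is.na(basisOfRecord) & !(basisOfRecord %in% allowed)
missing <- is.na(basisOfRecord)

rows_to_fix <- which(invalid)
```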
To keep track of potential sources of noise or bias in measured data, the method of measurement (`measurementMethod`), the person conducting the measurement (`measurementDeterminedBy`), and the date at which the measurement was obtained (`measurementDeterminedDate`) are recorded.
Authors often report aggregate data of repeated or pooled measurements, e.g. by weighing multiple individuals simultaneously and calculating an average. In these cases, it is advised to record the number of individuals (`individualCount`) along with a dispersion measure (e.g. variance or standard deviation, `dispersion`) or the range of observed values (minimum and maximum, in the fields `measurementValueMin` and `measurementValueMax`). The field `statisticalMethod` names the method used for data aggregation (e.g. mean or median) as well as for the variation or range (e.g. variance or standard deviation).
For data not obtained from the authors' own measurements, the field `references` provides a precise reference to the source of data (e.g. a book or existing database) or the authority of expert knowledge.
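As a sketch of how such an aggregate record could be assembled in R, the snippet below summarises a handful of invented raw measurements into one row. `traitValue` is an assumed name for the core value column, and the wording used in `statisticalMethod` is only one possible convention.

```r
# Invented raw measurements (e.g. body mass in mg) of individually weighed specimens
raw <- c(11.2, 12.5, 10.9, 13.1)

# One aggregated record carrying the summary and its supporting information
aggregated <- data.frame(
  traitValue          = mean(raw),    # assumed name of the core value column
  individualCount     = length(raw),  # number of individuals behind the value
  dispersion          = sd(raw),      # here: sample standard deviation
  measurementValueMin = min(raw),
  measurementValueMax = max(raw),
  statisticalMethod   = "mean; dispersion given as standard deviation",
  stringsAsFactors    = FALSE
)
```

Stating in `statisticalMethod` which dispersion measure was used matters, because a variance and a standard deviation are not interchangeable when datasets are later pooled.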
For literature data, the original source might report trait values on the family or genus level, but the dataset author infers and reports the trait data at species level (e.g. if the entire genus reportedly shares the same trait value). To preserve this information, the column `measurementResolution` should report the taxon rank for which the reported value was originally assessed.
```{r, results = 'asis', echo = FALSE}
parseterms("MeasurementOrFact")
```
# Extension: Occurrence
This category of terms contains further information about the individual specimen or occurrence that has been observed and measured. Especially for analyses of intra-specific trait variation, this constitutes valuable data. It also helps to track the methodology and primary source of the data and to keep the reference to the actual specimen (e.g. for museum collections or related data analysis).
For both literature and measured data, trait values may be recorded for different sub-categories of individuals of a taxon to capture polymorphisms, for instance differentiated by sex or life stage. The template provides the fields `sex`, `lifeStage`, `age`, and `morphotype` for this distinction.
Sampling may be further specified using a unique identifier for the sampling event (`eventID`) which references an external dataset. The record of a `samplingProtocol` may capture bias in sampling methods. Further procedures and methods of preservation should be reported in `preparations`.
Seasonal variation of traits may be recorded by assigning a date and time of sampling to the occurrence, using the fields `year`, `month` and `day`, depending on resolution. Further field definitions of the Darwin Core Standard can be applied instead, to refer to a geological stratum, for instance.
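A minimal R sketch of filling the date fields from a sampling date is given below. The dates and identifiers are invented; leaving `day` empty for a record that is only known to the month is one possible way to express a coarser resolution, not a rule stated by the standard.

```r
# Invented sampling dates for two hypothetical occurrences
sampled <- as.Date(c("2015-06-03", "2015-09-21"))

occurrence <- data.frame(
  occurrenceID     = c("occ1", "occ2"),
  year             = as.integer(format(sampled, "%Y")),
  month            = as.integer(format(sampled, "%m")),
  day              = as.integer(format(sampled, "%d")),
  stringsAsFactors = FALSE
)

# If the second record were only known to the month, the day would stay empty.
occurrence$day[2] <- NA
```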
To capture geographic variation of traits, a set of fields for georeferencing can put the observation into spatial and ecological context (`habitat`, `decimalLongitude`, `decimalLatitude`, `elevation`, `geodeticDatum`, `verbatimLocality`, `country`, `countryCode`). The field `locationID` may be used to reference the occurrence to a dataset-specific or global identifier. This allows the trait data to double as observation data, e.g. for upload to the GBIF database.
For most trait data compiled from literature or expert knowledge, the level of information on an 'occurrence' would not apply, since no specific individual has been observed. In this case, the field `occurrenceID` should be left blank in the core data. In cases where different aggregate ranges or averages are reported for male and female individuals, the columns for sex or developmental stage (`sex`, `lifeStage`) may be used without reference to an occurrence ID.
```{r, results = 'asis', echo = FALSE}
parseterms("Occurrence")
```
# Extension: Biodiversity Exploratories
This section records location in the context of the Biodiversity Exploratories project (www.biodiversity-exploratories.de). The field `OriginExploratories` flags trait measurements originating from samples in the project context. `Exploratory` and `ExploType` allow placing the sample within a region or a landscape type (Grassland or Forest). From `ExploratoriesPlotID`, a detailed georeference can be inferred. Additional spatial resolution, e.g. on subplots, may be provided in `locationID` of the Occurrence extension.
Trait data uploaded to the Biodiversity Exploratories Information System (BExIS) should use the vocabulary in a single-file longtable format (no DwC-Archives supported).
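To illustrate what a long-table layout means in practice, the following base R sketch converts a small wide table (one column per trait) into one row per measurement. The species values are invented, and the column names `scientificName`, `traitName`, `traitValue` and `traitUnit` are assumptions to be checked against the core term list; nothing here is specific to BExIS itself.

```r
# Invented wide-format table: one row per species, one column per trait
wide <- data.frame(
  scientificName   = c("Agrotis exclamationis", "Carabus auratus"),
  body_length      = c(18, 24),
  body_mass        = c(0.12, 0.60),
  stringsAsFactors = FALSE
)

# Unit of each trait column (assumed for this example)
units <- c(body_length = "mm", body_mass = "g")

# Stack the trait columns: each measurement becomes its own row
long <- do.call(rbind, lapply(names(units), function(tr) {
  data.frame(
    scientificName   = wide$scientificName,
    traitName        = tr,
    traitValue       = wide[[tr]],
    traitUnit        = units[[tr]],
    stringsAsFactors = FALSE
  )
}))
```

In the long layout, extension columns such as `sex` or `measurementMethod` can be added per measurement without widening the table for every new trait.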
```{r, results = 'asis', echo = FALSE}
parseterms("BiodiversityExploratories")
```