Alignment of new biological taxon standard names (Section 6.1.2) with the biological data standards community #309

Closed
albenson-usgs opened this issue Nov 19, 2020 · 25 comments
Labels: agreement not to change (Issue closed with agreement not to make a change to the conventions); enhancement (Proposals to add new capabilities, improve existing ones in the conventions, improve style or format)

Comments

@albenson-usgs

albenson-usgs commented Nov 19, 2020

Title

Alignment of new biological taxon standard names (Section 6.1.2) with the biological data standards community

Moderator

@davidhassell

Moderator Status Review [last updated: YYYY-MM-DD]

Brief comment on current status, update periodically

Technical Proposal Summary

Opportunity to bring CF community and the Biological Data Standards Community (TDWG, https://www.tdwg.org/) into alignment

Benefits

Clarity for data users and data managers that concepts in the CF community are the same as those in the biodiversity information standards community.

Detailed Proposal

Complete proposal

The newly added CF standard names biological_taxon_name and biological_taxon_identifier represent the same information that is currently identified in the Darwin Core biological data standard scientificName and scientificNameID. Since the concepts are the same and these standard names have been in use in the biological data community since 2012, I would like to see an enhancement that CF adopt these existing standard names (changing to follow CF conventions to be scientific_name and scientific_name_id) and link to the Darwin Core standard to promote interoperability between these two communities. Darwin Core is used by the Ocean Biodiversity Information System and the Global Biodiversity Information Facility, among others, and aligning the current CF standard names to existing biological data standards represents an opportunity for these two communities to work together and reduce redundancy throughout biological data systems. Moreover, implementing this enhancement will improve understanding for downstream users or managers of data to be more certain that these concepts are in fact representing the same information.

@albenson-usgs albenson-usgs added the enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format label Nov 19, 2020
@dblodgett-usgs
Contributor

Hey @albenson-usgs - It's not really clear to me what the specific change to the specification would be. Can you boil your detailed proposal down to a problem / solution?

@albenson-usgs
Author

albenson-usgs commented Nov 19, 2020

Instead of biological_taxon_name -> scientific_name (and link to Darwin Core in the documentation)
Instead of biological_taxon_identifier -> scientific_name_id (and link to Darwin Core in the documentation)

@dblodgett-usgs
Contributor

@albenson-usgs
Author

@davidhassell would you mind being moderator for this?

@davidhassell
Contributor

@albenson-usgs - I'm happy to moderate.

@davidhassell
Contributor

Is it right that this proposal is essentially for changing the standard names biological_taxon_name (or biological_taxon_lsid - see cf-convention/vocabularies#46) and biological_taxon_identifier to scientific_name and scientific_name_id respectively?

As CF is applicable to many areas of geoscience, standard names are more self-explanatory than would suffice for any one area because they answer the question, “What does this mean?”, rather than the question, “What do we call this?”. It seems that the use of "scientific" in the proposed names is not very informative in this context, as it doesn't really tell a third party anything about the data.

The existing names ("biological") seem very understandable from a lay perspective (i.e. mine!), and you say that they are not wrong, so I wonder if this change is required?

Perhaps a connection to Darwin Core could be made in the standard name descriptions (which currently mention WoRMS and ITIS) - would that be appropriate?

It would be very useful to hear from others with expertise in the use of this sort of data.

@roy-lowry

First thing to say is that this ticket is intimately linked to cf-convention/vocabularies#46 which exposed that biological_taxon_identifier was set up in error - it should have been biological_taxon_lsid. The suggested fix is that biological_taxon_identifier be deprecated and aliased to biological_taxon_lsid.

This request resurrects a discussion on Trac that ran for considerable time and takes the position back at the beginning of that discussion. In a nutshell the initial name proposals were criticised as being too parochial for a multidisciplinary standard like CF. Whilst scientificName works in Darwin Core - a biological standard - there's nothing to tell a non-biologist that scientific_name relates to biology.

My suggestion as a compromise would be to include specific references to the Darwin Core labels in the description. For biological_taxon_name I would suggest:

"Biological taxon" is a name or other label identifying an organism or a group of organisms as belonging to a unit of classification in a hierarchical taxonomy. The quantity with standard name biological_taxon_name is the human-readable label for the taxon such as Calanus finmarchicus. The label should be registered in either WoRMS (http://www.marinespecies.org) or ITIS (https://www.itis.gov/) and spelled exactly as registered. See Section 6.1.2 of the CF convention (version 1.8 or later) for information about biological taxon auxiliary coordinate variables. This is equivalent to scientificName in the Darwin Core standard.

I'll do something similar for biological_taxon_lsid when I set that up.

@davidhassell You can compose messages quicker than me but we're saying the same thing!!!

@roy-lowry

@albenson-usgs A question that's more related to other tickets, but I'll ask it here. How would Darwin Core deal with datasets that include a mixture of true taxa (e.g. Calanus finmarchicus) with functional/morphological groups such as 'prokaryotes', 'nanophytoplankton'? This relates to some concerns I'm having with some other Standard Name tickets (86 and 87). CF is heading down a road of using one convention for taxa (a multi-dimensional storage array with taxon identifiers as a co-ordinate under a single Standard Name) and another for functional/morphological groups (a separate array for each group each with its own Standard Name).

Ticket 86 proposes a further tranche of functional group Standard Names but it includes 'dinoflagellates' (a common name for a Class) and Prochlorococcus (a scientific name for a Genus) mixed in with things like 'cryptophytes' and 'haptophytes'. Would Darwin Core allow entries like 'cryptophytes' under scientificName? If not, how would they be stored under Darwin Core?

@roy-lowry

@davidhassell How do I link this ticket to Standard Names 86 and 87?

@davidhassell
Contributor

Hi Roy - there may be other ways, but it does work if you just put the full URL of the issues in the other repo: cf-convention/vocabularies#29 and cf-convention/vocabularies#105 You can even leave out the "https://github.com/" bit - it will render and link the same.

@roy-lowry

@davidhassell Many thanks.

@albenson-usgs
Author

albenson-usgs commented Nov 20, 2020

@roy-lowry Yes you can store any scientific name within the taxonomic hierarchy in Darwin Core scientificName and then there are more terms to specify the taxonomy as well as the rank. So for instance you could have:
scientificName = "Calanus finmarchicus"
and then have:
scientificNameID = "urn:lsid:marinespecies.org:taxname:104464"
taxonRank = "species"
kingdom = "Animalia"
phylum = "Arthropoda"
class = "Crustacea"
(and on down to...)
specificEpithet = "finmarchicus"

For something like dinoflagellates it would be:
scientificName = "Dinoflagellata"
[vernacularName](https://dwc.tdwg.org/terms/#dwc:vernacularName) = "dinoflagellates"
scientificNameID = "urn:lsid:marinespecies.org:taxname:146203"
taxonRank = "Infraphylum"
kingdom = "Chromista" etc

But really you have all you need with just scientificName and scientificNameID since all the rest can be extracted from WoRMS using the LSID. I'm just showing that you can include a scientific name at any level of the taxonomic hierarchy in Darwin Core along with the associated terms. Also GBIF is accepting BOLD and UNITE stable Operational Taxonomic Units (OTUs, eDNA) in scientificName as well (more info on that here). I realize that's more than what you're asking about but just in case it's relevant.
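
As a rough illustration of the point that the LSID carries enough to look up everything else, here is a minimal Python sketch that splits an LSID into its parts. The `Lsid` type and `parse_lsid` function are hypothetical names introduced for this example and are not part of Darwin Core, CF, or any WoRMS client. The sketch assumes the usual `urn:lsid:<authority>:<namespace>:<object>` layout and does not contact WoRMS.

```python
from typing import NamedTuple


class Lsid(NamedTuple):
    """Parts of a Life Science Identifier (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str


def parse_lsid(lsid: str) -> Lsid:
    """Split 'urn:lsid:<authority>:<namespace>:<object>' into its parts."""
    parts = lsid.strip().split(":")
    # An optional sixth part (a revision) is tolerated and ignored here.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    return Lsid(authority=parts[2], namespace=parts[3], object_id=parts[4])


calanus = parse_lsid("urn:lsid:marinespecies.org:taxname:104464")
print(calanus.authority, calanus.namespace, calanus.object_id)
```

The object part is what a data manager would pass to the registry named by the authority in order to retrieve rank, kingdom, and the other terms listed above.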

@MathewBiddle

MathewBiddle commented Nov 20, 2020

I think this example DarwinCore Occurrence data file might help the discussion.

@albenson-usgs
Author

I totally understand what @roy-lowry and @davidhassell are suggesting about standard names being self-explanatory. What I'm ultimately trying to avoid is that people have to put the same information twice to make it clear that they are following both standards. Let's say we have a data manager that has plankton data and they want to make sure their data are CF compliant but that they can also be ingested by global biological data aggregators. They might feel they would need to implement the CF standard name biological_taxon_name = "Calanus finmarchicus" and then also include the Darwin Core scientificName = "Calanus finmarchicus". Also by adopting the Darwin Core term I'm hoping it would promote synergy and collaboration between the two communities.
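
One way to avoid entering the same information twice would be to derive the Darwin Core terms from the CF taxon coordinates. The following is only a sketch under stated assumptions: the `CF_TO_DWC` mapping and the `to_darwin_core` function are hypothetical, the input is a plain dictionary keyed by CF standard name rather than a real netCDF file, and it assumes the values can be copied across unchanged.

```python
# Hypothetical crosswalk from CF standard names to Darwin Core terms.
# Assumption: the values can be copied across unchanged.
CF_TO_DWC = {
    "biological_taxon_name": "scientificName",
    "biological_taxon_lsid": "scientificNameID",
}


def to_darwin_core(cf_coords):
    """Build one Darwin Core style record per taxon.

    cf_coords maps a CF standard name to the list of values held by the
    matching taxon auxiliary coordinate variable.
    """
    columns = {
        CF_TO_DWC[name]: list(values)
        for name, values in cf_coords.items()
        if name in CF_TO_DWC  # standard names outside the crosswalk are skipped
    }
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("taxon coordinate variables differ in length")
    count = lengths.pop() if lengths else 0
    return [
        {term: values[i] for term, values in columns.items()}
        for i in range(count)
    ]


records = to_darwin_core({
    "biological_taxon_name": ["Calanus finmarchicus"],
    "biological_taxon_lsid": ["urn:lsid:marinespecies.org:taxname:104464"],
})
print(records)
```

With something like this, the data manager maintains the CF file only and generates the Darwin Core columns when exporting to an aggregator.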

@roy-lowry

@albenson-usgs I TOTALLY understand that a scientificName can be of any taxonomic rank. My question that hasn't been answered is what if the dataset includes things that are morphological groups like 'microphytoplankton'. In my experience it is quite common to have these things mixed together with taxa in observational datasets.

@albenson-usgs
Author

Fair enough. Apologies I misunderstood your question. If there is no logical taxonomic classification for the grouping then that's definitely harder. There was recently discussion about morphospecies in the TDWG Github. I'm not sure that it's exactly the same but maybe it's analogous enough that it begins to address this? tdwg/dwc-qa#162

@roy-lowry

@albenson-usgs Thanks, that thread reinforces my perception that 'taxon' comes with a purity and that mixing morphological terms into 'biological_taxon_name' will cause a semantic divergence between CF and Darwin Core that is far more significant than a label like a Standard Name. Conversely, the approach I've taken with other communities like SeaDataNet is to match the standard to the data, allowing mixtures of taxa and groups under the umbrella concept of 'biological entity'. See http://vocab.nerc.ac.uk/collection/S25/current/accepted/ for a listing of what I mean. This suited the requirements of the community we were serving, which was to provide a semantic framework that would cope with any biological or biogeochemical dataset that they threw at it. Darwin Core interoperability wasn't top of the agenda and semantic crosswalks between what we've put together for SeaDataNet and Darwin Core would require work (e.g. building mappings).

CF is at the stage where I think that interoperability with both SeaDataNet and Darwin Core could be made much easier by making the correct decisions at this stage. I'll think on this further and maybe have some off-line discussions before responding again to cf-convention/vocabularies#29 next week.

@roy-lowry

@albenson-usgs This is a separate answer to your response to @davidhassell and myself. Vocabulary search engines in the past decade have been designed to search by default both labels and descriptions. Therefore our suggestion to extend the Standard Name descriptions should draw the attention of your hypothetical data manager to the fact that biological_taxon_name and scientificName are the same thing.

I'm not sure who is watching this thread, but having been through the Trac 99 debate I know there are people who would voice strong opinions against your suggested Standard Name label change. CF is a standard based on physics that requires other domains to explain themselves with great clarity. As a specialist in biogeochemical semantics this is something I've learned over the years!

@timvdstap

Interesting discussion. It would of course be great if CF and DwC can synergize where possible, allowing a data provider to be both compliant with CF and DwC when providing a single term/column (instead of having to duplicate information). Following this thread to see what comes out of it! 👍

@JonathanGregory
Contributor

Dear all
I appreciate this thoughtful discussion and I agree with @roy-lowry that it's important to make the right choices carefully. I am one of those who would object to this proposed standard name change on the grounds already stated! While I'm generally against redundancy, I think the bad sort of redundancy occurs when things are said in two rather different ways that can be inconsistent without its being obvious. It is a less dangerous sort of redundancy when a given piece of information is repeated exactly the same. Therefore I think it wouldn't be bad if the CF biological_taxon_name and Darwin Core scientificName were both attached to a data variable with exactly the same value. This is easy to check automatically for consistency, by eye or by machine.
Jonathan
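
The machine check described above could be sketched as follows. This is a minimal illustration, not an existing CF or Darwin Core tool: the `inconsistent_taxa` function is a hypothetical name, and it assumes the two sets of labels have already been read into plain Python lists in the same order.

```python
def inconsistent_taxa(cf_names, dwc_names):
    """Return (index, cf_value, dwc_value) wherever the two labels differ.

    The comparison is exact, with no case folding or whitespace trimming,
    because the labels are meant to be repeated exactly the same.
    """
    if len(cf_names) != len(dwc_names):
        raise ValueError("the two lists must describe the same taxa")
    return [
        (i, cf, dwc)
        for i, (cf, dwc) in enumerate(zip(cf_names, dwc_names))
        if cf != dwc
    ]


cf_values = ["Calanus finmarchicus", "Dinoflagellata"]
dwc_values = ["Calanus finmarchicus", "dinoflagellates"]
print(inconsistent_taxa(cf_values, dwc_values))
```

An empty result means the redundancy is of the harmless kind: the same value stated twice.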

@roy-lowry

From discussions at 2021 CF and a subsequent Zoom meeting it emerged that biological_taxon_name and Darwin Core scientificName are not exact synonyms (the latter is broader because it doesn't require association with an identifier). Likewise, Darwin Core scientificNameID is broader than biological_taxon_lsid because there are identification schemes other than LSIDs. This relationship has been documented in the Standard Name description.

It also became clear that CF - a standard developed for global climate model data and based on text-unfriendly NetCDF - might not be the most appropriate standard for low-volume biological datasets that are usually handled in spreadsheets. Should the use case for going into CF be strong enough then they can be accommodated, but not as easily as encoding into Darwin Core. The sorts of biological dataset well-matched to CF are high volume data like model output, satellite images and data syntheses.

Consequently, it is proposed that no further action be taken on this ticket.

@JonathanGregory JonathanGregory added the new contributor This issue was worked on by new contributors to the CF conventions label Jan 1, 2024
@JonathanGregory
Contributor

Thank you, @roy-lowry for this useful summary. If no-one disagrees within the next three weeks (before Tuesday 23rd January) this issue will be closed, and labelled agreement not to change.

Thanks to all who contributed to the discussion. In particular, thanks to Abby Benson @albenson-usgs for raising it. Abby will be added to the list of contributors to the conventions.

Happy New Year

@albenson-usgs
Author

From my perspective this ticket can be closed.

@JonathanGregory
Contributor

JonathanGregory commented Jan 3, 2024

> From my perspective this ticket can be closed.

Thanks, Abby @albenson-usgs

@davidhassell
Contributor

Thanks for resurrecting this issue - I also agree with the resolution.

@JonathanGregory JonathanGregory removed the new contributor This issue was worked on by new contributors to the CF conventions label Jan 8, 2024
@JonathanGregory JonathanGregory added the agreement not to change Issue closed with agreement not to make a change to the conventions label Jan 23, 2024
8 participants