As a user I need to enhance scientific name data #5

Closed
dimus opened this issue Feb 13, 2024 · 0 comments

dimus commented Feb 13, 2024

The scientificName field might contain either a full name or only a canonical form, with authorship kept in a separate column. It would be useful to have additional fields. Here are some ideas:

scientificNameString - the name with authorship when authorship is available, and without it otherwise. The idea is that this field contains the most 'full' name we can come up with.

canonicalForm - scientific name with no authorship, no ranks, no hybrid signs (unless it is a hybrid formula).

canonicalFormFull - canonical form with ranks and hybrid signs

canonicalFormStem - canonical form stemmed. No ranks, no hybrid signs (unless it is a hybrid formula).

We take a significant subset of rows and check whether scientificName includes authorship. If it does, we treat scientificName as scientificNameString. If not, we generate scientificNameString by combining the scientificName and scientificNameAuthorship fields.
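
A minimal Python sketch of this step, not the project's actual code. The authorship check is a crude stand-in heuristic (a four-digit year, or a capitalized or parenthesized token after the first word), so it will misfire on names with a subgenus in parentheses or on hybrid formulas. The function names, `sample_size`, `threshold`, and `seed` are all hypothetical.

```python
import random
import re

# Assumption: a 4-digit year between 1500 and 2099 signals authorship.
_YEAR = re.compile(r"\b1[5-9]\d\d\b|\b20\d\d\b")


def looks_like_has_authorship(name: str) -> bool:
    """Crude heuristic, not a real parser: a year, or any token after the
    first word that starts with an uppercase letter or '('."""
    tokens = name.split()
    if len(tokens) < 2:
        return False
    if _YEAR.search(name):
        return True
    return any(t[0].isupper() or t[0] == "(" for t in tokens[1:])


def name_column_has_authorship(rows, sample_size=1000, threshold=0.5, seed=0):
    """Sample rows and decide whether scientificName generally carries authorship."""
    names = [r["scientificName"] for r in rows if r.get("scientificName")]
    if not names:
        return False
    rng = random.Random(seed)
    sample = names if len(names) <= sample_size else rng.sample(names, sample_size)
    hits = sum(looks_like_has_authorship(n) for n in sample)
    return hits / len(sample) >= threshold


def add_scientific_name_string(rows, **kwargs):
    """Return copies of rows with a scientificNameString field added."""
    has_auth = name_column_has_authorship(rows, **kwargs)
    out = []
    for r in rows:
        name = (r.get("scientificName") or "").strip()
        auth = (r.get("scientificNameAuthorship") or "").strip()
        # Use the name as-is if the column already has authorship,
        # or if there is no authorship to append.
        full = name if (has_auth or not auth) else f"{name} {auth}"
        out.append({**r, "scientificNameString": full})
    return out
```

The decision is made once for the whole column rather than per row, which matches the "significant subset" wording above. A per-row check would be an alternative for files that mix both styles.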

We generate canonical forms by parsing scientificName or scientificNameString.
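
To illustrate the difference between canonicalFormFull and canonicalForm, here is a toy Python sketch. It is an assumption-laden stand-in for a dedicated scientific-name parser: it keeps the first word, lowercase epithets, and a small hypothetical list of rank markers, and drops everything else. Stemming (canonicalFormStem) and hybrid signs are deliberately not handled.

```python
import re

# Hypothetical, incomplete list of infraspecific rank markers.
RANKS = {"subsp.", "ssp.", "var.", "subvar.", "f.", "subf."}
# Lowercase words that commonly appear inside authorship, not as epithets.
AUTHOR_WORDS = {"ex", "et", "in", "de", "da", "del", "van", "von", "der", "den", "du", "di"}
_EPITHET = re.compile(r"^[a-z][a-z-]+$")


def _is_epithet(token: str) -> bool:
    return bool(_EPITHET.match(token)) and token not in AUTHOR_WORDS


def canonical_forms(name: str) -> dict:
    """Toy derivation of canonicalFormFull and canonicalForm from a name string."""
    tokens = name.split()
    if not tokens:
        return {"canonicalFormFull": "", "canonicalForm": ""}
    full = [tokens[0]]  # assume the first word is the genus or uninomial
    for i, tok in enumerate(tokens[1:], start=1):
        if tok in RANKS:
            # Keep a rank marker only when an epithet follows it, so that
            # "f." used for "filius" in authorship is dropped.
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            if _is_epithet(nxt):
                full.append(tok)
        elif _is_epithet(tok):
            full.append(tok)
    simple = [t for t in full if t not in RANKS]
    return {
        "canonicalFormFull": " ".join(full),
        "canonicalForm": " ".join(simple),
    }
```

For a placeholder name such as "Aus bus var. cus L.", this yields "Aus bus var. cus" as canonicalFormFull and "Aus bus cus" as canonicalForm. Real names have far more edge cases than this sketch covers.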

dimus added a commit that referenced this issue Feb 14, 2024
dimus closed this as completed in 9cf3a06 Feb 26, 2024