Add biomedical entity normalization #3180

Closed
wants to merge 30 commits into from

mariosaenger
Collaborator

This PR implements a named entity recognition model focussing on the biomedical domain.

The main contribution is an entity linking model that uses dense (transformer-based) embeddings and, optionally, sparse character-based representations to normalize an entity mention to specific identifiers in a knowledge base / dictionary. To this end, the model embeds the entity mention text and all concept names from the knowledge base and outputs the k best-matching concepts based on embedding similarity.
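
The retrieval step described above can be illustrated with a minimal, standard-library sketch. It is not the code from this PR: it covers only a toy sparse character n-gram representation, leaves out the dense transformer embeddings entirely, and the names `char_ngrams`, `cosine`, `top_k_concepts` and the identifiers in `toy_dictionary` are made up for illustration.

```python
import heapq
import math
from collections import Counter
from typing import List, Tuple


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Sparse bag of character n-grams for a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i : i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_concepts(
    mention: str, dictionary: List[Tuple[str, str]], k: int = 2
) -> List[Tuple[str, str, float]]:
    """Return the k (cui, name, score) entries whose names best match the mention."""
    query = char_ngrams(mention)
    scored = ((cui, name, cosine(query, char_ngrams(name))) for cui, name in dictionary)
    return heapq.nlargest(k, scored, key=lambda item: item[2])


# Hypothetical (cui, name) pairs; these are not real knowledge-base identifiers.
toy_dictionary = [
    ("C001", "breast cancer"),
    ("C002", "lung cancer"),
    ("C003", "diabetes mellitus"),
]

for cui, name, score in top_k_concepts("breast cancers", toy_dictionary, k=2):
    print(cui, name, round(score, 3))
```

In a dense setup the same top-k selection would presumably run over embedding vectors, typically through a vector index rather than a linear scan over every concept name.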

Mario Sänger and others added 11 commits March 14, 2023 11:04
…t basic text and Ab3P pre-processing to the new structure; fix bug in Ab3P abbreviation detection
…m text and (2) entity / concept names from a knowledge base or ontology
- improve name consistency

- make code more pythonic

- dictionaries always do lazy loading

- consistency in dictionary parsing: always yield (cui,name)

- clean up loading w/ CONSTANTS (easily swap models)

- allow access to sparse and dense search
@sg-wbi sg-wbi changed the title Add biomedical entity normalization [WIP]: Add biomedical entity normalization Apr 26, 2023
Samule Garda added 2 commits April 27, 2023 18:28
- yet better naming

- add batched search

- fix dictionary loading
- predict only on mentions of given entity type
@sg-wbi sg-wbi changed the title [WIP]: Add biomedical entity normalization Add biomedical entity normalization May 2, 2023
- fix mypy typing

- fix typos

- update docstrings

- rm faiss from requirements

- better naming

- allow user to specify annotation layer in predict

- allow no mentions
    import faiss
except ImportError as error:
    raise ImportError(
        f"You need to install to run the biomedical entity linking: `pip faiss faiss-cpu=={FAISS_VERSION}`"
Collaborator Author

Typo: Install command should be "pip install faiss-cpu.." (instead of "pip faiss faiss-cpu").

Moreover, I would recommend adjusting the warning to mention the GPU version of faiss too, i.e. add "pip install faiss-gpu..."

Collaborator

I have removed the option to place the index on the GPU.
Large dictionaries require a lot of GPU RAM and unless we offer some compression it does not make too much sense.
We can leave it as a future feature for a separate PR.
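
For reference, a guarded import with the corrected install command could look roughly like the sketch below. It is not the code that was merged: the helper names `import_optional` and `load_faiss` are hypothetical, and the value given to `FAISS_VERSION` is a placeholder, since the actual pin is not shown in this thread. Only the CPU package is mentioned, in line with the reply above.

```python
import importlib
from types import ModuleType

# Placeholder pin for illustration; the value used in the PR is not shown in this thread.
FAISS_VERSION = "1.7.4"


def import_optional(module_name: str, pip_spec: str, purpose: str) -> ModuleType:
    """Import an optional dependency, or raise an ImportError with a valid pip command."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise ImportError(
            f"You need to install `{module_name}` to run {purpose}: "
            f"`pip install {pip_spec}`"
        ) from error


def load_faiss() -> ModuleType:
    """Import faiss lazily so the package stays usable without it installed."""
    return import_optional(
        "faiss", f"faiss-cpu=={FAISS_VERSION}", "the biomedical entity linking"
    )
```

Chaining with `from error` keeps the original traceback, and deferring the import to `load_faiss()` means the dependency is only required when entity linking is actually used.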
