Add biomedical entity normalization #3180

Closed

mariosaenger wants to merge 30 commits into master from bio-entity-normalization

Collaborator

mariosaenger commented Apr 3, 2023

This PR implements a named entity recognition model focussing on the biomedical domain.

The main contribution is an entity linking model which uses dense (transformer-based) embeddings and (optionally) sparse character-based representations to normalize an entity mention to specific identifiers in a knowledge base / dictionary. To this end, the model embeds the entity mention text and all concept names from the knowledge base and outputs the k best-matching concepts based on embedding similarity.
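The retrieval step described above can be illustrated with a minimal sketch. This is not the PR's implementation: to stay runnable without downloading a model, it replaces the transformer embeddings with toy character-bigram count vectors and ranks by cosine similarity. The function names (`char_ngrams`, `cosine`, `top_k_concepts`) and the identifiers in the example dictionary are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Toy 'embedding': counts of character n-grams (stand-in for a real encoder)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_concepts(
    mention: str, dictionary: List[Tuple[str, str]], k: int = 2
) -> List[Tuple[str, str, float]]:
    """Embed the mention and every concept name, return the k most similar concepts."""
    query = char_ngrams(mention)
    scored = [(cui, name, cosine(query, char_ngrams(name))) for cui, name in dictionary]
    return sorted(scored, key=lambda item: item[2], reverse=True)[:k]


# Made-up identifiers, for illustration only.
toy_dictionary = [
    ("C1", "breast cancer"),
    ("C2", "lung cancer"),
    ("C3", "diabetes mellitus"),
]
candidates = top_k_concepts("breast carcinoma", toy_dictionary, k=2)
```

In a real setup the count vectors would be replaced by dense embeddings, and a hybrid variant would combine the dense and sparse similarity scores before ranking; how exactly the PR weights the two is not shown here.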

Mario Sänger and others added 11 commits

March 14, 2023 11:04


          Initial version (already adapted to recent Flair API changes)

641a3c0


          Revise mention text pre-processing: define general interface and adapt basic text and Ab3P pre-processing to the new structure; fix bug in Ab3P abbreviation detection

9779abf


          Refactor entity linking model structure

8da7d75


          Update documentation

e34c831


          Introduce separate methods for pre-processing (1) entity mentions from text and (2) entity / concept names from a knowledge base or ontology

f54925c


          Merge branch 'master' into bio-entity-normalization

90a0acb


          Fix formatting

f1f51fd


          feat(test): biomedical entity linking

f2f21d3


          fix(requirements): add faiss

82c1b8b


          fix(test): hold on w/ automatic tests for now

2e3cda3


          fix(bionel): start major refactoring

adb231e

- improve name consistency

- make code more pythonic

- dictionaries always do lazy loading

- consistency in dictionary parsing: always yield (cui,name)

- clean up loading w/ CONSTANTS (easily swap models)

- allow access to sparse and dense search
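The "lazy loading" and "always yield (cui,name)" points in the list above could be sketched as a generator. This is a hedged illustration, not the PR's code: the function name `iter_dictionary` is hypothetical, and the `cui||name` line format is an assumption made for the example.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_dictionary(handle: TextIO, sep: str = "||") -> Iterator[Tuple[str, str]]:
    """Lazily parse a dictionary file, yielding one (cui, name) pair per line.

    Assumes each non-empty line looks like `<cui><sep><name>` (an assumption
    for this sketch). Nothing is read until the caller starts iterating.
    """
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cui, _, name = line.partition(sep)
        yield cui, name


# Usage with an in-memory file; the identifiers are made up.
sample = io.StringIO("C1||breast cancer\n\nC2||lung cancer\n")
pairs = list(iter_dictionary(sample))
```

Because every parser yields the same pair shape, downstream code can build an index from any dictionary without caring about its source format.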

sg-wbi changed the title ~~Add biomedical entity normalization~~ [WIP]: Add biomedical entity normalization

Samule Garda added 2 commits

April 27, 2023 18:28


          fix(bionel): major refactor

c80f1be

- yet better naming

- add batched search

- fix dictionary loading


          fix(bionel): assign entity type

d10d297

- predict only on mentions of given entity type

sg-wbi changed the title ~~[WIP]: Add biomedical entity normalization~~ Add biomedical entity normalization


          fix(biencoder): set sparse encoder and weight

25ba2dd

mariosaenger commented

flair/models/biomedical_entity_linking.py

flair/models/biomedical_entity_linking.py Outdated

flair/models/biomedical_entity_linking.py Outdated

requirements.txt Outdated

flair/models/biomedical_entity_linking.py Outdated

flair/models/biomedical_entity_linking.py Outdated

flair/models/biomedical_entity_linking.py Outdated

flair/models/biomedical_entity_linking.py Outdated


          fix(bionel): address comments

4525d3b

- fix mypy typing

- fix typos

- update docstrings

- rm faiss from requirements

- better naming

- allow user to specify annotation layer in predict

- allow no mentions

sg-wbi reviewed

flair/models/biomedical_entity_linking.py Outdated


          fix(candidate_generator): container for search result

3a5913d

mariosaenger commented

flair/data.py Outdated

flair/models/biomedical_entity_linking.py Outdated

+                  import faiss
+              except ImportError as error:
+                  raise ImportError(
+                      f"You need to install to run the biomedical entity linking: `pip faiss faiss-cpu=={FAISS_VERSION}`"

Collaborator Author

mariosaenger May 15, 2023

Typo: Install command should be "pip install faiss-cpu.." (instead of "pip faiss faiss-cpu").

Moreover, I would recommend adjusting the message to mention the GPU version of faiss as well, i.e. add "pip install faiss-gpu..."

Collaborator

sg-wbi May 19, 2023

I have removed the option to place the index on the GPU.
Large dictionaries require a lot of GPU RAM and unless we offer some compression it does not make too much sense.
We can leave it as a future feature for a separate PR.
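The corrected install hint discussed in this thread could be packaged roughly as follows. This is only a sketch: `require_module` is a hypothetical helper, not part of Flair, and the version pin from the original message (`FAISS_VERSION`) is left out because its value is not shown in this thread.

```python
import importlib
from types import ModuleType


def require_module(name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency or fail with an actionable install command."""
    try:
        return importlib.import_module(name)
    except ImportError as error:
        raise ImportError(
            f"You need to install `{name}` to run the biomedical entity linking: "
            f"`{install_hint}`"
        ) from error


# Intended call site (CPU-only, matching the decision to drop the GPU index):
# faiss = require_module("faiss", "pip install faiss-cpu")
```

Keeping the hint as an argument means the message can be updated in one place if a GPU option is added later.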

flair/models/biomedical_entity_linking.py Outdated

flair/models/biomedical_entity_linking.py Outdated

flair/models/biomedical_entity_linking.py Outdated

Samule Garda and others added 9 commits

May 19, 2023 13:20


          fix(predict): default annotation layer iff not provided by user

734d895

- fix typo


          fix(label): scores can be >= or <=

d79f871


          fix(candidate): parametrize database name

118fb95


          feat(candidate_generator): cache sparse encoder

1fcfddf

- better naming

- unique cache name


          fix(candidate_generator): minor improvements

9322c1b

- add option to time search

- change error to warning if pre-trained model is not hybrid

- check if there are mentions to predict


          feat(linking_candidate): pretty print

071f51e


          fix(candidate_generator): check sparse encoder for sparse search

a23f360


          chore: crystal clear dictionary name

ce29290


          feat(candidate_generator): add sparse index

0d65336

Samule Garda and others added 5 commits

June 2, 2023 16:16


          fix(candidate_generator): KISS: sparse search w/ scipy sparse matrices

02812f0


          Minor update to comments and documentation

ca6eee8


          Fix tests and type annotations

6c8f219


          Merge branch 'master' into bio-entity-normalization

2fa43cc


          Merge

d90d92d

helpmefindaname mentioned this pull request

Entity Mention Linker #3388

Merged

mariosaenger closed this
