This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is the source code repository for project FAIR.
pip install nltk numpy scikit-learn scikit-image matplotlib torchtext
# requirements from pytorch-transformers/wiki
pip install transformers pymediawiki
Get the pre-defined Wikipedia categories (we call these the candidate categories, or candidate list). These are the categories we use to summarize or label a given abstract or paper. We also manually reviewed the list and removed categories that are not relevant.
For finding similar and related topics:
- get a ClinicalBERT embedding for each category in the candidate categories
/sources/Obtain_and_save_embeddingspre_for_predefined_categories.ipynb
- given a category, retrieve the most similar categories by calculating the cosine similarity between its embedding and the embedding of every other category
similarity_given_anytopic.ipynb
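The retrieval step above can be sketched as follows. This is a minimal illustration, not the notebook's code: the function names `cosine_similarity` and `most_similar` are hypothetical, and the three-dimensional toy vectors stand in for the saved ClinicalBERT embeddings, which are much larger. It is written in plain Python for readability; with real embeddings, a vectorized NumPy version would be the more practical choice.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors (an assumption)
    return dot / (norm_u * norm_v)

def most_similar(query, embeddings, top_k=3):
    """Rank every other category by cosine similarity to `query`."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for saved ClinicalBERT embeddings (illustrative only).
toy_embeddings = {
    "Cardiology": [0.9, 0.1, 0.0],
    "Heart diseases": [0.8, 0.2, 0.1],
    "Dermatology": [0.0, 0.1, 0.9],
}
ranked = most_similar("Cardiology", toy_embeddings, top_k=2)
print(ranked)
```

Excluding the query category from its own ranking avoids a trivial top hit with similarity 1.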
For labelling a paper:
- get the unigrams, bigrams, and trigrams in the abstract (step 2).
- save the n-grams that also appear in the candidate list (step 2).
- get all nouns in the abstract (step 3).
- retrieve the related categories of each noun, and save the related categories that also appear in the candidate list (step 3).
- combine the lists from steps 2 and 3 (step 4).
Label_arbitrary_paper.ipynb
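As a rough sketch of steps 2 and 4, the n-gram matching and the final merge might look like the code below. Everything here is an assumption for illustration: the helper names (`extract_ngrams`, `match_candidates`, `combine_labels`), the regex tokenizer, and the case-insensitive exact match are not taken from the notebook. Step 3 (noun extraction and related-category lookup) is represented only by a placeholder list.

```python
import re

def extract_ngrams(text, max_n=3):
    """Return the set of lowercase word n-grams for n = 1..max_n."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    ngrams = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ngrams.add(" ".join(tokens[i:i + n]))
    return ngrams

def match_candidates(ngrams, candidates):
    """Keep the candidate categories whose lowercase name is one of the n-grams."""
    return [c for c in candidates if c.lower() in ngrams]

def combine_labels(ngram_labels, noun_labels):
    """Merge two label lists, dropping duplicates and keeping first-seen order."""
    return list(dict.fromkeys(ngram_labels + noun_labels))

# Hypothetical candidate list and abstract, for illustration only.
candidates = ["Heart failure", "Randomized controlled trial", "Cardiology", "Dermatology"]
abstract = "We report a randomized controlled trial in patients with heart failure."

ngram_labels = match_candidates(extract_ngrams(abstract), candidates)  # step 2
noun_labels = ["Cardiology", "Heart failure"]  # placeholder for the step 3 output
labels = combine_labels(ngram_labels, noun_labels)  # step 4
print(labels)
```

Merging with `dict.fromkeys` keeps each category once, so a label found by both the n-gram route and the noun route is not reported twice.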
PPlus_classifier contains two PROGRESS-Plus classifier models.