casszhao/FAIR

Shield: CC BY-NC-SA 4.0

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

FAIR: Finding Accessible Inequalities Research in Public Health (the FAIR Database)

This is the source code repository for the FAIR project.

System Overview

[Figure: System overview]

Requirements

pip install nltk numpy scikit-learn scikit-image matplotlib torchtext
# requirements from pytorch-transformers/wiki
pip install transformers pymediawiki

Workflow

  1. Get pre-defined Wikipedia categories (we call these the candidate categories, or candidate list). These are the categories used to summarize/label a given abstract or paper. (We also manually reviewed the list and removed categories that are not relevant.)

  2. For finding similar and related topics:

    • get a ClinicalBERT embedding for each category in the candidate list
      /sources/Obtain_and_save_embeddingspre_for_predefined_categories.ipynb
      
    • given a category, retrieve the most similar categories by computing the cosine similarity between it and each of the other categories
      similarity_given_anytopic.ipynb
      
  3. For labelling a paper:

      1. get the unigrams, bigrams and trigrams in the abstract (step 2).
      2. save the n-grams that also appear in the candidate list (step 2).
      3. get all nouns in the abstract (step 3).
      4. retrieve the categories related to each noun, and save those that also appear in the candidate list (step 3).
      5. combine the two saved lists, i.e. the matched n-grams and the matched related categories (step 4).
      Label_arbitrary_paper.ipynb
      
      [Figure: Workflow of labelling a given abstract]
  4. PPlus_classifier contains two PROGRESS-Plus classifier models.
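
As a rough illustration of the similarity lookup in workflow step 2, the sketch below ranks categories by cosine similarity to a query category. It is not taken from the notebooks: the helper name `most_similar`, the category names and the 3-dimensional toy vectors are all assumptions standing in for real ClinicalBERT embeddings, which are assumed to be already computed and held in a dict.

```python
import numpy as np


def most_similar(query, embeddings, top_k=3):
    """Rank categories by cosine similarity to `query` (hypothetical helper)."""
    names = [name for name in embeddings if name != query]
    matrix = np.stack([embeddings[name] for name in names]).astype(float)
    q = np.asarray(embeddings[query], dtype=float)
    # Cosine similarity is the dot product of L2-normalised vectors.
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    q = q / np.linalg.norm(q)
    scores = matrix @ q
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return [(names[i], float(scores[i])) for i in order]


# Toy vectors (assumption): real embeddings would come from ClinicalBERT.
toy_embeddings = {
    "Health equity": np.array([1.0, 0.0, 0.0]),
    "Health disparities": np.array([0.9, 0.1, 0.0]),
    "Cardiology": np.array([0.0, 1.0, 0.0]),
}
ranked = most_similar("Health equity", toy_embeddings, top_k=2)
print(ranked)
```

Normalising once and using a single matrix product keeps the lookup cheap even when the candidate list is large.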
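
The n-gram matching and final merge in workflow step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in Label_arbitrary_paper.ipynb: tokenisation uses a plain regular expression rather than NLTK, the helpers `extract_ngrams`, `match_ngrams` and `combine` are hypothetical, and the related categories derived from nouns are passed in as a precomputed list instead of being looked up.

```python
import re


def extract_ngrams(tokens, n):
    """Return all contiguous n-token phrases as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def match_ngrams(abstract, candidates):
    """Keep unigrams, bigrams and trigrams that appear in the candidate list."""
    tokens = re.findall(r"[a-z0-9]+", abstract.lower())
    lookup = {c.lower(): c for c in candidates}  # case-insensitive matching
    found = []
    for n in (1, 2, 3):
        for gram in extract_ngrams(tokens, n):
            if gram in lookup and lookup[gram] not in found:
                found.append(lookup[gram])
    return found


def combine(ngram_labels, related_labels):
    """Order-preserving union of the two label lists."""
    return list(dict.fromkeys(ngram_labels + related_labels))


# Toy inputs (assumptions), not drawn from the FAIR candidate list.
candidates = ["Obesity", "Health equity", "Public health"]
abstract = "We study obesity and health equity in urban areas."
ngram_labels = match_ngrams(abstract, candidates)
related_labels = ["Public health", "Obesity"]  # stand-in for the noun-based lookup
labels = combine(ngram_labels, related_labels)
print(labels)
```

Candidate names longer than three words are never matched by this sketch, since only n-grams up to trigrams are generated.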
