idea: summary chart of source ontologies #419

Open
allaway opened this issue Apr 5, 2024 · 2 comments
Labels: enhancement, priority: low

Comments

@allaway
Contributor

allaway commented Apr 5, 2024

It would be interesting to auto-generate a plot for the README that summarizes the source ontologies we are building on.

@allaway allaway self-assigned this Apr 5, 2024
@allaway
Contributor Author

allaway commented Apr 5, 2024

Here's a pretty rough script to start:

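library(tidyverse)

# Collect the LinkML files under modules/ (relative to the working directory).
# The extension filter assumes the schemas are stored as .yaml or .yml files.
linkmls <- list.files("./modules/", pattern = "\\.ya?ml$", recursive = TRUE, full.names = TRUE)
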
extract_meanings <- function(dat) {
  dat %>%
    map_depth(3, pluck, "meaning", .ragged = TRUE) %>%  # Extract 'meaning' from within each 'permissible_values'
    discard(is.null) %>%
    unlist()                                            # Flatten the result into a vector
}

extract_sources <- function(dat) {
  dat %>%
    map_depth(3, pluck, "source", .ragged = TRUE) %>%   # Extract 'source' from within each 'permissible_values'
    discard(is.null) %>%
    unlist()                                            # Flatten the result into a vector
}

# lapply always returns a list (sapply may simplify to a matrix), so unlist() yields a plain character vector
sources <- lapply(linkmls, function(x) {
  res1 <- yaml::read_yaml(x) %>% flatten()
  c(extract_meanings(res1), extract_sources(res1))
}) %>% unlist(use.names = FALSE)

source_analyzed <- tibble(sources = sources) %>%
  mutate(simple_source = 
           case_when(
             # Grouping academic publications under a single label
             str_detect(sources, "pubmed.ncbi.nlm.nih.gov") ~ "Publication",
             str_detect(sources, "doi.org") ~ "Publication",
             str_detect(sources, "ncbi.nlm.nih.gov/pubmed/") ~ "Publication",
             str_detect(sources, "ncbi.nlm.nih.gov/pmc/articles") ~ "Publication",
             str_detect(sources, "ncbi.nlm.nih.gov/books") ~ "Publication",
             str_detect(sources, "journals.plos.org") ~ "Publication",
             str_detect(sources, "nature.com") ~ "Publication",
             # Ontologies, databases, vendors, and other named sources
             str_detect(sources, "ONCOTREE") ~ "ONCOTREE",
             str_detect(URLdecode(sources), "obolibrary") ~ str_extract(URLdecode(sources), 'obo/[:alpha:]+'),
             str_detect(sources, "www.ncbi.nlm.nih.gov/geo") ~ "GEO",
             str_detect(sources, "github.com/HumanCellAtlas") ~ "Human Cell Atlas",
             str_detect(sources, "Sage Bionetworks") ~ "Sage Bionetworks",
             str_detect(sources, "edamontology.org") ~ "EDAM",
             str_detect(sources, "thermofisher.com") ~ "Thermo Fisher Scientific",
             str_detect(sources, "promega.com") ~ "Promega",
             str_detect(URLdecode(sources), "ebi.ac.uk/efo") ~ "ebi/EFO",
             str_detect(URLdecode(sources), "ebi.ac.uk/cmpo") ~ "ebi/CMPO",
             str_detect(sources, "bioassayontology.org") ~ "BioAssay Ontology",
             str_detect(sources, "cognitiveatlas.org") ~ "Cognitive Atlas",
             str_detect(sources, "bioconductor.org") ~ "Bioconductor",
             str_detect(sources, "gatk.broadinstitute.org") ~ "GATK",
             str_detect(sources, "software.broadinstitute.org/gatk") ~ "GATK",
             str_detect(sources, "en.wikipedia.org") ~ "Wikipedia",
             str_detect(sources, "patents.google.com") ~ "Google Patents",
             str_detect(sources, "docs.gdc.cancer.gov") ~ "GDC",
             str_detect(sources, "creativecommons.org") ~ "Creative Commons",
             str_detect(sources, "cellosaurus") ~ "Cellosaurus",
             str_detect(sources, "mesh") ~ "MeSH",
             str_detect(sources, "www.ncbi.nlm.nih.gov/Taxonomy") ~ "NIH Taxonomy",
             str_detect(sources, "obo\\:NCIT") ~ "obo/NCIT",
             str_detect(sources, "jax.org") ~ "JAX",
             str_detect(sources, "illumina") ~ "Illumina",
             TRUE ~ "Other" 
           ))

ggplot(source_analyzed) +
  geom_bar(aes(x = fct_infreq(simple_source)))  +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 50, hjust = 1)) +
  labs(x = "source", y = "count")

@allaway
Contributor Author

allaway commented Apr 5, 2024

image

@anngvu anngvu added the enhancement New feature or request label Apr 11, 2024