This research is funded under the currently tabled Gene Wiki Project.
Andra Waagmeester (@andrawaag), Andrew I Su (@andrewsu), Carolina Gonzalez-Cavazos (@Carolina1396), Jose Emilio Labra Gayo (@labra), Kat Thornton (@emulatingkat), Lynn Schriml (@lschriml), Michael D Mayers (@mmayers), Sabah Ul-Hasan (@sabahzero), Sai Siddhartha (@saisiddu), Seyed Amir Hosseini Beghaeiraveri (@seyedahbr), Tyler Bettilyon (@tebba-von-mathenstein)
This code is the current approach for accessing and using the Wikidata biomedical subgraph in downstream analyses, such as identifying repurposable drug candidates. The commit history of older versions of this pipeline can be found here as WRP; see the ‘Issues’ section of the repository for potentially relevant task items.
Examples of previous applications:
- Waagmeester et al. 2020 eLife article, and associated GitHub repository WD-rephetio-analysis
- Mayers et al. 2022 Bioinformatics article, and associated GitHub repository MechRepoNet
This subgraph is retrieved from the January 3rd 2022 Wikidata archive using the Wikibase Dump Filter (WDF) JSON dump tool. Parallel efforts that include RDF dump approaches can be found here from Biohackathon 2021 and Biohackathon 2022.
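To illustrate the kind of filtering a JSON dump tool performs, here is a minimal Python sketch. It is not WDF itself and not the pipeline's code; it only assumes the standard Wikidata JSON dump layout (one entity per line inside a JSON array) and keeps entities whose "instance of" (P31) claim matches a wanted type. The function names and the sample entity ids are hypothetical; Q12136 (disease) is used as an example type.

```python
import json


def iter_entities(lines):
    """Yield entity dicts from the lines of a Wikidata-style JSON dump.

    The dump is one large JSON array with one entity per line, so the
    array brackets and trailing commas are stripped before parsing.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def instance_of_ids(entity):
    """Return the set of QIDs this entity is an 'instance of' (P31)."""
    ids = set()
    for claim in entity.get("claims", {}).get("P31", []):
        snak = claim.get("mainsnak", {})
        # 'somevalue' / 'novalue' snaks carry no datavalue, so skip them.
        if snak.get("snaktype") == "value":
            ids.add(snak["datavalue"]["value"]["id"])
    return ids


def filter_by_type(lines, wanted_types):
    """Yield only the entities whose P31 values overlap wanted_types."""
    wanted = set(wanted_types)
    for entity in iter_entities(lines):
        if instance_of_ids(entity) & wanted:
            yield entity


# Tiny in-memory stand-in for a dump file (entity ids are made up).
SAMPLE_DUMP = [
    "[",
    '{"id": "example-1", "claims": {"P31": [{"mainsnak": {"snaktype": "value", '
    '"datavalue": {"value": {"id": "Q12136"}}}}]}},',
    '{"id": "example-2", "claims": {}}',
    "]",
]

if __name__ == "__main__":
    for entity in filter_by_type(SAMPLE_DUMP, {"Q12136"}):
        print(entity["id"])
```

For a real dump, the same functions could be fed the lines of a decompressed file object (for example one opened in text mode with `gzip.open` or `bz2.open`), which keeps memory use flat because entities are processed one line at a time.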
Raw files, from the Jan 3rd 2022 .json dump through to the .csv output, can be found in the avalanche HPC folder: sulhasan/Wikidata_Biomedical-Subgraph. This folder sits alongside code forked from the WD-rephetio-analysis GitHub repository.
There are 18 node types and 41 edge types in this subgraph. Whether these categories will still be relevant the next time the subgraph is used is open for discussion.
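When revisiting the categories, a quick tally of node and edge types from the .csv output can help. The sketch below is only an assumption about how that might look: the column names `label` and `type`, the function names, and the sample rows are hypothetical and would need to be adapted to the actual .csv headers.

```python
import csv
import io
from collections import Counter


def count_types(rows, column):
    """Count how many rows fall under each value of the given column."""
    return Counter(row[column] for row in rows)


def summarize(node_file, edge_file, node_col="label", edge_col="type"):
    """Return (node type counts, edge type counts) for two CSV file objects.

    node_col and edge_col are assumed column names, not the pipeline's.
    """
    node_counts = count_types(csv.DictReader(node_file), node_col)
    edge_counts = count_types(csv.DictReader(edge_file), edge_col)
    return node_counts, edge_counts


if __name__ == "__main__":
    # Made-up rows standing in for the real node and edge files.
    nodes = io.StringIO("id,label\nn1,Disease\nn2,Gene\nn3,Gene\n")
    edges = io.StringIO("start,end,type\nn2,n1,ASSOCIATED_WITH\n")
    node_counts, edge_counts = summarize(nodes, edges)
    print(len(node_counts), "node types;", len(edge_counts), "edge types")
```

Run against the real files, the number of distinct keys in each counter could be compared with the 18 node types and 41 edge types stated above.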
Relevant code is available here. All other code is provided as a point of reference that may be useful downstream or help produce output more efficiently.
License CC0