Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schematic RDB: adding new columns to DB tables (example NF GFF) #757

Closed
4 tasks
milen-sage opened this issue May 25, 2022 · 58 comments
Closed
4 tasks

schematic RDB: adding new columns to DB tables (example NF GFF) #757

milen-sage opened this issue May 25, 2022 · 58 comments
Assignees
Labels
major a medium priority item short-term Timeline: 2-4 weeks

Comments

@milen-sage
Copy link
Collaborator

milen-sage commented May 25, 2022

NF GFF DB requires adding a few more fields to their DB results:

  • ResourceId - Resource
  • InvestigatorId - Development
  • Institution - Institution
  • How To Acquire - Resource
    -updated by @mialy-defelice
    This may require update of the data model and regenerating the DB tables; or, just an update of join results (or both).

This should be a good intro issue for working with scheamtic's RDB layer; compare and contrast with the setup of iAtlas.

AC: DB query results and corresponding Synapse tables for the GFF project have been successfully updated with the three fields above.

@allaway
Copy link
Contributor

allaway commented May 26, 2022

Related to nf-osi/nf-research-tools-schema#13

@allaway
Copy link
Contributor

allaway commented May 26, 2022

I've "staged" the data temporarily here: https://www.synapse.org/#!Synapse:syn31002734/tables/ (in the database, howToAcquire will probably be an attribute of the Resource table and not it's own separate table).

We can either ingress the data properly into the database, or if need be and time is running short, can create a MaterializedView that includes a JOIN to this table to make the data available to platform devs.

@allaway
Copy link
Contributor

allaway commented Jun 3, 2022

I'd like to slight modify the AC for this issue @milen-sage / @andrewelamb if that is OK. We have another issue https://sagebionetworks.jira.com/browse/NFINT-664 that we can resolve at the same time. In addition to the tables mentioned in the initial ticket, can we also please add the Funder and Donor tables in this join? These should all be 1-1 relationships so they should not cause any unexpected replication of tools.

Thank you!

@milen-sage
Copy link
Collaborator Author

@andrewelamb what's the state of this issue?

@ychae
Copy link
Contributor

ychae commented Jun 7, 2022

Jay's web dev team needs these features by 6/16

@andrewelamb
Copy link
Contributor

@allaway I'm still getting up to speed on how this whole workflow happens so hopefully I don't butcher this: @mialy-defelice and me looked through this and the tables we pull the data from need to have the howToAcquire column added.

@andrewelamb
Copy link
Contributor

@milen-sage @allaway Do the Funder and Donor tables need to be added by 6/16, or is the original ask good enough for that purpose?

@allaway
Copy link
Contributor

allaway commented Jun 8, 2022

@allaway I'm still getting up to speed on how this whole workflow happens so hopefully I don't butcher this: @mialy-defelice and me looked through this and the tables we pull the data from need to have the howToAcquire column added.

Done for the Cell Line , Animal Model, Antibody, Biobank, and Genetic Reagent "Resource" manifests here: https://drive.google.com/drive/u/0/folders/15LKpNqrWMvTrbJFI-eFNCm4T4U6VJa6_ !

@milen-sage @allaway Do the Funder and Donor tables need to be added by 6/16, or is the original ask good enough for that purpose?

It would be helpful to have these tables added to the join by 6/16.

@andrewelamb
Copy link
Contributor

@allaway could you give me access to that drive link?

@allaway
Copy link
Contributor

allaway commented Jun 8, 2022

Done @andrewelamb !

@andrewelamb
Copy link
Contributor

@allaway I'm getting a 404 error when trying that link, maybe it's not a permissions issue.

@allaway
Copy link
Contributor

allaway commented Jun 8, 2022

For some reason github doesn't include the underscore as part of the URL.

Try this

@andrewelamb
Copy link
Contributor

@allaway The Investigator -> Development -> Resource join is a many to many relationship (with Development making it 2 one to many relationships). I'm doing inner joins on these assuming you don't want Investigators with no Resources, or Resources with no Investigators, is that correct?

@allaway
Copy link
Contributor

allaway commented Jun 9, 2022

@andrewelamb - thanks for checking! I would like left joins (In Resource x Development / Development x Investigator if it is doable. We do want to keep all of the resources in the final table (i.e. we still want Resources with no Investigators). We don't need or want Investigators with no Resources, though.

@allaway allaway closed this as completed Jun 9, 2022
@allaway allaway reopened this Jun 9, 2022
@allaway
Copy link
Contributor

allaway commented Jun 9, 2022

Oops pressed the wrong button.

@andrewelamb
Copy link
Contributor

@allaway Same question for funders :)

@allaway
Copy link
Contributor

allaway commented Jun 9, 2022

Same thing! We don't want to eliminate any Resources based on missing info in other tables. So Resources without Funders should be included, but we don't care about Funders without Resources.

@andrewelamb
Copy link
Contributor

So to get to donor we have to go through animal model and/or cell line. How should we do that? I assume the joins are the same as above in regards to joins?

@milen-sage
Copy link
Collaborator Author

@andrewelamb I believe there are some example queries that do this kind of multi table join in the query config of the jupyter notebook @mialy-defelice had put together? Did you take a look at those?

@andrewelamb
Copy link
Contributor

@allaway Besides the "How To Acquire" column is there anything missing from this result:
Any columns you don't want?

database_result

@allaway
Copy link
Contributor

allaway commented Jun 9, 2022

Hey Andrew,

Sorry for not laying this issue out more clearly. We'd like these tables to be added to our "mega-join" table: https://www.synapse.org/#!Synapse:syn26438037/tables/

It's currently a join across: Resource_CellLine_AnimalModel_GeneticReagent_Antibody and we'd like to add in Investigator and Donor and Funder. I'm guessing @mialy-defelice has the rest of that join defined somewhere in the notebook (I haven't had a chance to look myself).

@andrewelamb
Copy link
Contributor

@allaway Do you want all the columns from investigators, funders and donors?

@allaway
Copy link
Contributor

allaway commented Jun 9, 2022

Yes, thanks!

@allaway
Copy link
Contributor

allaway commented Jun 10, 2022

@ychae , I think there may have been some miscommunication from our end - Jay's team will need more time to complete their work (probably mainly as it comes to https:/sagebionetworks.jira.com/browse/PORTALS-2178/ which requires implementing some new designs).

I am planning on creating a mock data table for them to use (just using joins in R), and then we can replace with the table that @andrewelamb is creating.

Thanks all!

@andrewelamb
Copy link
Contributor

@allaway The join is ready to go, I'm just dealing with permission issues getting the updated tables uploaded to synapse.

@allaway
Copy link
Contributor

allaway commented Jun 10, 2022

Oh, that's great! Thanks @andrewelamb !

I probably just need to add you to the project?

@allaway
Copy link
Contributor

allaway commented Jun 10, 2022

Oh nevermind you should already have admin access, so I'm guessing you are wrestling with a different permissions issue than I thought.

@mialy-defelice
Copy link
Contributor

@milen-sage I think they are two separate issues. I can't find another issue so I can create one.

The linked issue has to do with how the Portal decodes camelcase (for this one I needed to capitalize a letter manually to fix quick for Jay -- might only happen with the word Of).

The issue here has to do with how schematic creates camelcase vs how the RDB creates camelcase. This means we need to make things consistent so the RDB method matches the schematic method.

@allaway
Copy link
Contributor

allaway commented Jun 10, 2022

Thanks all. I am happy to revert the change of to->To - let me know what is preferred! :)

@allaway
Copy link
Contributor

allaway commented Jun 10, 2022

Ah I see this now: So your original version was 'correct'.

I will revert the change. But since both versions will be available thru git history, let me know which one you end up using!

@allaway
Copy link
Contributor

allaway commented Jun 13, 2022

Hi all: will the final table on Synapse have "howtoAcquire" or "howToAcquire" as the column header?

@ychae
Copy link
Contributor

ychae commented Jun 14, 2022

@allaway do you need it to be something specific?

@allaway
Copy link
Contributor

allaway commented Jun 14, 2022

It doesn't really matter, I can just change the column name on Synapse after it's uploaded to "howToAcquire" which is what Jay's team is expecting. :)

@mialy-defelice
Copy link
Contributor

@allaway did you manually convert that column name to camelcase? When making the manifest it actually shows up as the display name. I was under the impression the naming was coming directly from Schematic itself...

@allaway
Copy link
Contributor

allaway commented Jun 14, 2022

Sorry for the confusion. The manifests are the same manifests you generated, I just added this column. That error is on me. It should be changed to How to Acquire in the spreadsheet.

@allaway
Copy link
Contributor

allaway commented Jun 14, 2022

Apologies, I did not understand that the spreadsheets are what you all were talking about, I thought you were talking about the schema.

@mialy-defelice
Copy link
Contributor

It's no trouble, there are many places where camelcase discrepancies come into play...running into another one right now ( :( ). Hopefully we will get this up soon.

@mialy-defelice
Copy link
Contributor

@ychae @allaway @andrewelamb
The table is loaded!!!
https://www.synapse.org/#!Synapse:syn31956063/tables/
Robert let me know if there is something we need to update in the query.

@mialy-defelice
Copy link
Contributor

Also I enabled full text search and a bunch of facets but they may need to be added to or trimmed...

@allaway
Copy link
Contributor

allaway commented Jun 14, 2022

Thank you both! This is great. I am about to step out for a few hours but I will check this evening to let you know.

We may need to transfer the data to syn26438037, since that is the equivalent prod table and is the one flagged OPEN_ACCESS. I'm not sure we can get approval to flag a new entity open access before Friday :)

@mialy-defelice
Copy link
Contributor

Do you know how to do this easily? The ids for all the entries are not the same anymore so a simple update is not possible. The md5_id are generated from all the ID columns so since we have more Ids present the hashes will all be new.

@allaway
Copy link
Contributor

allaway commented Jun 15, 2022

I would do this manually by versioning the table (to save the old version), deleting all of the table rows without deleting the table itself, modify the schema as necessary (for new cols) and upload new rows. :) Does that make sense/am I missing something?

@mialy-defelice
Copy link
Contributor

That makes sense! That was my fear lol. I am always terrified of messing with the actual tables... I'll practice in my test project :) I'll update when the table is up.

@allaway
Copy link
Contributor

allaway commented Jun 15, 2022

Oh thank you! I was actually offering to do it, sorry for not being clearer. But I am happy to let you handle it!

@mialy-defelice
Copy link
Contributor

Okay its up!

@allaway
Copy link
Contributor

allaway commented Jun 15, 2022

Nice!! Thanks!

I think that the join is missing the Antibodies - see here: https://nf.synapse.org/Explore/Tools?QueryWrapper0=%7B%22sql%22%3A%22SELECT%20*%20FROM%20syn26438037%22%2C%22selectedFacets%22%3A%5B%5D%2C%22offset%22%3A0%7D

I'll take a closer look now to see if it is good otherwise!

@mialy-defelice
Copy link
Contributor

I think I found why it didn't load. I will work on updating it...so the table might disappear for a bit. Look for it in the version control.

@allaway
Copy link
Contributor

allaway commented Jun 15, 2022

Will do, thanks!

Other than that, the columns I expect to be there are there. I didn't see any other weird issues (e.g. missing data).

> library(synapser)
> library(tidyverse)
> synLogin()
Welcome, Robert Allaway!NULL
> 
> new <- synTableQuery("select * from syn26438037")$asDataFrame()
> old <- synTableQuery("select * from syn26438037.3")$asDataFrame()
> 
> ##expect
> setdiff(colnames(old), colnames(new))
character(0)
> 
> ##expect donor, funder, investigator columns
> setdiff(colnames(new), colnames(old))
 [1] "donorId"               "funderId"              "investigatorId"        "investigatorSynapseId" "parentDonorId"        
 [6] "howToAcquire"          "investigatorName"      "institution"           "orcid"                 "investigatorWebsite"  
[11] "funderName"            "race"                  "species"               "sex"                   "age"                  
> 
> ##I thought this should be 0 but I guess we added one... :) 
> setdiff(new$resourceName, old$resourceName)
[1] "NCC-MPNST3-X2"
> 
> ##expect 0 
> setdiff(old$resourceName, new$resourceName)
 [1] "Rabbit anti-NF1 Antibody, Affinity Purified"                                                     
 [2] "Anti-NF1 monoclonal antibody, clone TNJ-423"                                                     
 [3] "anti Neurofibromin antibody"                                                                     
 [4] "NF1 (phospho-Ser2741) antibody"                                                                  
 [5] "Anti-NF1 monoclonal antibody, clone OG-16"                                                       
 [6] "Anti-NF1 (C-terminal) polyclonal antibody"                                                       
 [7] "Anti-NF1 monoclonal antibody"                                                                    
 [8] "Anti-NF1 monoclonal antibody, clone MC-072"                                                      
 [9] "Rabbit anti Neurofibromin (Non-Phospho) antibody"                                                
[10] "Anti-NF1 antibody"                                                                               
[11] "Monoclonal Anti-NF1 antibody produced in mouse"                                                  
[12] "Monoclonal Anti-Neurofibromin antibody produced in mouse"                                        
[13] "Neurofibromin antibody"                                                                          
[14] "Rabbit anti Neurofibromin (Phospho-specific) antibody"                                           
[15] "Neurofibromin 1 Antibody"                                                                        
[16] "Anti-NF1 (aa 27-41) polyclonal antibody"                                                         
[17] "Anti-NF1 (aa 2719-2818) polyclonal antibody"                                                     
[18] "Neurofibromin (McNFn27a) antibody"                                                               
[19] "Rabbit Anti-Neurofibromin Polyclonal Antibody, Unconjugated"                                     
[20] "NF1 Polyclonal Antibody"                                                                         
[21] "NF1 Monoclonal Antibody (McNFn27a)"                                                              
[22] "Anti-Human NF1 Monoclonal Antibody, Unconjugated, Clone 3F3"                                     
[23] "Anti-NF1 (internal region) polyclonal antibody"                                                  
[24] "Rabbit Anti-Neurofibromin Antibody, Unconjugated"                                                
[25] "Rabbit Anti-NF1 Polyclonal, Unconjugated antibody"                                               
[26] "Mouse Anti-NF1 Monoclonal Antibody, Unconjugated, Clone McNFn27a"                                
[27] "NF1 Antibody"                                                                                    
[28] "Anti-NF1 antibody produced in rabbit"                                                            
[29] "NF1 Monoclonal Antibody (McNFn27b)"                                                              
[30] "Type 1 Neurofibromatosis Protein (NF1) Mouse anti-Human Monoclonal (aa27-41) (McNFn27) Antibody" 
[31] "Anti-Neurofibromin antibody"                                                                     
[32] "Neurofibromin 1 antibody"                                                                        
[33] "Neurofibromin 1 (D7R7D) Rabbit mAb antibody"                                                     
[34] "Type 1 Neurofibromatosis Protein (NF1) Rabbit Polyclonal (N-Terminus) Antibody"                  
[35] "Anti-NF1 polyclonal antibody"                                                                    
[36] "Neurofibromin (McNFn27b) antibody"                                                               
[37] "Neurofibromin Antibody (E-8)"                                                                    
[38] "Mouse Anti-NF1 Monoclonal Antibody, Unconjugated, Clone McNFn27"                                 
[39] "Type 1 Neurofibromatosis Protein (NF1) Mouse anti-Human Monoclonal (aa27-41) (McNFn27a) Antibody"
[40] "Rabbit anti Neurofibromin antibody"                                                              
[41] "Anti-NF1 monoclonal antibody, clone SOG517"                                                      
[42] "Anti-NF1 monoclonal antibody, clone NdOGo38c"                                                    

@mialy-defelice
Copy link
Contributor

Okay...I think it might finally all be there?

@allaway
Copy link
Contributor

allaway commented Jun 15, 2022

Looks good @mialy-defelice, @andrewelamb !

Thank you for the help with this! I think we are all set. I will connect with Jay and let you know if there are any other issues, but this looks good to me :)

> library(synapser)
> library(tidyverse)
> synLogin()
Welcome, Robert Allaway!NULL
> 
> new <- synTableQuery("select * from syn26438037")$asDataFrame()
Downloading  [####################]100.00%   103.5kB/103.5kB (681.0kB/s) SYNAPSE_TABLE_QUERY_97246230.csv Done...    > old <- synTableQuery("select * from syn26438037.3")$asDataFrame()
> 
> ##expect
> setdiff(colnames(old), colnames(new))
character(0)
> 
> ##expect donor, funder, investigator columns
> setdiff(colnames(new), colnames(old))
 [1] "donorId"               "funderId"              "investigatorId"        "investigatorSynapseId" "parentDonorId"        
 [6] "howToAcquire"          "investigatorName"      "institution"           "orcid"                 "investigatorWebsite"  
[11] "funderName"            "race"                  "species"               "sex"                   "age"                  
> 
> ##I thought this should be 0 but I guess we added one... :) 
> setdiff(new$resourceName, old$resourceName)
[1] "NCC-MPNST3-X2"
> 
> ##expect 0 
> setdiff(old$resourceName, new$resourceName)
character(0)

@allaway allaway closed this as completed Jun 15, 2022
@milen-sage
Copy link
Collaborator Author

In the context of the retrospective, it looks like a detail like this would have been helpful to include in the github issue from the very beginning?

-updated by @mialy-defelice
This may require update of the data model and regenerating the DB tables; or, just an update of join results (or both).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major a medium priority item short-term Timeline: 2-4 weeks
Projects
None yet
Development

No branches or pull requests

5 participants