schematic RDB: adding new columns to DB tables (example NF GFF) #757
Comments
Related to nf-osi/nf-research-tools-schema#13 |
I've "staged" the data temporarily here: https://www.synapse.org/#!Synapse:syn31002734/tables/ (in the database). We can either ingress the data properly into the database or, if need be and time is running short, create a MaterializedView that includes a JOIN to this table to make the data available to platform devs. |
I'd like to slightly modify the AC for this issue @milen-sage / @andrewelamb if that is OK. We have another issue https://sagebionetworks.jira.com/browse/NFINT-664 that we can resolve at the same time. In addition to the tables mentioned in the initial ticket, can we also please add the Funder and Donor tables in this join? These should all be 1-1 relationships so they should not cause any unexpected replication of tools. Thank you! |
@andrewelamb what's the state of this issue? |
Jay's web dev team needs these features by 6/16 |
@allaway I'm still getting up to speed on how this whole workflow happens so hopefully I don't butcher this: @mialy-defelice and I looked through this and the tables we pull the data from need to have the howToAcquire column added. |
@milen-sage @allaway Do the Funder and Donor tables need to be added by 6/16, or is the original ask good enough for that purpose? |
Done for the Cell Line, Animal Model, Antibody, Biobank, and Genetic Reagent "Resource" manifests here: https://drive.google.com/drive/u/0/folders/15LKpNqrWMvTrbJFI-eFNCm4T4U6VJa6_ !
It would be helpful to have these tables added to the join by 6/16. |
@allaway could you give me access to that drive link? |
Done @andrewelamb ! |
@allaway I'm getting a 404 error when trying that link, maybe it's not a permissions issue. |
For some reason github doesn't include the underscore as part of the URL. |
@allaway The Investigator -> Development -> Resource join is a many to many relationship (with Development making it 2 one to many relationships). I'm doing inner joins on these assuming you don't want Investigators with no Resources, or Resources with no Investigators, is that correct? |
@andrewelamb - thanks for checking! I would like left joins (in Resource x Development / Development x Investigator) if it is doable. We do want to keep all of the resources in the final table (i.e. we still want Resources with no Investigators). We don't need or want Investigators with no Resources, though. |
Oops pressed the wrong button. |
@allaway Same question for funders :) |
Same thing! We don't want to eliminate any Resources based on missing info in other tables. So Resources without Funders should be included, but we don't care about Funders without Resources. |
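For reference, the join behaviour requested above (keep every Resource, drop Investigators that have no Resource) can be sketched with an in-memory SQLite database. This is only an illustration: the table names, column names, and rows below are made up and are not the real GFF schema or the actual RDB query.

```python
import sqlite3

# Hypothetical miniature schema: Development links Resources to Investigators.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource (resource_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE investigator (investigator_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE development (resource_id TEXT, investigator_id TEXT);
INSERT INTO resource VALUES ('r1', 'Cell line A'), ('r2', 'Antibody B');
INSERT INTO investigator VALUES ('i1', 'Investigator X'), ('i2', 'Investigator Y');
INSERT INTO development VALUES ('r1', 'i1');
""")

# Starting FROM resource and LEFT JOINing outward keeps every Resource row.
# Investigators that never appear in development are simply never reached.
rows = conn.execute("""
SELECT r.name, i.name
FROM resource AS r
LEFT JOIN development AS d ON d.resource_id = r.resource_id
LEFT JOIN investigator AS i ON i.investigator_id = d.investigator_id
ORDER BY r.resource_id
""").fetchall()

print(rows)  # [('Cell line A', 'Investigator X'), ('Antibody B', None)]
```

'Antibody B' survives with an empty Investigator, while 'Investigator Y' (no Resources) does not appear. The same pattern would apply to Funders, assuming they are linked to Resources in a comparable way.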
So to get to donor we have to go through animal model and/or cell line. How should we do that? I assume the join types are the same as above? |
@andrewelamb I believe there are some example queries that do this kind of multi table join in the query config of the jupyter notebook @mialy-defelice had put together? Did you take a look at those? |
@allaway Besides the "How To Acquire" column is there anything missing from this result: |
Hey Andrew, Sorry for not laying this issue out more clearly. We'd like these tables to be added to our "mega-join" table: https://www.synapse.org/#!Synapse:syn26438037/tables/ It's currently a join across: Resource_CellLine_AnimalModel_GeneticReagent_Antibody and we'd like to add in Investigator and Donor and Funder. I'm guessing @mialy-defelice has the rest of that join defined somewhere in the notebook (I haven't had a chance to look myself). |
@allaway Do you want all the columns from investigators, funders and donors? |
Yes, thanks! |
@ychae, I think there may have been some miscommunication from our end - Jay's team will need more time to complete their work (probably mainly as it comes to https://sagebionetworks.jira.com/browse/PORTALS-2178/ which requires implementing some new designs). I am planning on creating a mock data table for them to use (just using joins in R), and then we can replace with the table that @andrewelamb is creating. Thanks all! |
@allaway The join is ready to go, I'm just dealing with permission issues getting the updated tables uploaded to synapse. |
Oh, that's great! Thanks @andrewelamb ! I probably just need to add you to the project? |
Oh nevermind you should already have admin access, so I'm guessing you are wrestling with a different permissions issue than I thought. |
@milen-sage I think they are two separate issues. I can't find another issue, so I can create one. The linked issue has to do with how the Portal decodes camelcase (for this one I needed to capitalize a letter manually as a quick fix for Jay -- might only happen with the word Of). The issue here has to do with how schematic creates camelcase vs how the RDB creates camelcase. This means we need to make things consistent so the RDB method matches the schematic method. |
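One way to keep the two layers consistent would be for both to call a single shared conversion helper. The function below is a hypothetical sketch, not schematic's or the RDB's actual implementation; `to_camel_case` is an illustrative name, and the rule it applies (capitalize every word after the first, including short words like "to" and "of") is an assumption.

```python
import re

def to_camel_case(display_name: str) -> str:
    """Convert a display name such as 'How to Acquire' to camelCase.

    Hypothetical helper: every word after the first is capitalized,
    including short words such as 'to' and 'of'.
    """
    words = re.findall(r"[A-Za-z0-9]+", display_name)
    if not words:
        return ""
    first, rest = words[0], words[1:]
    head = first[0].lower() + first[1:]
    tail = "".join(word[0].upper() + word[1:] for word in rest)
    return head + tail

print(to_camel_case("How to Acquire"))  # howToAcquire
```

Whatever rule is chosen, the point is that it lives in one place, so a column cannot come out as "howtoAcquire" from one layer and "howToAcquire" from the other.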
Thanks all. I am happy to revert the change of |
Ah I see this now: I will revert the change. But since both versions will be available thru git history, let me know which one you end up using! |
Hi all: will the final table on Synapse have "howtoAcquire" or "howToAcquire" as the column header? |
@allaway do you need it to be something specific? |
It doesn't really matter, I can just change the column name on Synapse after it's uploaded to "howToAcquire" which is what Jay's team is expecting. :) |
@allaway did you manually convert that column name to camelcase? When making the manifest it actually shows up as the display name. I was under the impression the naming was coming directly from Schematic itself... |
Sorry for the confusion. The manifests are the same manifests you generated, I just added this column. That error is on me. It should be changed to How to Acquire in the spreadsheet. |
Apologies, I did not understand that the spreadsheets are what you all were talking about, I thought you were talking about the schema. |
It's no trouble, there are many places where camelcase discrepancies come into play...running into another one right now ( :( ). Hopefully we will get this up soon. |
@ychae @allaway @andrewelamb |
Also I enabled full text search and a bunch of facets but they may need to be added to or trimmed... |
Thank you both! This is great. I am about to step out for a few hours but I will check this evening to let you know. We may need to transfer the data to syn26438037, since that is the equivalent prod table and is the one flagged OPEN_ACCESS. I'm not sure we can get approval to flag a new entity open access before Friday :) |
Do you know how to do this easily? The ids for all the entries are not the same anymore so a simple update is not possible. The md5_id are generated from all the ID columns so since we have more Ids present the hashes will all be new. |
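To illustrate why the existing row ids cannot be reused: if the id is a hash over all of the ID columns, adding an ID column changes the hash for every row. The sketch below assumes a simple "join the values, then MD5" scheme; the real `md5_id` generation may differ, and the column names and values are made up.

```python
import hashlib
from typing import Mapping, Sequence

def md5_id(row: Mapping[str, str], id_columns: Sequence[str]) -> str:
    """Hash the values of the given ID columns into one row id (assumed scheme)."""
    joined = "|".join(str(row.get(col, "")) for col in id_columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Hypothetical row: the same data, hashed before and after a new ID column joins in.
row = {"resourceId": "r1", "cellLineId": "c1", "investigatorId": "i1"}
old_id = md5_id(row, ["resourceId", "cellLineId"])
new_id = md5_id(row, ["resourceId", "cellLineId", "investigatorId"])

print(old_id != new_id)  # True
```

Because no old id matches a new one, an in-place row update keyed on the id has nothing to match against, which is why the table contents need to be replaced wholesale.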
I would do this manually by versioning the table (to save the old version), deleting all of the table rows without deleting the table itself, modifying the schema as necessary (for new cols) and uploading new rows. :) Does that make sense/am I missing something? |
That makes sense! That was my fear lol. I am always terrified of messing with the actual tables... I'll practice in my test project :) I'll update when the table is up. |
Oh thank you! I was actually offering to do it, sorry for not being clearer. But I am happy to let you handle it! |
Okay it's up! |
Nice!! Thanks! I think that the join is missing the Antibodies - see here: https://nf.synapse.org/Explore/Tools?QueryWrapper0=%7B%22sql%22%3A%22SELECT%20*%20FROM%20syn26438037%22%2C%22selectedFacets%22%3A%5B%5D%2C%22offset%22%3A0%7D I'll take a closer look now to see if it is good otherwise! |
I think I found why it didn't load. I will work on updating it...so the table might disappear for a bit. Look for it in the version control. |
Will do, thanks! Other than that, the columns I expect to be there are there. I didn't see any other weird issues (e.g. missing data).
|
Okay...I think it might finally all be there? |
Looks good @mialy-defelice, @andrewelamb ! Thank you for the help with this! I think we are all set. I will connect with Jay and let you know if there are any other issues, but this looks good to me :)
|
In the context of the retrospective, it looks like a detail like this would have been helpful to include in the github issue from the very beginning?
|
NF GFF DB requires adding a few more fields to their DB results:
-updated by @mialy-defelice
This may require update of the data model and regenerating the DB tables; or, just an update of join results (or both).
This should be a good intro issue for working with schematic's RDB layer; compare and contrast with the setup of iAtlas.
AC: DB query results and corresponding Synapse tables for the GFF project have been successfully updated with the three fields above.