Too few IDs in DROP_GROUP when combined with public data resource, mae and aberrantExpression #193
Comments
Hi, thanks for using DROP. I think there are a couple of issues, and I hope the points below help you get things working. We are working on automatically running the correct snakemake commands based on how the user defines the config, but that is not yet the case. The default is to run all of the modules (aberrant splicing, aberrant expression, and monoallelic expression). So if you want to run only aberrant expression and monoallelic expression, you will need to run them separately by running:
Please let me know if any of that made sense, and better yet if it was able to help you. If not, please let us know what was confusing or what continues to not work the way you would expect. Best of luck! |
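The exact commands from the reply above were not preserved in this thread. As a rough sketch of what "running the modules separately" can look like, the snippet below builds one snakemake invocation per module. The target names (`aberrantExpression`, `mae`, `aberrantSplicing`) are an assumption based on DROP's documented rule names, and `build_commands` is a hypothetical helper, so please check both against your installed version.

```python
import shlex

# Assumption: these snakemake targets match the rule names in your DROP version.
MODULE_TARGETS = ("aberrantExpression", "mae", "aberrantSplicing")


def build_commands(modules, cores=1):
    """Return one snakemake command (as an argument list) per requested module."""
    commands = []
    for module in modules:
        if module not in MODULE_TARGETS:
            raise ValueError(f"unknown module: {module}")
        commands.append(["snakemake", module, "--cores", str(cores)])
    return commands


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed first.
    for cmd in build_commands(["aberrantExpression", "mae"], cores=4):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try outside a DROP project directory.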
Thank you so much for getting back to me. I am focusing just on aberrantExpression but still having some trouble. I addressed your suggestions below. The YAML and sample annotation files are attached.
I am still having the following problem...
Is it a problem that my files were aligned to ensembl and the control data was aligned to gencode? I can realign if it would fix the problem. |
Hi, can you please rename the groups in the aberrantSplicing and mae dictionaries from |
Here is my snakefile in case it's helpful. Thanks. |
Same problem...
|
I know it's getting past submodules/AberrantExpression.py because, if I only have 9 samples, I get the following error
When I add a 10th row, it errors in the aberrantsplicing module as above |
Just looking in the Python code, the aberrant expression module has this line to check the number of IDs per group
while the aberrantsplicing module has this |
If I comment out line 17 in AberrantSplicing.py, "self.checkSubset(self.rnaIDs)", I make it through to the aberrantExpression module. I'm currently getting some errors there, which I'll investigate. Thanks. |
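The behaviour described above (a splicing check failing even though only aberrant expression is wanted) can be illustrated with a small sketch. This is not DROP's code: `validate_groups`, the data layout, and the threshold of 10 are all hypothetical, with the threshold only inferred from the 9 versus 10 sample behaviour reported in this thread. The point is that a size check applied to every module fails on groups the user never meant to run, while a check limited to the requested modules does not.

```python
def validate_groups(groups_by_module, min_ids_by_module, modules_to_run):
    """Return a list of error messages for groups that are too small.

    groups_by_module: {module name: {group name: [sample IDs]}}
    min_ids_by_module: {module name: minimum number of IDs per group}
    modules_to_run: only these modules are checked
    """
    errors = []
    for module in modules_to_run:
        minimum = min_ids_by_module[module]
        for group, ids in groups_by_module.get(module, {}).items():
            if len(ids) < minimum:
                errors.append(
                    f"{module}: group '{group}' has {len(ids)} IDs, "
                    f"needs at least {minimum}"
                )
    return errors


# Hypothetical setup: 10 expression samples, only 4 usable for splicing.
groups = {
    "aberrantExpression": {"blood": [f"S{i}" for i in range(10)]},
    "aberrantSplicing": {"blood": ["S0", "S1", "S2", "S3"]},
}
minimums = {"aberrantExpression": 10, "aberrantSplicing": 10}  # assumed thresholds

print(validate_groups(groups, minimums, ["aberrantExpression"]))
print(validate_groups(groups, minimums, ["aberrantExpression", "aberrantSplicing"]))
```

The first call reports nothing, the second reports the small splicing group, which mirrors the error seen when every module is validated regardless of what is requested.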
Thank you for using this, and pushing our code to its limits! This is definitely a big problem, especially if users don't want to run a specific module. We will push forward on fixing this ASAP. In the meantime, though, what I've done is remove the minimum required number of samples for each module in a new branch. If you pull the changes from there and install them (I just changed line 33 of drop/config/submodules/Submodules.py As for your sample annotation and config setup: in the config you should be able to set the aberrantSplicing group to then when you run And then |
Thank you so much. I really appreciate your time. I apologize, I'm not very savvy with Git and I don't want to mess things up since the package seems to be managed by conda.
Thank you! |
No worries.
|
The folder /PATH/Output/processed_data/aberrant_expression/ensembl87/ is not created. The folder /PATH/Output/processed_data/aberrant_expression/gencode34/ holds all of the output, including for DJ00*, despite the fact that the DJ00* files were created with ensembl87. I believe I've included all of the correct keys in the yml and sample annotation files, but the processed_data step still sends the DJ00* files to the gencode34 folder instead of ensembl87. I've attached the log file. Thanks! |
Thanks for dealing with this. Within an aberrantExpression group you must use the same gene_annotation file. The problem is that when you try to merge 2 samples together, it won't work if one is using versionA.gtf and the other is using versionB.gtf. I think the simplest would be to change it so that you use the same gene annotation file for each sample. If, however, you wanted to treat each gene annotation set as a distinct group, and run the ensembl87 as 1 group and the gencode34 as another, you would need to make those 2 distinct aberrantExpression groups in the config and DROP_GROUP columns. Unfortunately, the sampleParams setup, which tries to keep track of the values in the sample annotation file in order to rerun dependent jobs if the sample annotation was changed in a meaningful way, was only designed to be run with a single annotation file. If you do not want to configure your data to use a single gene_annotation file ( This will create the missing sampleParams files which will then be correctly discovered and used for the downstream steps. |
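To spot the situation described above before running the pipeline, one option is to scan the sample annotation table for groups that mix gene annotations. The sketch below is a stand-alone check, not part of DROP. It assumes a tab-separated file with `DROP_GROUP` (possibly comma-separated) and `GENE_ANNOTATION` columns, and the sample IDs in the example are made up.

```python
import csv
import io
from collections import defaultdict


def mixed_annotation_groups(tsv_text):
    """Return {group: sorted annotations} for groups using more than one annotation."""
    by_group = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        annotation = (row.get("GENE_ANNOTATION") or "").strip()
        # A sample may belong to several groups, listed comma-separated.
        for group in (row.get("DROP_GROUP") or "").split(","):
            group = group.strip()
            if group and annotation:
                by_group[group].add(annotation)
    return {g: sorted(a) for g, a in by_group.items() if len(a) > 1}


# Made-up example rows; column names are assumptions about the annotation file.
example = (
    "RNA_ID\tDROP_GROUP\tGENE_ANNOTATION\n"
    "DJ001\tblood\tensembl87\n"
    "EXT01\tblood\tgencode34\n"
    "EXT02\tother\tgencode34\n"
)
print(mixed_annotation_groups(example))
```

An empty result means every group uses a single annotation, which is the layout the reply above recommends.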
Thank you for working through this with me. I'm happy to use the same gene annotation file. I think that means I should re-align my samples to gencode34, correct? I'm happy to do that. Could you provide the link to the recommended gencode .fa file? For the moment, while I'm getting started, I intend to use the control dataset that you've suggested. In the future, I plan to have adequate samples sequenced internally. Is there a gene annotation that you'd recommend if you were starting today? Most of my samples will probably be in hg19/GRCh37. Thanks! |
Since the majority of your samples are already counted using gencode (which we use as well), I think all you need to do is allow the 4 bam files to be counted using the gencode annotation file. Set your config.yaml file to look like this (containing only 1 gene_annotation value):
and in the sample annotation tsv file: That should work (hopefully) |
Working so far! I'll let you know how it goes. I did encounter this warning...
"/PATH/DJ002/Aligned.sorted.bam.bai" exists in the same folder. Is there a reason it's not recognizing the index? Does it matter? Thanks! |
I'm glad to hear it's doing something! Let me know how it goes and how it looks at the end, hopefully this works! |
Unfortunately, another error! I've attached the error message... Maybe it's still a problem that the external matrix was done using gencode while my files were aligned using ensembl? Thanks! |
I re-aligned my samples to gencode v34 and still get the above error. /PATH/.snakemake/scripts is empty. I may have found part of the error. I looked at the count matrices for my files (DJ00*) and compared them to the external count matrices. My files have 62492 transcripts. The external count matrix has 62456 transcripts. I've attached a list of the transcripts in my data that don't exist in the external counts. I haven't looked at every single one, but they seem to be transcripts for mitochondrial genes. Is there any way to only analyze the intersection of the transcripts? Thanks |
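The comparison described above can be reproduced with a few lines of Python. This is only a sketch: it assumes both count matrices are tab-separated text with a header row and the feature ID in the first column, and the IDs in the example are invented.

```python
def read_ids(tsv_text):
    """Collect the first-column IDs of a tab-separated table, skipping the header."""
    lines = tsv_text.strip().splitlines()
    return {line.split("\t", 1)[0] for line in lines[1:] if line}


def compare_ids(local_text, external_text):
    """Report which IDs are shared and which appear in only one table."""
    local, external = read_ids(local_text), read_ids(external_text)
    return {
        "shared": sorted(local & external),
        "only_local": sorted(local - external),
        "only_external": sorted(external - local),
    }


# Invented IDs, standing in for the real count matrices.
local_counts = "geneID\tDJ001\nGENE_A\t5\nGENE_B\t0\nGENE_MT1\t120\n"
external_counts = "geneID\tEXT01\nGENE_A\t7\nGENE_B\t3\n"
print(compare_ids(local_counts, external_counts))
```

Listing `only_local` and `only_external` separately makes it easy to see whether the mismatch is confined to one class of features, such as the mitochondrial ones suspected here.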
On second thought: the way the merging is done, the data frame used internally takes all of the transcripts from the gtf file and then starts counting whether or not a read falls within the transcript bounds. This means that the external counts must have been assembled using a different gtf file version than the one provided in the config. It seems like the gtf for the external counts has had the chromosome MT removed from gencode.v38lift37.annotation.gtf. The hot fix is to remove the chromosome MT from the gtf file, but we are in contact with the provider of the external counts for a more permanent solution. |
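The hot fix mentioned above (dropping the mitochondrial chromosome from the gtf) could be done along these lines. This is a minimal sketch, not a DROP utility. It relies on the standard GTF layout, where the first tab-separated field is the sequence name and header lines start with `#`. The contig names `chrM` and `MT` are assumptions, so check which one your file actually uses.

```python
def drop_contigs(gtf_lines, contigs=("chrM", "MT")):
    """Yield GTF lines, skipping records whose sequence name is in `contigs`."""
    for line in gtf_lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines untouched
            continue
        seqname = line.split("\t", 1)[0]
        if seqname not in contigs:
            yield line


# Tiny invented example; real use would stream a file line by line.
example = [
    "##description: example annotation\n",
    "chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id \"GENE_A\";\n",
    "chrM\tENSEMBL\tgene\t10\t90\t.\t+\t.\tgene_id \"GENE_MT1\";\n",
]
for kept in drop_contigs(example):
    print(kept, end="")
```

For a real file, the same generator can be fed an open input file and its output written to a new gtf, leaving the original untouched.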
Thanks again for working through this with me. I've got some steps forward and steps back...
Thanks! |
Hi, great to see that it's working with the STAR aligned files. |
I was actually looking at the gtf you provided in your config file. You wrote in your original config.yaml file:
If the external counts were gencode v34, and you have gencode v38 that could certainly create some confusion. But maybe you've changed the config since you posted the example. I'm glad to hear things are running, let us know how things go. |
I subsequently changed to v34 after posting the original YAML and
unfortunately it didn't help. But thank you for checking. I'll keep you
updated. Thanks!
…On Tue, May 18, 2021 at 10:59 AM nickhsmith ***@***.***> wrote:
I was actually looking at the gtf you provided in your config file. You
wrote in your config.yaml file:
geneAnnotation:
gencode34: /PATH/reference_genomes/hg37_gencode/gencode.v38lift37.annotation.gtf
If the external counts were gencode v34, and you have gencode v38 that
could certainly create some confusion.
I'm glad to hear things are running, let us know how things go.
--
-------------------------------------------------
Joshua E. Motelow, MD, PhD
Fellow, Pediatric Critical Care Medicine
Morgan Stanley Children's Hospital of NewYork-Presbyterian
pager: 82848
-------------------------------------------------
|
OK! I made it through! Thank you for all of your help! A few outstanding issues but I think we should close this issue because I'm not sure I'll have time to follow up in the near future.
Ok, thank you! I'm sure I'll have more questions in the future. |
Hello,
I'm having a little trouble running DROP with a few of my samples combined with the external blood group. I know there's an open ticket for splicing (#154) but I think other users have the other analyses working. I seem to get an error in the AberrantSplicing module but I don't think I call it.
I've attached my sample annotation file and yaml file.
Thank you!
sample_annotation.txt
config.yaml.txt