Error in Non-coding analysis for WES #70

Open
sawakof opened this issue Oct 24, 2024 · 4 comments

@sawakof

sawakof commented Oct 24, 2024

Hi,

I am currently attempting to perform an association analysis on noncoding regions using WES data. However, while executing 379 array jobs, some of the jobs failed with an error. Could this issue be due to the fact that I am analyzing noncoding regions with exome data? I would greatly appreciate any suggestions or improvements you could provide.

$ apptainer exec staarpipeline_0.9.6.sif Rscript STAARpipeline-Tutorial/STAARpipeline_Gene_Centric_Noncoding_binomial.r 189
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C" 
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 263704 14.1     641854 34.3   435431 23.3
Vcells 444617  3.4    8388608 64.0  1753423 13.4
[1] 379
[1] 1
# of selected samples: 9,521
# of selected variants: 463
Error in rep(variant.id.SNV, variant_gene_num) : invalid 'times' argument
Calls: Gene_Centric_Noncoding -> noncoding
In addition: Warning message:
In (GENCODE.Category == "downstream") & (SNVlist) :
  longer object length is not a multiple of shorter object length
Execution halted
@xihaoli
Owner

xihaoli commented Oct 24, 2024

Hi @sawakof,

Thank you for your message. Typically, one would not run Gene-Centric Noncoding analysis with exome data, because many of the defined noncoding masks could be "empty" and thus would not provide results. However, the error you are seeing is Error in rep(variant.id.SNV, variant_gene_num) : invalid 'times' argument, which I believe is related to 3: Setting LC_TIME failed, using "C"; we haven't seen it before. Could you please trace back the cause of the following warnings?

During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C"

Thanks,
Xihao

@sawakof
Author

sawakof commented Oct 25, 2024

Thank you for your quick response.
I believe that the locale issue during the startup of R is unrelated to the errors in the job itself.

During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C"

I understand that running Gene-Centric Noncoding analysis with exome data is typically not done because many of the defined noncoding masks could be "empty" and thus not provide results. Out of 381 noncoding array jobs, jobs numbered 189-249 and 298-320 could not be executed, but I was able to generate output data for the other numbers. Also, all ncRNA array jobs were able to produce outputs.

In this case, would it be acceptable to proceed with summarizing jobs using the subset of data that was successfully output?

@xihaoli
Owner

xihaoli commented Oct 25, 2024

Hi @sawakof,

Thanks for your follow-up. A quick check shows that the error invalid 'times' argument raised by rep() indicates that the times argument (in this case, variant_gene_num) is probably missing. If possible, could you please step through the code and let me know the value of variant_gene_num in your case?
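As a small illustration of the rep() behavior mentioned above, here is a minimal base R sketch. The vectors are made-up stand-ins, not values from the pipeline. It shows two common ways rep() ends up with an invalid 'times' argument: an NA in times, and a times vector whose length does not match the input. Which of these (if either) applies to variant_gene_num in the failing jobs is not established by this sketch.

```r
# Illustrative values only; these are NOT taken from STAARpipeline.
variant_ids <- c(101L, 102L, 103L)

# Valid call: one repeat count per element of variant_ids.
ok <- rep(variant_ids, times = c(1L, 2L, 1L))
print(ok)

# Case 1: 'times' contains NA, so rep() stops with an error.
msg_na <- tryCatch(
  rep(variant_ids, times = c(1L, NA, 1L)),
  error = function(e) conditionMessage(e)
)

# Case 2: 'times' has a different length than the input (and is not length 1).
msg_len <- tryCatch(
  rep(variant_ids, times = c(1L, 2L)),
  error = function(e) conditionMessage(e)
)

print(msg_na)
print(msg_len)
```

Printing variant_gene_num (and its length) just before the rep() call in a failing job should show which case applies.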

If some of the array jobs ran successfully, their output is valid, and you can proceed with summarizing jobs using the subset of data that was successfully output. However, it would be ideal to fix the array ids that have such issues and then summarize all the jobs.
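To find which array ids still need fixing, one option is to compare the expected output files against what is on disk. The sketch below is only an assumption about the layout: the helper find_missing_jobs and the file naming pattern (prefix, underscore, array id, .Rdata) are hypothetical and should be adapted to how your job script names its output.

```r
# Hypothetical helper: return the array ids whose output file is absent.
# Assumes files are named <prefix>_<arrayid>.Rdata; adjust to your setup.
find_missing_jobs <- function(output_dir, prefix, n_jobs) {
  ids <- seq_len(n_jobs)
  expected <- file.path(output_dir, paste0(prefix, "_", ids, ".Rdata"))
  ids[!file.exists(expected)]
}

# Toy demonstration in a fresh temporary directory (no real results involved).
demo_dir <- tempfile("noncoding_demo_")
dir.create(demo_dir)
for (id in c(1, 2, 4)) {
  file.create(file.path(demo_dir, paste0("results_", id, ".Rdata")))
}

missing_ids <- find_missing_jobs(demo_dir, "results", 5)
print(missing_ids)
```

In the toy directory only jobs 1, 2 and 4 have output, so the helper reports the remaining ids as missing.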

Best,
Xihao

@xihaoli
Owner

xihaoli commented Oct 26, 2024

Hi @sawakof,

After a closer look, it seems that the number of variants in the genotype channel and in the FunctionalAnnotation channel of your aGDS file is not the same, which caused the error. You may want to check the dimensions of these fields in your opened aGDS file.
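A rough sketch of such a length check is below. The helper check_variant_lengths and the toy vectors are hypothetical. With a real aGDS file the two vectors would be read through SeqArray (seqOpen, seqGetData, seqClose), and the annotation node path depends on how the aGDS was built, so it is left as a placeholder in the comments. The sketch also reproduces the "longer object length" warning from the log using vectors of unequal length.

```r
# Hypothetical helper: compare the number of variants seen in two aGDS fields.
check_variant_lengths <- function(variant_id, annotation) {
  n_geno <- length(variant_id)
  n_anno <- length(annotation)
  list(n_genotype = n_geno, n_annotation = n_anno, match = (n_geno == n_anno))
}

# Toy vectors standing in for real aGDS fields (illustrative only).
variant_id <- 1:6
gencode_category <- c("downstream", "intergenic", "downstream", "upstream")

result <- check_variant_lengths(variant_id, gencode_category)
print(result)

# Combining logical vectors of unequal length triggers the same kind of
# warning as in the log; it is captured here rather than printed.
warn_msg <- tryCatch(
  (gencode_category == "downstream") & (variant_id > 0),
  warning = function(w) conditionMessage(w)
)
print(warn_msg)

# With a real file, the inputs could be read roughly like this (not run):
# library(SeqArray)
# genofile   <- seqOpen("path/to/your.gds")            # placeholder path
# variant_id <- seqGetData(genofile, "variant.id")
# annotation <- seqGetData(genofile, "<annotation node path in your aGDS>")
# seqClose(genofile)
```

If match comes back FALSE for the chromosomes behind the failing array ids, that would be consistent with the explanation above.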

Hope this helps.

Best,
Xihao
