-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ShoRAH: Illumina reads ; influenza #75
Comments
Hello Laura,
This error message means that when you try running ShoRAH, it can't manage to find one of its dependencies: the library libHTS (from samtools) against which it was compiled. Did you compile libHTS yourself? Is it also installed in Alternatively, I would strongly suggest that you consider installing the bioconda package as conda will automatically handle all dependencies for you and make whole installation experience much simpler. (e.g.: for our V-pipe pipeline, in the tutorials we recommand letting snakemake's built-in bioconda support to handle the package installation).
An example of a current run of ShoRAH in our V-pipe pipeline: /cluster/work/bewi/pangolin/shorah-test/test/bin/shorah shotgun -t 64 -a 0.1 -w 201 -x 100000 -c 0 -r NC_045512.2:1-29903 -R 42 -b REF_aln.bam -f cohort_consensus.fasta Explanation of options:
Please note that ShoRAH 2 has dropped the support for full global haplotype reconstruction (you would need to use ShoRAH 1 for that). It only reconstructs local haplotype within the windows. The .fasta file of this local haplotypes are available in the sub-directory
For each reconstructed local haplotype you get:
In the above example, I would definitely trust hap_0 (has a high probability, and is based on a large amount of reads), but not hap_1 (low posterior probability, few reads contributing to it). You could also have a look at the discussion in issue #66 Last important note: Please keep me informed of your successes. |
Thank you for the advice and the help! It seems to be working on my influenza samples so very nice! I installed shorah version 1.99.0 (with bioconda) so this is Shorah 1 right? However, I do not find the global haplotype or should this be another command? I putthe results from the Shorah for one sample here: https://we.tl/t-1KJDYbFzHA Thanks ! |
1.99.0 is our previous pre-release for ShoRAH2 BTW: Yesterday we released ShoRAH2 v1.99.1 which fixes a few minor bugs (posterior values '>1.0', unhandled exceptions from exp underflows), in addition to introducing the above mentionned It doesn't reconstruct the global haplotypes. Only the local haplotypes. Looking at your files (thank you for providing them), it seems that ShoRAH has crashed for some reasons: The
(see: doi:10.1101/2020.06.09.142919 where we have benchmarked the output of ShoRAH as part of our V-pipe. Section 3.3 of the paper seem to suggest that 10 is a good threshold for local haplotype's average reads). I would like to investigate the crash - as mentioned in my previous message, this is the first occasion I have to test a segmented virus genome. Would you accept sending me also your reference FASTA and the aligned BAM ? v1.1.3 seems to be the last ShoRAH1 that still has the mm.py command used for global haplotypes. But given that even the upgraded ShoRAH2 v1.99.0 crashes on your data, I wouldn't hold hopes high that the older ShoRAH1 v1.1.3 would successfully build haplotypes without crashing. What is the exact question you're trying to answer with ShoRAH?
If it is the later, analysing the local haplotype per windows could give a very good estimate of the diversity present in the sample. If it is the former, you might also want to explore other options. Our group has also developed Haploclique and QuasiRecomb as other tools which might help you in this direction. Haploclique is one of the engine for global Haplotype featured in V-pipe. The other being SAVAGE by J. Baaijens et al. |
I'm sorry, I thought the run was finished, but this was not the case. I put the actual results again in this link ([https://we.tl/t-t1stc9vhBp]) and I included the bam file and consensus fasta. But I think it has not stopped, I was a bit too quick. This sample is a mix of two reverse genetic viruses, one wild type and one with a mutation in the NA segment (antiviral resistance mutation E119V) that are present in a 80-20 ratio. So this is a bit our test sample as we know what should be in here (there could be some extra mutations due to growing in the cells but this should be limited). With LoFreq we were able to extract these mutations quite nicely with the "correct" ratios. Hereafter we used the same steps on "real" clinical H3N2 samples, so we already have a quite good idea of the diversity in the samples. In some samples we have a lot of different low frequency variants. So I would like to construct the sequences for the different quasispecies that are present in the sample so that a phylogenetic analysis can be run, to see if there are patients that are actually infected by two different strains and how different the low frequency quasispecies are from the consensus population. So therefore I think we need the global haplotype and not (only) the local haplotype. I will check then this last version. I'm also exploring other tools like Haploclique and QuasiRecomb, but here it is a bit the same "problem". There are not a lot of papers where they have been used (on influenza viruses) and often there is only limited information on how too choose appropriate parameters. So this makes it quite challenging to select the parameters so currently I'm running them with the default parameters. However, I would suspect that the results could be "better" if the parameters would be adapted. |
I have managed to run successfully shorah until the end (in |
You can also check the SNV themselves in the CSV or VCF files (
This first one is probably the SNV that cause the E119V amino acid substitution you have referred to, and that we see in windows 202-402, 269-469 and 336-536. We also have developed a visualization for SNV, but we need to adapt it a bit for your use case, we have developed it for SARS-CoV-2 and obviously it doesn't support segmented genomes (yet). (note: the Pval and Qval in the last column refer to the strand bias test, see the INFO field in the VCF file. Normally, it is expected to fail (=there is no bias) in normal paired end sequencing. Details in doi:10.1186/1471-2164-14-501). |
Dear,
I am very interest in using your tool for my project. I'm working on influenza reads that were sequenced with Illumina MiSeq. I was able to extract quasispecies with LoFreq, however LoFreq doesn't have the ability to construct haplotypes, therefore I turned to this tool. However, I tried running this, but I encounter always an error:
/usr/local/bin/b2w: error while loading shared libraries: libhts.so.2: cannot open shared object file: No such file or directory
I also tried this with the amplicon example that was given, but same error was given. I don't know if you have an idea what could be the cause of this?
Additionally, I also was wondering what parameters would be ideal as I don't find a lot of information. Based on a few papers I think that for my purpose following command should work:
shorah shotgun -b sample_sort.bam -f sample.fasta -a 0.01 -w 120 -s 3
However, I don't know if the other parameters should be set or if the default is sufficient?
Kind regards and thanks in advance
Laura
The text was updated successfully, but these errors were encountered: