Commit
* [fix] (template): Missing code in wrappers' doc. Error #187
* update salmon version, wrappers and documentation
* update salmon index, since RapMap indexes are not accepted anymore
* clean dev pipes
* snakefmt changes
* removed direct reference to resources for #482 (comment)
* Use of f-strings and implicit string to bool conversion #482 (comment)
* List all files through multiext #482 (comment)
* snakefmt trailing comma addition
* accept salmon index file list
* salmon index wrapper now accepts either a list of files, or a single file
* salmon quant now accepts either an index dir or a list of files
* salmon quant now accepts gzipped files and raw fastq files automatically
* bz2 support and threading error
* formatting
* adding bzip2 and gzip support in environment.yaml
* Remove unnecessary line #482 (comment)
* remove remaining dev print

Co-authored-by: tdayris <[email protected]>
Showing 38 changed files with 352 additions and 122 deletions.
@@ -3,4 +3,4 @@ channels:
   - conda-forge
   - defaults
 dependencies:
-  - salmon ==0.14.1
+  - salmon ==1.8.0
@@ -1,9 +1,14 @@
 name: salmon_index
+url: https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode
 description: |
-  Index a transcriptome assembly with salmon
+  Index a transcriptome assembly with salmon
 authors:
   - Tessa Pierce
+  - Thibault Dayris
 input:
-  - assembly fasta
+  - sequences: Path to sequences to index with Salmon. This can be transcriptome sequences or gentrome.
+  - decoys: Optional path to decoy sequences name, in case the above `sequence` was a gentrome.
 output:
-  - indexed assembly
+  - indexed assembly
+params:
+  - extra: Optional parameters besides `--tmpdir`, `--threads`, and IO.
@@ -1,13 +1,30 @@
 rule salmon_index:
     input:
-        "assembly/transcriptome.fasta"
+        sequences="assembly/transcriptome.fasta",
     output:
-        directory("salmon/transcriptome_index")
+        multiext(
+            "salmon/transcriptome_index/",
+            "complete_ref_lens.bin",
+            "ctable.bin",
+            "ctg_offsets.bin",
+            "duplicate_clusters.tsv",
+            "info.json",
+            "mphf.bin",
+            "pos.bin",
+            "pre_indexing.log",
+            "rank.bin",
+            "refAccumLengths.bin",
+            "ref_indexing.log",
+            "reflengths.bin",
+            "refseq.bin",
+            "seq.bin",
+            "versionInfo.json",
+        ),
     log:
-        "logs/salmon/transcriptome_index.log"
+        "logs/salmon/transcriptome_index.log",
     threads: 2
     params:
         # optional parameters
-        extra=""
+        extra="",
     wrapper:
         "master/bio/salmon/index"
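The `multiext(...)` call in the rule above pairs one path prefix with each listed file name, so every index file becomes an explicit output. A rough pure-Python sketch of that expansion follows; `expand_multiext` is a hypothetical stand-in written for illustration, not Snakemake's own implementation, and the file list is shortened.

```python
def expand_multiext(prefix, *names):
    """Hypothetical stand-in: join one prefix with each given name."""
    return [prefix + name for name in names]


# Shortened list of index files, taken from the rule above.
index_files = expand_multiext(
    "salmon/transcriptome_index/",
    "info.json",
    "seq.bin",
    "versionInfo.json",
)

for path in index_files:
    print(path)
```

Listing the files one by one lets Snakemake track each of them, whereas `directory(...)` treats the whole folder as a single output.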
@@ -0,0 +1,13 @@
+rule salmon_index:
+    input:
+        sequences="assembly/transcriptome.fasta",
+    output:
+        directory("salmon/transcriptome_index/"),
+    log:
+        "logs/salmon/transcriptome_index.log",
+    threads: 2
+    params:
+        # optional parameters
+        extra="",
+    wrapper:
+        "master/bio/salmon/index"
@@ -5,12 +5,29 @@
 __email__ = "[email protected]"
 __license__ = "MIT"

+from os.path import dirname
 from snakemake.shell import shell
+from tempfile import TemporaryDirectory

 log = snakemake.log_fmt_shell(stdout=True, stderr=True)
 extra = snakemake.params.get("extra", "")

-shell(
-    "salmon index -t {snakemake.input} -i {snakemake.output} "
-    " --threads {snakemake.threads} {extra} {log}"
-)
+decoys = snakemake.input.get("decoys", "")
+if decoys:
+    decoys = f"--decoys {decoys}"
+
+output = snakemake.output
+if len(output) > 1:
+    output = dirname(snakemake.output[0])
+
+with TemporaryDirectory() as tempdir:
+    shell(
+        "salmon index "
+        "--transcripts {snakemake.input.sequences} "
+        "--index {output} "
+        "--threads {snakemake.threads} "
+        "--tmpdir {tempdir} "
+        "{decoys} "
+        "{extra} "
+        "{log}"
+    )
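The wrapper change above resolves the index directory from either a single directory output or a list of index files, and only adds `--decoys` when a decoy file is given. The sketch below restates that logic as plain functions so it can be tried outside Snakemake; `resolve_index_dir` and `decoys_flag` are hypothetical names introduced for illustration, and plain lists stand in for `snakemake.output`.

```python
from os.path import dirname


def resolve_index_dir(outputs):
    """Return the path to hand to `salmon index --index`.

    Several outputs are assumed to be files inside one index directory,
    so the parent directory of the first file is used. A single output
    is assumed to already be the index directory.
    """
    if len(outputs) > 1:
        return dirname(outputs[0])
    return outputs[0]


def decoys_flag(decoys=""):
    """Return the `--decoys` argument, or an empty string when unset."""
    return f"--decoys {decoys}" if decoys else ""


print(resolve_index_dir(["salmon/transcriptome_index/info.json",
                         "salmon/transcriptome_index/seq.bin"]))
print(resolve_index_dir(["salmon/transcriptome_index/"]))
print(repr(decoys_flag()))
```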
@@ -3,4 +3,6 @@ channels:
   - conda-forge
   - defaults
 dependencies:
-  - salmon ==0.14.1
+  - salmon ==1.8.0
+  - gzip ==1.11
+  - bzip2 ==1.0.8
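With `gzip` and `bzip2` added to the environment, compressed reads can be decompressed on the fly before they reach Salmon. The quant wrapper's own code is not shown in this diff, so the following is only a sketch of one way to pick a decompression command from the file extension; `reader_command` and `as_process_substitution` are hypothetical names, and the bash process substitution is an assumption about how the stream would be passed along.

```python
import shlex


def reader_command(path):
    """Pick a command that writes the uncompressed reads to stdout."""
    if path.endswith(".gz"):
        return "gzip -dc"
    if path.endswith(".bz2"):
        return "bzip2 -dc"
    # Assumption: anything else is treated as plain, uncompressed fastq.
    return "cat"


def as_process_substitution(path):
    """Wrap the reader in bash process substitution, quoting the path."""
    return f"<({reader_command(path)} {shlex.quote(path)})"


for reads in ("reads/a_1.fq.gz", "reads/a.fq.bz2", "reads/a.fq"):
    print(as_process_substitution(reads))
```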
@@ -1,9 +1,23 @@
-name: salmon_quant
+name: salmon quant
+url: https://salmon.readthedocs.io/en/latest/salmon.html#quantifying-in-mapping-based-mode
 description: |
-  Quantify transcripts with salmon
+  Quantify transcripts with salmon
 authors:
   - Tessa Pierce
+  - Thibault Dayris
 input:
-  - assembly index, fastq files
+  - index: Path to Salmon indexed sequences, see `bio/salmon/index`
+  - gtf: Optional path to a GTF formatted genome annotation
+  - r: Path to unpaired reads
+  - r1: Path to upstream reads file.
+  - r2: Path to downstream reads file.
 output:
-  - quantification files
+  - Path to quantification file
+  - bam: Path to pseudo-bam file
+params:
+  - libType: Format string describing the library type, see `official documentation on Library Types <https://salmon.readthedocs.io/en/latest/library_type.html>`_ for list of accepted values.
+  - extra: Optional command line parameters, besides IO parameters and threads.
+notes: |
+  Salmon accepted either a list of unpaired reads (`r` parameter), or two lists
+  of the same length containing paired reads (`r1` and `r2` parameters). Not
+  both.
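The note above describes a constraint on the read inputs: either `r`, or `r1` and `r2` of equal length, never both. A minimal sketch of such a check is given below; `check_reads` is a hypothetical helper written for illustration and is not taken from the wrapper.

```python
def check_reads(r=None, r1=None, r2=None):
    """Return "unpaired" or "paired", or raise ValueError on a bad mix."""

    def as_list(value):
        # Accept a single path or a list of paths.
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    r, r1, r2 = as_list(r), as_list(r1), as_list(r2)

    if r and (r1 or r2):
        raise ValueError("give either `r`, or `r1` and `r2`, not both")
    if r:
        return "unpaired"
    if r1 and len(r1) == len(r2):
        return "paired"
    raise ValueError("`r1` and `r2` must be non-empty lists of equal length")


print(check_reads(r="reads/a.fq.gz"))
print(check_reads(r1=["reads/a_1.fq.gz"], r2=["reads/a_2.fq.gz"]))
```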
@@ -0,0 +1,36 @@
+rule salmon_quant_reads:
+    input:
+        # If you have multiple fastq files for a single sample (e.g. technical replicates)
+        # use a list for r1 and r2.
+        r1="reads/{sample}_1.fq.gz",
+        r2="reads/{sample}_2.fq.gz",
+        index=multiext(
+            "salmon/transcriptome_index/",
+            "complete_ref_lens.bin",
+            "ctable.bin",
+            "ctg_offsets.bin",
+            "duplicate_clusters.tsv",
+            "info.json",
+            "mphf.bin",
+            "pos.bin",
+            "pre_indexing.log",
+            "rank.bin",
+            "refAccumLengths.bin",
+            "ref_indexing.log",
+            "reflengths.bin",
+            "refseq.bin",
+            "seq.bin",
+            "versionInfo.json",
+        ),
+    output:
+        quant="salmon/{sample}/quant.sf",
+        lib="salmon/{sample}/lib_format_counts.json",
+    log:
+        "logs/salmon/{sample}.log",
+    params:
+        # optional parameters
+        libtype="A",
+        extra="",
+    threads: 2
+    wrapper:
+        "master/bio/salmon/quant"
@@ -1,17 +1,16 @@
 rule salmon_quant_reads:
     input:
-        r = "reads/{sample}.fq.gz",
-        index = "salmon/transcriptome_index"
+        r="reads/{sample}.fq.gz",
+        index="salmon/transcriptome_index",
     output:
-        quant = 'salmon/{sample}_x_transcriptome/quant.sf',
-        lib = 'salmon/{sample}_x_transcriptome/lib_format_counts.json'
+        quant="salmon/{sample}_x_transcriptome/quant.sf",
+        lib="salmon/{sample}_x_transcriptome/lib_format_counts.json",
     log:
-        'logs/salmon/{sample}_x_transcriptome.log'
+        "logs/salmon/{sample}_x_transcriptome.log",
     params:
         # optional parameters
-        libtype ="A",
-        #zip_ext = bz2 # req'd for bz2 files ('bz2'); optional for gz files('gz')
-        extra=""
+        libtype="A",
+        extra="",
     threads: 2
     wrapper:
         "master/bio/salmon/quant"
@@ -0,0 +1,16 @@
+rule salmon_quant_reads:
+    input:
+        r="reads/{sample}.fq.bz2",
+        index="salmon/transcriptome_index",
+    output:
+        quant="salmon/{sample}_x_transcriptome/quant.sf",
+        lib="salmon/{sample}_x_transcriptome/lib_format_counts.json",
+    log:
+        "logs/salmon/{sample}_x_transcriptome.log",
+    params:
+        # optional parameters
+        libtype="A",
+        extra="",
+    threads: 2
+    wrapper:
+        "master/bio/salmon/quant"
Binary file not shown.
Binary file added (+16 Bytes): bio/salmon/quant/test/salmon/transcriptome_index/complete_ref_lens.bin
Binary file not shown.
Binary file not shown.
Binary file not shown.
bio/salmon/quant/test/salmon/transcriptome_index/duplicate_clusters.tsv (2 changes: 1 addition & 1 deletion)
@@ -1 +1 @@
-RetainedTxp DuplicateTxp
+RetainedRef DuplicateRef
Binary file not shown.
bio/salmon/quant/test/salmon/transcriptome_index/header.json (14 changes: 0 additions & 14 deletions)
This file was deleted.
bio/salmon/quant/test/salmon/transcriptome_index/indexing.log (2 changes: 0 additions & 2 deletions)
This file was deleted.
bio/salmon/quant/test/salmon/transcriptome_index/info.json (22 changes: 22 additions & 0 deletions)
@@ -0,0 +1,22 @@
+{
+    "index_version": 4,
+    "reference_gfa": [
+        "transcriptome_index"
+    ],
+    "sampling_type": "dense",
+    "k": 31,
+    "num_kmers": 352,
+    "num_contigs": 2,
+    "seq_length": 412,
+    "have_ref_seq": true,
+    "have_edge_vec": false,
+    "SeqHash": "8957140ad649436f3db7111f5a1cea7cf5e8ee72600f26443d3861b5f0894325",
+    "NameHash": "7733b4bd4d5a14d60999c280918c82dc8d1f7cfdd24764e8eef54a4bb30a51a3",
+    "SeqHash512": "89a7e74f55209605a4fe0823821c8dfbedebcb2639fba589afed3af583c8158d01cafff5ceb5e63d3b95c3635e937869a6d55c67d748d6f5e3ae1aa53fd5ba4b",
+    "NameHash512": "454d8e37dceb2f27b460b46f3d4724f5cca0b5bd29abe8493484846a759cf7e71db43da5cd7f4afbdb17ce12d46faa4c3326dc795dd1900df0995eb53dceb695",
+    "DecoySeqHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
+    "DecoyNameHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
+    "num_decoys": 0,
+    "first_decoy_index": 18446744073709551615,
+    "keep_duplicates": false
+}
Binary file not shown.
Binary file not shown.
bio/salmon/quant/test/salmon/transcriptome_index/pre_indexing.log (3 changes: 3 additions & 0 deletions)
@@ -0,0 +1,3 @@
+[2022-04-29 11:07:36.254] [jLog] [warning] The salmon index is being built without any decoy sequences. It is recommended that decoy sequence (either computed auxiliary decoy sequence or the genome of the organism) be provided during indexing. Further details can be found at https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode.
+[2022-04-29 11:07:36.254] [jLog] [info] building index
+[2022-04-29 11:07:36.296] [jLog] [info] done building index
bio/salmon/quant/test/salmon/transcriptome_index/quasi_index.log (12 changes: 0 additions & 12 deletions)
This file was deleted.
Binary file not shown.
Binary file added (+24 Bytes): bio/salmon/quant/test/salmon/transcriptome_index/refAccumLengths.bin
Binary file not shown.
bio/salmon/quant/test/salmon/transcriptome_index/refInfo.json (5 changes: 0 additions & 5 deletions)
This file was deleted.
bio/salmon/quant/test/salmon/transcriptome_index/ref_indexing.log (28 changes: 28 additions & 0 deletions)
@@ -0,0 +1,28 @@
+[2022-04-29 11:07:36.254] [puff::index::jointLog] [info] Running fixFasta
+[2022-04-29 11:07:36.255] [puff::index::jointLog] [info] Replaced 0 non-ATCG nucleotides
+[2022-04-29 11:07:36.255] [puff::index::jointLog] [info] Clipped poly-A tails from 0 transcripts
+[2022-04-29 11:07:36.256] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers
+[2022-04-29 11:07:36.256] [puff::index::jointLog] [info] ntHll estimated 47404 distinct k-mers, setting filter size to 2^20
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Starting the Pufferfish indexing by reading the GFA binary file.
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Setting the index/BinaryGfa directory transcriptome_index
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Done wrapping the rank vector with a rank9sel structure.
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] contig count for validation: 2
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Total # of Contigs : 2
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Total # of numerical Contigs : 2
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Total # of contig vec entries: 2
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] bits per offset entry 2
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Done constructing the contig vector. 3
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] # segments = 2
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] total length = 412
+[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Reading the reference files ...
+[2022-04-29 11:07:36.269] [puff::index::jointLog] [info] positional integer width = 9
+[2022-04-29 11:07:36.269] [puff::index::jointLog] [info] seqSize = 412
+[2022-04-29 11:07:36.269] [puff::index::jointLog] [info] rankSize = 412
+[2022-04-29 11:07:36.269] [puff::index::jointLog] [info] edgeVecSize = 0
+[2022-04-29 11:07:36.269] [puff::index::jointLog] [info] num keys = 352
+[2022-04-29 11:07:36.295] [puff::index::jointLog] [info] mphf size = 0.000961304 MB
+[2022-04-29 11:07:36.295] [puff::index::jointLog] [info] chunk size = 412
+[2022-04-29 11:07:36.295] [puff::index::jointLog] [info] chunk 0 = [0, 382)
+[2022-04-29 11:07:36.295] [puff::index::jointLog] [info] finished populating pos vector
+[2022-04-29 11:07:36.295] [puff::index::jointLog] [info] writing index components
+[2022-04-29 11:07:36.296] [puff::index::jointLog] [info] finished writing dense pufferfish index
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
bio/salmon/quant/test/salmon/transcriptome_index/versionInfo.json (5 changes: 3 additions & 2 deletions)
@@ -1,6 +1,7 @@
 {
-    "indexVersion": 4,
+    "indexVersion": 5,
     "hasAuxIndex": false,
     "auxKmerLength": 31,
-    "indexType": 1
+    "indexType": 2,
+    "salmonVersion": "1.8.0"
 }
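The updated `versionInfo.json` now records which Salmon release built the index. As a small illustration, the sketch below reads that field and compares it with an expected release; `index_built_with` is a hypothetical helper, and the JSON text is copied from the new side of the diff above rather than read from disk.

```python
import json

# New content of versionInfo.json, as shown in the diff above.
NEW_VERSION_INFO = """
{
    "indexVersion": 5,
    "hasAuxIndex": false,
    "auxKmerLength": 31,
    "indexType": 2,
    "salmonVersion": "1.8.0"
}
"""


def index_built_with(version_info_text, expected_version):
    """Return True when the index metadata names the expected release."""
    info = json.loads(version_info_text)
    # Older index metadata has no "salmonVersion" key, hence .get().
    return info.get("salmonVersion") == expected_version


print(index_built_with(NEW_VERSION_INFO, "1.8.0"))
```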