why doesn't output directory genbank_genomes get written to config-specified outdir #79

Closed
taylorreiter opened this issue May 15, 2021 · 7 comments · Fixed by #130

Comments

@taylorreiter
Member

using conf file:

sample:
- gather_vita_vars_gtdb_shared_assemblies
outdir: out
metagenome_trim_memory: 1e9

$ ls *
conf.yml  README.sh

genbank_genomes:
GCA_000210075.1_genomic.fna.gz  GCA_900543035.1.info.csv

out:
genbank  sgc

Shouldn't the genbank_genomes dir be written inside out?

@ctb
Member

ctb commented May 15, 2021 via email

@taylorreiter
Member Author

hmm I see that logic, but for me, I would make a project dir that I would cd into, then run genome-grist there, with the outdir as outputs. So unless I made a ~/genbank_genomes dir, a shared dir wouldn't make sense across projects. And a tool being presumptuous enough to make a ~/genbank_genomes dir would make me very...angry.

@ctb
Member

ctb commented May 19, 2021

:) fair. I was thinking of making a cache dir, see #8, that could be shared across projects and/or users...

@ctb
Member

ctb commented May 19, 2021

vaguely relevant - #9 - we need a way to specify a list of already existing genomes, from genbank and potentially from a private collection, too.

@taylorreiter
Member Author

oooh I love the cache dir idea, optionally enabled by a setting in the config file?

like:

cache_dir: ~/.genome-grist-cache

or something, and if not set, no cache dir and put everything in outputs?
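
The fallback described above could be sketched roughly like this in Python. This is only an illustration of the proposal, not genome-grist code: `resolve_genome_dir` is a hypothetical helper, `cache_dir` is the key suggested in this comment rather than an existing option, and the `genbank_genomes` subdirectory name is taken from the listing earlier in the thread.

```python
from pathlib import Path


def resolve_genome_dir(config: dict) -> Path:
    """Pick where downloaded genomes go (hypothetical helper, not genome-grist API)."""
    cache_dir = config.get("cache_dir")  # proposed key; assumed to be optional
    if cache_dir:
        # A shared cache was requested: expand "~" so it can live in a home dir.
        return Path(cache_dir).expanduser()
    # No cache configured: keep everything under the project's outdir.
    return Path(config["outdir"]) / "genbank_genomes"


# Without cache_dir, genomes stay inside the configured outdir.
print(resolve_genome_dir({"outdir": "out"}))

# With cache_dir set, the shared location is used instead.
print(resolve_genome_dir({"outdir": "out", "cache_dir": "~/.genome-grist-cache"}))
```

Making the cache opt-in keeps the default behavior self-contained per project, which is the concern raised earlier in the thread.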

@taylorreiter
Member Author

I have run into a problem in my Snakefile where not being able to set this output dir in the config is causing an error:

[Wed Jul 14 16:11:21 2021]
localcheckpoint touch_roary_filter_prefetch_shared_assemblies_vs_refseq:
    input: outputs/roary_prefetch/GCF_002834235.1_prefetch_filtered.csv
    output: outputs/roary_prefetch/.GCF_002834235.1_roary_dummy.txt
    jobid: 154
    wildcards: acc=GCF_002834235.1
Downstream jobs will be updated after completion.

Touching output file outputs/roary_prefetch/.GCF_002834235.1_roary_dummy.txt.
Updating job roary.
loaded 29 accessions from outputs/roary_prefetch/GCF_002834235.1_prefetch_filtered.csv.
loaded 54 accessions from outputs/genbank/gather_vita_vars_gtdb_shared_assemblies.x.genbank.gather.csv.
AmbiguousRuleException:
Rules download_shared_assemblies and roary_genome_grist are ambiguous for the file genbank_genomes/GCF_003462345.1_genomic.fna.gz.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
        download_shared_assemblies: acc=GCF_003462345.1
        roary_genome_grist: roary_acc=GCF_003462345.1
Expected input files:
        download_shared_assemblies: outputs/genbank/gather_vita_vars_gtdb_shared_assemblies.x.genbank.gather.csv inputs/genome-grist-conf.yml
        roary_genome_grist: outputs/roary_genome_grist_conf/roary_genome_grist_conf.ymlExpected output files:
        download_shared_assemblies: genbank_genomes/GCF_003462345.1_genomic.fna.gz
        roary_genome_grist: genbank_genomes/GCF_003462345.1_genomic.fna.gz
Removing output files of failed job touch_roary_filter_prefetch_shared_assemblies_vs_refseq since they might be corrupted:
outputs/roary_prefetch/.GCF_002834235.1_roary_dummy.txt
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/tereiter/github/2020-ibd/.snakemake/log/2021-07-14T161115.519517.snakemake.log

I get that it's wasteful for this genome to be downloaded twice onto the hard drive, but if I could specify the output dir I could work around the above issue.
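
To illustrate why a configurable output dir would sidestep the error, here is a simplified Python sketch of the matching problem. It does not use Snakemake itself. The two output patterns are inferred from the "Expected output files" in the log, the `outputs/shared` and `outputs/roary` prefixes are made up for the example, and `pattern_to_regex` and `producers` are hypothetical helpers that only approximate how wildcards are matched.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a Snakemake-style pattern like 'dir/{acc}.gz' into a regex (simplified)."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # re.split with one capture group alternates: literal, wildcard name, literal, ...
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex + "$")


def producers(target: str, rules: dict) -> list:
    """Return the names of all rules whose output pattern matches the target."""
    return [
        name for name, pattern in rules.items()
        if pattern_to_regex(pattern).match(target)
    ]


# Both workflows write into the same fixed directory, as in the log above.
shared_dir = {
    "download_shared_assemblies": "genbank_genomes/{acc}_genomic.fna.gz",
    "roary_genome_grist": "genbank_genomes/{roary_acc}_genomic.fna.gz",
}

# Hypothetical fix: each workflow gets its own output prefix.
separate_dirs = {
    "download_shared_assemblies": "outputs/shared/genbank_genomes/{acc}_genomic.fna.gz",
    "roary_genome_grist": "outputs/roary/genbank_genomes/{roary_acc}_genomic.fna.gz",
}

ambiguous = producers("genbank_genomes/GCF_003462345.1_genomic.fna.gz", shared_dir)
unique = producers(
    "outputs/roary/genbank_genomes/GCF_003462345.1_genomic.fna.gz", separate_dirs
)
print(ambiguous)  # two candidate rules, so the file is ambiguous
print(unique)     # a single candidate rule
```

The error message also suggests the `ruleorder` directive or wildcard constraints as alternatives, but neither helps when both rules legitimately need to produce the same path.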

@ctb
Member

ctb commented Jan 12, 2022

Fixing in #130 with new config parameter genbank_cache:

@ctb ctb closed this as completed in #130 Jan 17, 2022