Multiple instances of SISTR on same machine can interfere with each other #18

Closed
apetkau opened this issue Mar 24, 2017 · 4 comments

@apetkau
Member

apetkau commented Mar 24, 2017

I've found that if you run multiple instances of SISTR on the same machine, starting all of them at the exact same time, they can interfere with each other's results.

For example, running:

for i in {1..2}; do sistr -f csv -o predictions_$i AE014613.fasta 2> $i.err 1> $i.out & done

Will produce the following in the stderr files:

...
2017-03-24 14:25:36,982 ERROR: Missing cgmlst_results for NZ_AOXE01000004.1_101 [in /home/aaron/miniconda2/lib/python2.7/site-packages/sistr_cmd-0.3.4-py2.7.egg/sistr/src/cgmlst/__init__.py:357]
2017-03-24 14:25:36,982 ERROR: Missing cgmlst_results for NZ_AOXE01000008.1_59 [in /home/aaron/miniconda2/lib/python2.7/site-packages/sistr_cmd-0.3.4-py2.7.egg/sistr/src/cgmlst/__init__.py:357]
2017-03-24 14:25:36,983 ERROR: Missing cgmlst_results for NZ_AOXE01000053.1_113 [in /home/aaron/miniconda2/lib/python2.7/site-packages/sistr_cmd-0.3.4-py2.7.egg/sistr/src/cgmlst/__init__.py:357]
/home/aaron/miniconda2/lib/python2.7/site-packages/sistr_cmd-0.3.4-py2.7.egg/sistr/src/cgmlst/__init__.py:293: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
2017-03-24 14:25:38,061 ERROR: blastn on db AE014613_fasta and query wzy.fasta did not produce expected output file at /tmp/20170324142534-SISTR-AE014613/wzy.fasta-AE014613_fasta-2017Mar24_14_25_37.blast [in /home/aaron/miniconda2/lib/python2.7/site-packages/sistr_cmd-0.3.4-py2.7.egg/sistr/src/blast_wrapper/__init__.py:125]
Traceback (most recent call last):
  File "/home/aaron/miniconda2/bin/sistr", line 11, in <module>
    load_entry_point('sistr-cmd==0.3.4', 'console_scripts', 'sistr')()
  File "/home/aaron/miniconda2/lib/python2.7/site-packages/sistr_cmd-0.3.4-py2.7.egg/sistr/sistr_cmd.py", line 320, in main
  File "/home/aaron/miniconda2/lib/python2.7/site-packages/sistr_cmd-0.3.4-py2.7.egg/sistr/sistr_cmd.py", line 221, in sistr_predict
  File "/home/aaron/miniconda2/lib/python2.7/site-packages/sistr_cmd-0.3.4-py2.7.egg/sistr/src/blast_wrapper/__init__.py", line 130, in cleanup
  File "/home/aaron/miniconda2/lib/python2.7/shutil.py", line 239, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/home/aaron/miniconda2/lib/python2.7/shutil.py", line 237, in rmtree
    names = os.listdir(path)
OSError: [Errno 2] No such file or directory: '/tmp/20170324142534-SISTR-AE014613'

This error does not occur when running only one instance at a time. I'm guessing the instances are interfering with each other's tmp files.

@peterk87
Contributor

Yes, the error is definitely occurring because each instance creates and uses the same tmp directory in that case. One instance completes before the other and cleans up the tmp directory while the other is still using it.
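To illustrate the collision, here is a minimal sketch. It assumes the tmp directory name is built from a second-resolution timestamp plus the genome name, which is what the path in the traceback (`/tmp/20170324142534-SISTR-AE014613`) suggests. `tmp_dir_name` is a hypothetical helper for illustration, not SISTR's actual code.

```python
from datetime import datetime


def tmp_dir_name(genome_name, now):
    # Hypothetical helper: mirrors the <timestamp>-SISTR-<genome_name>
    # pattern seen in the traceback; not taken from the SISTR source.
    return "{}-SISTR-{}".format(now.strftime("%Y%m%d%H%M%S"), genome_name)


# Two instances started within the same second on the same input file.
start = datetime(2017, 3, 24, 14, 25, 34)
first = tmp_dir_name("AE014613", start)
second = tmp_dir_name("AE014613", start.replace(microsecond=500000))

print(first)            # 20170324142534-SISTR-AE014613
print(first == second)  # True: both instances would share one directory
```

Because the timestamp only resolves to the second, any two runs with the same base filename that start within that second end up with an identical directory name.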

Would there be a scenario where files with the same base filename are run at the same time?

A potential workaround would be to distinguish different input files by providing a genome_name along with the path to the input fasta using the -i arg:

for i in {1..2}; do 
  sistr -f csv -o predictions_$i \
    -i /path/to/AE014613.fasta <genome_name>_$i \
    2> $i.err 1> $i.out & done

This should produce tmp dirs:

/tmp/<timestamp>-SISTR-<genome_name>_1
/tmp/<timestamp>-SISTR-<genome_name>_2

Or you could give each instance a different base tmp directory to write its files in.

I could add a condition to the tmp dir creation to check if the directory already exists, and if so, create a tmp dir with a slightly different name (e.g. append _<number>).
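A rough sketch of that idea, assuming a hypothetical helper named `make_unique_dir` (not part of SISTR). One caveat: checking whether the directory exists and then creating it leaves a small window in which another instance can create it first. The sketch therefore just attempts `os.mkdir`, which fails if the directory already exists, and moves on to the next suffix when that happens.

```python
import errno
import os
import tempfile


def make_unique_dir(base_path, max_tries=1000):
    """Create base_path, or base_path_1, base_path_2, ... if already taken.

    Hypothetical helper for illustration; returns the path that was created.
    """
    candidate = base_path
    for n in range(1, max_tries + 1):
        try:
            os.mkdir(candidate)  # fails if the directory already exists
            return candidate
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise  # a real problem (permissions, missing parent, ...)
            candidate = "{}_{}".format(base_path, n)
    raise RuntimeError("no free directory name for {}".format(base_path))


# Demo inside a scratch directory so nothing is written to /tmp directly.
root = tempfile.mkdtemp()
base = os.path.join(root, "20170324142534-SISTR-AE014613")
print(os.path.basename(make_unique_dir(base)))  # 20170324142534-SISTR-AE014613
print(os.path.basename(make_unique_dir(base)))  # 20170324142534-SISTR-AE014613_1
```

Each instance would then need to clean up only the directory it was handed back, so one run finishing early no longer removes files the other is still reading.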

@apetkau
Member Author

apetkau commented Mar 24, 2017

Hmmm... with my current setup the files are all named the same, since I do an assembly first, so each input file ends up as something like contigs.fasta.

The scenario I'm thinking of is automatically running SISTR on upload of sequencing data from a sequencing run. However, in general, they probably won't all run at the same time, except for my small test data.

I do think it's something to fix up though, either through your suggestion, or by using one of the tempfile functions (which assign random names).
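For reference, a minimal sketch of the tempfile route using the standard library's `tempfile.mkdtemp`. The helper name `make_run_tmp_dir` and the `SISTR-<genome_name>-` prefix are assumptions for illustration, not SISTR's actual naming.

```python
import os
import shutil
import tempfile


def make_run_tmp_dir(genome_name, base_dir=None):
    # Hypothetical helper. mkdtemp appends a random suffix to the prefix and
    # creates the directory itself, so concurrent runs get distinct paths.
    # base_dir=None lets tempfile pick the default location (e.g. /tmp).
    return tempfile.mkdtemp(prefix="SISTR-{}-".format(genome_name), dir=base_dir)


# Two runs on identically named inputs (e.g. contigs.fasta) no longer collide.
a = make_run_tmp_dir("contigs")
b = make_run_tmp_dir("contigs")
try:
    print(a != b)  # True
finally:
    # Each run removes only its own directory.
    shutil.rmtree(a, ignore_errors=True)
    shutil.rmtree(b, ignore_errors=True)
```

The trade-off is that the directory names are no longer predictable from the timestamp and genome name, though the prefix keeps them recognizable when inspecting the tmp directory by hand.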

@peterk87
Contributor

Okay, I'll work up a fix and a new release with the check on tmp dir creation.

In the scenario you describe, would you be able to provide a genome name (or some kind of unique and useful identifier) for your input fasta? You could keep it as /path/to/contigs.fasta but also supply a genome_name, e.g.

sistr -o output -i /path/to/contigs.fasta genome_1337

So the SISTR output would show the name as genome_1337 which might be useful in the other output files like the cgMLST profile output or the detailed cgMLST allele search results.

@apetkau
Member Author

apetkau commented Mar 24, 2017

Awesome, thanks :)

Yes, I'll also look at giving the genomes passed to SISTR a better name.
