
RuntimeError: The task could not be sent to the workers as it is too large for send_bytes. #119

Closed
novitch opened this issue Sep 20, 2019 · 12 comments

novitch commented Sep 20, 2019

Hi, I would like to use the software with multiple CPUs, but when I tried I ran into an issue.
Here is the complete stdout:

INFO:iss.app:Starting iss generate
INFO:iss.app:Using kde ErrorModel
INFO:iss.app:Setting random seed to 110803
INFO:iss.util:Stitching input files together
INFO:iss.app:Using zero_inflated_lognormal abundance distribution
INFO:iss.app:Using 10 cpus for read generation
INFO:iss.app:Generating 1000000 reads
INFO:iss.app:Generating reads for record: GCA_000710275.1_ASM71027v1_genomic.fna
INFO:iss.app:Generating reads for record: GCA_001600775.1_JCM_11348_assembly_v001_genomic.fna
INFO:iss.app:Generating reads for record: GCA_001890705.1_Aspsy1_genomic.fna
INFO:iss.app:Generating reads for record: GCA_002551515.1_Malafurf_genomic.fna
INFO:iss.app:Generating reads for record: GCA_002901145.1_ASM290114v1_genomic.fna
INFO:iss.app:Generating reads for record: GCF_000001405.39_GRCh38.p13_genomic.fna
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/joblib/externals/loky/backend/queues.py", line 156, in _feed
    send_bytes(obj_)
  File "/opt/intel/intelpython3/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/opt/intel/intelpython3/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/matalb01/virtual-envs/iss_36/bin/iss", line 10, in <module>
    sys.exit(main())
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/iss/app.py", line 510, in main
    args.func(args)
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/iss/app.py", line 226, in generate_reads
    args.gc_bias) for i in range(cpus))
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/joblib/parallel.py", line 934, in __call__
    self.retrieve()
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "/opt/intel/intelpython3/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/opt/intel/intelpython3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
RuntimeError: The task could not be sent to the workers as it is too large for `send_bytes`.

novitch commented Sep 20, 2019

If you have a suggestion, I would be happy to test it.
I am working on a Linux server with 1 TB of RAM and 120 CPUs. I tried between 10 and 120 CPUs and got the same error message.
Thanks,
Alban.

HadrienG (Owner) commented:

Hi!

Thanks for reporting this. Could you share with me:

  • the exact command you used
  • the size of GCF_000001405.39_GRCh38.p13_genomic.fna?

The error message indicates that some data object is too large to be passed around by the multiprocessing library. I haven't tested InSilicoSeq on large(ish) eukaryotes, so it is possible that the human genome is too big for the default data type used by multiprocessing.
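As an aside (an illustration, not code from InSilicoSeq): CPython's multiprocessing connection packs each message's length into a signed 32-bit header with `struct.pack("!i", n)`, as the traceback above shows, so any single pickled task of 2 GiB or more overflows the header:

```python
import struct

# multiprocessing.Connection._send_bytes packs the payload length into a
# signed 32-bit big-endian header, as seen in the traceback above.
HEADER_FMT = "!i"
MAX_LEN = 2**31 - 1  # largest payload length that fits the header (~2 GiB)

struct.pack(HEADER_FMT, MAX_LEN)  # fits

try:
    struct.pack(HEADER_FMT, MAX_LEN + 1)  # one byte too large
except struct.error as exc:
    print(exc)  # 'i' format requires -2147483648 <= number <= 2147483647
```

This is why the error appears only with very large references: the pickled chunk of reads/records sent to a worker has to stay under 2 GiB.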

I'll test on my side, but in the meantime you can try again after removing the human genome from your input dataset.

Best,
Hadrien

@HadrienG HadrienG added the bug label Sep 20, 2019
@HadrienG HadrienG self-assigned this Sep 20, 2019

novitch commented Sep 20, 2019

Hi Hadrien,
The file is 3.1 GB. I'll try without it and will let you know.

The complete command is

iss generate --seed 110803 --abundance zero_inflated_lognormal --cpus 10  --genomes ../genomes_db/genomes.fna --seed 110803 --abundance zero_inflated_lognormal  --model hiseq --output simulation_1million_1

genomes.fna contains 114 genomes (human is the largest, and the only one in the GB range).


novitch commented Sep 20, 2019

OK, it seems to work if I do not use the human genome and do not use the full 120 threads; 60 threads work instead.


novitch commented Oct 2, 2019

Hi, a little update:
I thought I could work around the issue by generating the reads for my community on one side and the reads for the human genome on the other. But with the human genome alone, the problem still persists.
I can't work without human reads; do you think this will be achievable?


HadrienG commented Oct 3, 2019

I will not have time to fix this issue before mid-October, unfortunately.


novitch commented Oct 3, 2019

OK, so I'll try to mix ART for the human reads and your software for the microbes.

Thanks,
Alban.


HadrienG commented Oct 14, 2019

Hi,

I started working on this.
I could reproduce the bug when generating reads from a fasta file containing all human chromosomes concatenated together as one record.

Is there any reason you are concatenating instead of using --draft to generate an accurate number of reads from each record in the reference genome?
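(For reference, here is a quick way to spot FASTA records whose sequence alone approaches the 2 GiB `send_bytes` limit; this is a hypothetical helper, not part of InSilicoSeq, and the function names are made up:)

```python
# Hypothetical helper, not part of InSilicoSeq: report FASTA records whose
# sequence length approaches the 2 GiB limit of the 32-bit send_bytes header.

def record_lengths(fasta_path):
    """Yield (record_id, sequence_length) for each record in a FASTA file."""
    name, length = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                name, length = line[1:].split()[0], 0
            else:
                length += len(line)
    if name is not None:
        yield name, length

def oversized_records(fasta_path, limit=2**31 - 1):
    """Return the ids of records too large to send to a worker in one message."""
    return [rid for rid, n in record_lengths(fasta_path) if n >= limit]
```

On a fully concatenated human genome this would report a single record of roughly 3 billion bases, which is what overflows the header; kept as separate chromosomes, each record stays well under the limit.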

EDIT:
I have a fix on the mem branch. You can install from there with

pip install git+https://github.com/HadrienG/InSilicoSeq.git@mem

The fix is currently about 2 times slower than 1.4.x in preliminary tests. It will need to be optimised before I can merge and release an official bugfix.


novitch commented Oct 15, 2019

Hi,
I was looking to generate reads with abundance values. So if I understood correctly, I can't use both the --draft option and an abundance file.

Thanks for your quick response; I'll try the mem branch.

HadrienG (Owner) commented:

> I can't use both draft options and abundance file.

Correct. This should be addressed within the month for release 1.5.0 (see #83).

> I'll try with the mem branch

Thanks. Don't hesitate to report any bugs you might find 😄

HadrienG (Owner) commented:

The fix is implemented in 1.4.4


novitch commented Oct 23, 2019

Thanks Hadrien,
Great job on your software and the quick releases :)
