
RuntimeError: The task could not be sent to the workers as it is too large for send_bytes. #119

Closed
novitch opened this issue Sep 20, 2019 · 12 comments

novitch commented Sep 20, 2019

Hi, I would like to use the software with multiple CPUs, but when I tried I ran into an issue.
Here is the complete stdout:

INFO:iss.app:Starting iss generate
INFO:iss.app:Using kde ErrorModel
INFO:iss.app:Setting random seed to 110803
INFO:iss.util:Stitching input files together
INFO:iss.app:Using zero_inflated_lognormal abundance distribution
INFO:iss.app:Using 10 cpus for read generation
INFO:iss.app:Generating 1000000 reads
INFO:iss.app:Generating reads for record: GCA_000710275.1_ASM71027v1_genomic.fna
INFO:iss.app:Generating reads for record: GCA_001600775.1_JCM_11348_assembly_v001_genomic.fna
INFO:iss.app:Generating reads for record: GCA_001890705.1_Aspsy1_genomic.fna
INFO:iss.app:Generating reads for record: GCA_002551515.1_Malafurf_genomic.fna
INFO:iss.app:Generating reads for record: GCA_002901145.1_ASM290114v1_genomic.fna
INFO:iss.app:Generating reads for record: GCF_000001405.39_GRCh38.p13_genomic.fna
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/joblib/externals/loky/backend/queues.py", line 156, in _feed
    send_bytes(obj_)
  File "/opt/intel/intelpython3/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/opt/intel/intelpython3/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/matalb01/virtual-envs/iss_36/bin/iss", line 10, in <module>
    sys.exit(main())
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/iss/app.py", line 510, in main
    args.func(args)
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/iss/app.py", line 226, in generate_reads
    args.gc_bias) for i in range(cpus))
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/joblib/parallel.py", line 934, in __call__
    self.retrieve()
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/matalb01/virtual-envs/iss_36/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "/opt/intel/intelpython3/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/opt/intel/intelpython3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
RuntimeError: The task could not be sent to the workers as it is too large for `send_bytes`.

novitch commented Sep 20, 2019

If you have a suggestion, I would be happy to test it.
I am working on a Linux server with 1 TB of RAM and 120 CPUs. I tried between 10 and 120 CPUs and got the same error message.
Thanks,
Alban.

HadrienG (Owner) commented:

Hi!

Thanks for reporting this. Could you share with me:

  • the exact command you used
  • the size of GCF_000001405.39_GRCh38.p13_genomic.fna?

The error message indicates that some data object is too large to be passed around by the multiprocessing library. I haven't tested InSilicoSeq on large(ish) eukaryotes, so it is possible that the human genome is too big for the default data type used by multiprocessing.
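As an aside (an illustration, not code from InSilicoSeq): CPython's multiprocessing connection packs each message's length into a signed 32-bit header with `struct.pack("!i", n)`, as the traceback above shows, so any single pickled task of 2 GiB or more overflows the header:

```python
import struct

# multiprocessing.Connection._send_bytes packs the payload length into a
# signed 32-bit big-endian header, as seen in the traceback above.
HEADER_FMT = "!i"
MAX_LEN = 2**31 - 1  # largest payload length that fits the header (~2 GiB)

struct.pack(HEADER_FMT, MAX_LEN)  # fits

try:
    struct.pack(HEADER_FMT, MAX_LEN + 1)  # one byte too large
except struct.error as exc:
    print(exc)  # 'i' format requires -2147483648 <= number <= 2147483647
```

This is why the error appears only with very large references: the pickled chunk of reads/records sent to a worker has to stay under 2 GiB.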

I'll test on my side, but in the meantime you can try again after removing the human genome from your input dataset.

Best,
Hadrien

@HadrienG HadrienG added the bug label Sep 20, 2019
@HadrienG HadrienG self-assigned this Sep 20, 2019

novitch commented Sep 20, 2019

Hi Hadrien,
The file is 3.1 GB. I'll try without it and will let you know.

The complete command is

iss generate --seed 110803 --abundance zero_inflated_lognormal --cpus 10  --genomes ../genomes_db/genomes.fna --seed 110803 --abundance zero_inflated_lognormal  --model hiseq --output simulation_1million_1

genomes.fna contains 114 genomes (human is the largest, and the only one in the GB range).


novitch commented Sep 20, 2019

OK, it seems to work if I do not use the human genome and do not use the full 120 threads; 60 threads work instead.


novitch commented Oct 2, 2019

Hi, a little update:
I thought I could work around the issue by generating the reads for my community on one side and the reads for the human genome on the other. But with the human genome alone, the problem still persists.
I can't work without human reads; do you think this will be achievable?


HadrienG commented Oct 3, 2019

I will not have time to fix this issue before mid-October, unfortunately.


novitch commented Oct 3, 2019

OK, so I'll try to mix ART for the human reads and your software for the microbes.

Thanks,
Alban.


HadrienG commented Oct 14, 2019

Hi,

I started working on this.
I could reproduce the bug when generating reads from a fasta file containing all human chromosomes concatenated together as one record.

Is there any reason you are concatenating instead of using --draft to generate an accurate number of reads from each record in the reference genome?
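(For reference, here is a quick way to spot FASTA records whose sequence alone approaches the 2 GiB `send_bytes` limit; this is a hypothetical helper, not part of InSilicoSeq, and the function names are made up:)

```python
# Hypothetical helper, not part of InSilicoSeq: report FASTA records whose
# sequence length approaches the 2 GiB limit of the 32-bit send_bytes header.

def record_lengths(fasta_path):
    """Yield (record_id, sequence_length) for each record in a FASTA file."""
    name, length = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                name, length = line[1:].split()[0], 0
            else:
                length += len(line)
    if name is not None:
        yield name, length

def oversized_records(fasta_path, limit=2**31 - 1):
    """Return the ids of records too large to send to a worker in one message."""
    return [rid for rid, n in record_lengths(fasta_path) if n >= limit]
```

On a fully concatenated human genome this would report a single record of roughly 3 billion bases, which is what overflows the header; kept as separate chromosomes, each record stays well under the limit.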

EDIT:
I have a fix on the mem branch. You can install from there with

pip install git+https://github.com/HadrienG/InSilicoSeq.git@mem

The fix is currently about 2 times slower than 1.4.x in preliminary tests. It will need to be optimised before I can merge and release an official bugfix.


novitch commented Oct 15, 2019

Hi,
I was looking to generate reads with abundance values. So if I understood correctly, I can't use both the --draft option and an abundance file.

Thanks for your quick response; I'll try the mem branch.

HadrienG (Owner) commented:

> I can't use both draft options and abundance file.

Correct. This should be addressed within the month for release 1.5.0 (see #83).

> I'll try with the mem branch

Thanks. Don't hesitate to report any bugs you might find 😄

HadrienG (Owner) commented:

The fix is implemented in 1.4.4


novitch commented Oct 23, 2019

Thanks Hadrien,
Great job on your software and the quick releases :)
