Currently, in TCP and UCX comms, we offload serialization to a separate thread for large messages:
distributed/distributed/comm/utils.py (lines 72 to 75 in d0f6aec)
The `sizeof` computation can be a little expensive, particularly because we run it on every message; under some benchmarks it takes around 10% of our time.

In the case of workers, this is probably fine (and maybe even a good idea). For the scheduler, however, it is probably unnecessary: the scheduler tends to only store pre-serialized data, so its serialization process is just unpacking some Python objects and doesn't need to be done in a separate thread.
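To make the cost concrete, the offload decision looks roughly like the following. This is a simplified sketch, not the exact code at the lines referenced above: `FRAME_OFFLOAD_THRESHOLD` and `sizeof` are stand-ins for distributed's config threshold and `dask.sizeof.sizeof`, and the recursion here is only illustrative.

```python
import sys

# Hypothetical threshold; distributed reads this from configuration.
# The value here is purely illustrative.
FRAME_OFFLOAD_THRESHOLD = 10_000_000  # bytes

def sizeof(obj):
    # Stand-in for dask.sizeof.sizeof: recursively estimates memory use.
    # Walking every element of each message is where the ~10% cost comes from.
    if isinstance(obj, (list, tuple)):
        return sys.getsizeof(obj) + sum(sizeof(o) for o in obj)
    if isinstance(obj, dict):
        return sys.getsizeof(obj) + sum(
            sizeof(k) + sizeof(v) for k, v in obj.items()
        )
    return sys.getsizeof(obj)

def should_offload(msg):
    # The per-message check under discussion: estimate the message's size,
    # and only move serialization to a separate thread if it is large.
    return sizeof(msg) > FRAME_OFFLOAD_THRESHOLD

# Tiny scheduler-style control messages never exceed the threshold,
# yet they still pay for the sizeof traversal on every send.
small = {"op": "task-finished", "key": "x-123"}
print(should_offload(small))
```

The point of the issue is that for the scheduler this check almost always returns `False`, so the traversal is pure overhead.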
It would be good to skip offloading in the scheduler, but keep it in the workers.
Probably the place to specify this is in the Scheduler's `ConnectionPool`. However, we'll want to be careful because not every Comm serializes and offloads. This maybe requires some sort of `kwargs` option? I'm not sure.