Is your feature request related to a problem? Please describe.
Currently, stateful UDFs are initialized once per UDF execution rather than once per worker initialization. As a result, we cannot amortize the cost of an expensive initialization over the multiple partitions that a single worker processes.
Describe the solution you'd like
Workers should be able to identify the stateful UDFs in a given window of execution, run each initializer only once, and reuse the initialized UDFs across multiple windows.
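To make the request concrete, here is a minimal sketch of the reuse pattern, not Daft's actual implementation. It assumes a hypothetical per-worker cache (`_INSTANCE_CACHE`) and helper (`get_or_init`), keyed on the UDF class and its init arguments, so that the initializer runs once per worker process however many partitions are processed. `ExpensiveModel` and `run_partition` are stand-ins invented for illustration.

```python
from typing import Any, Dict, Hashable, List, Tuple

# Hypothetical per-worker cache: one initialized instance per (UDF class, init args).
# Assumes init_args is hashable (e.g. a tuple of simple values).
_INSTANCE_CACHE: Dict[Tuple[type, Hashable], Any] = {}


def get_or_init(udf_cls: type, init_args: Tuple[Any, ...] = ()) -> Any:
    """Return a cached instance of udf_cls, constructing it only on first use."""
    key = (udf_cls, init_args)
    if key not in _INSTANCE_CACHE:
        _INSTANCE_CACHE[key] = udf_cls(*init_args)
    return _INSTANCE_CACHE[key]


class ExpensiveModel:
    """Stand-in for a stateful UDF with a costly initializer."""

    init_count = 0  # counts how many times the initializer actually ran

    def __init__(self, scale: int = 1) -> None:
        ExpensiveModel.init_count += 1
        self.scale = scale

    def __call__(self, batch: List[int]) -> List[int]:
        return [x * self.scale for x in batch]


def run_partition(partition: List[int], init_args: Tuple[Any, ...] = (2,)) -> List[int]:
    # Each partition looks up the shared instance instead of re-initializing it.
    udf = get_or_init(ExpensiveModel, init_args)
    return udf(partition)


# Three partitions processed by the same worker share one initialized instance.
results = [run_partition(p) for p in ([1, 2], [3], [4, 5, 6])]
```

Keying the cache on the init arguments as well as the class means two uses of the same UDF with different arguments get separate instances, at the cost of requiring those arguments to be hashable.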
Additional context
See the code in @udf, which hardcodes the initialization of stateful UDFs on a per-UDF-call basis.
Add tests and fixes to account for init_args and batch_size when running stateful UDFs.
Have actor pool resource requests deduct from the globally available resource pool so that running tasks are not starved of resources (e.g. not enough memory). Tasks should instead fail pre-emptively with an error at admission time, indicating that there will not be enough resources to run them.
As per offline discussion, we can keep the implementation of actor pools locally simple for now by not attempting any smart lazy initialization or teardown of these pools. When the PhysicalPlan runs, all the actor pools in the plan spin up, and we make no guarantees about when they are torn down. These pools deduct from the globally available resources (GPUs and memory), so any subsequent tasks have a smaller pool of resources to pick from.
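The admission behavior described in the notes above could look roughly like the following. This is a sketch under stated assumptions rather than Daft's scheduler: `Resources`, `ResourcePool`, `admit`, and `InsufficientResourcesError` are hypothetical names, and only GPUs and memory are tracked because those are the two resources named in the discussion.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    # Hypothetical resource description: GPU count and memory in bytes.
    num_gpus: float = 0.0
    memory_bytes: int = 0


class InsufficientResourcesError(RuntimeError):
    """Raised at admission when a request cannot fit in what is left."""


class ResourcePool:
    def __init__(self, total: Resources) -> None:
        # Copy the totals so the caller's object is not mutated.
        self.available = Resources(total.num_gpus, total.memory_bytes)

    def admit(self, request: Resources) -> None:
        """Deduct the request from the pool, or fail up front if it cannot fit."""
        if (
            request.num_gpus > self.available.num_gpus
            or request.memory_bytes > self.available.memory_bytes
        ):
            raise InsufficientResourcesError(
                f"requested {request}, but only {self.available} is available"
            )
        self.available.num_gpus -= request.num_gpus
        self.available.memory_bytes -= request.memory_bytes


GIB = 1024**3
pool = ResourcePool(Resources(num_gpus=1, memory_bytes=8 * GIB))

# An actor pool spins up when the plan starts and holds its resources.
pool.admit(Resources(num_gpus=1, memory_bytes=6 * GIB))

# A later task that needs more than what remains is rejected at admission
# instead of starting and running out of memory.
try:
    pool.admit(Resources(memory_bytes=4 * GIB))
    rejected = False
except InsufficientResourcesError:
    rejected = True
```

Because the notes make no guarantee about when actor pools are torn down, the sketch deliberately has no release step: once a pool is admitted, its resources stay deducted for later tasks.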
The code referenced above is Daft/daft/udf.py, lines 73 to 79 in 2496baa.