Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform Stateful UDF initialization once-per-worker instead of once-per-partition #196

Closed
jaychia opened this issue Sep 16, 2022 · 3 comments
Assignees
Labels
p0 Priority 0 - to be addressed immediately

Comments

@jaychia
Copy link
Contributor

jaychia commented Sep 16, 2022

Is your feature request related to a problem? Please describe.
Currently Stateful UDFs are initialized once per execution of a UDF, instead of once per worker initialization. This means that we are unable to amortize the cost of expensive initializations over the multiple partitions that a single worker is processing.

Describe the solution you'd like
Workers should be able to identify stateful UDFs in a given window of execution, and only run their initializers once only, reusing them across multiple windows.

Additional context
See code in @udf which hardcodes the initializations of stateful UDFs on a per-UDF call basis:

Daft/daft/udf.py

Lines 73 to 79 in 2496baa

# TODO: The initialization of stateful UDFs is currently done on the execution on every partition here,
# but should instead be done on a higher level so that state initialization cost can be amortized across partitions.
try:
initialized_func = func() if isinstance(func, type) else func
except:
logger.error(f"Encountered error when initializing user-defined function {func.__name__}")
raise

@jaychia
Copy link
Contributor Author

jaychia commented Feb 9, 2024

Closing as inactive

@jaychia jaychia closed this as not planned Won't fix, can't repro, duplicate, stale Feb 9, 2024
@jaychia jaychia reopened this Jul 18, 2024
@jaychia jaychia added p0 Priority 0 - to be addressed immediately and removed inactive labels Jul 18, 2024
@jaychia jaychia self-assigned this Jul 18, 2024
@jaychia
Copy link
Contributor Author

jaychia commented Sep 2, 2024

As of #2677, there are a few remaining todos:

  • Add tests and fixes for accounting for init_args and batch_size when running the stateful UDFs
  • Have actor pool resource requests deduct from the globally available resource pool to avoid weird issues of starving any running tasks (e.g. not enough memory). Instead, the tasks should pre-emptively throw an error during admission to indicate that there will not be enough resources to run the tasks.

As per offline discussion, we can keep the implementation of actor pools locally simple for now by not attempting to do any smart lazy initialization/teardown of these pools. When the PhysicalPlan runs, all the actor pools in the plan will spin up and we make no guarantees about when they are torn down. These pools deduct from the global available resources (of GPUs and memory) so any subsequent tasks will now have a smaller pool of resources they can pick from.

cc @kevinzwang

@jaychia
Copy link
Contributor Author

jaychia commented Oct 7, 2024

Closing this as it is done, but needs to make it through last bits of dogfooding/testing from @kevinzwang to push past the finish line.

@jaychia jaychia closed this as completed Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p0 Priority 0 - to be addressed immediately
Projects
None yet
Development

No branches or pull requests

2 participants