Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Add daft.context.set_config function for setting configurations globally #1700

Merged
merged 3 commits into from
Dec 5, 2023

Conversation

jaychia
Copy link
Contributor

@jaychia jaychia commented Dec 5, 2023

  1. Adds a daft.context.set_config(**kwargs) method for users to set configuration parameters in Daft
  2. Adds a new common/daft-config crate to house the DaftConfig struct
  3. Thread PyDaftConfig {config: Arc<DaftConfig>} through the pyo3 layer for any functions that require access to configuration values: use daft.context.get_context().daft_config to retrieve the current PyDaftConfig from Python

.set_config() in action:
image

NOTE: Because we leverage our current daft.context Python global singleton machinery, note that the configs are probably not being propagated correctly in a multiprocessing/Ray worker environment to separate Python processes.

For this to work, we will need to correctly initialize any Ray workers with the current DaftContext, or abandon the DaftContext machinery for something more suitable.

@jaychia jaychia changed the title [FEAT] Add daft.context.set_config function [FEAT] Add daft.context.set_config function for setting configurations globally Dec 5, 2023
@github-actions github-actions bot added the enhancement New feature or request label Dec 5, 2023
Copy link
Member

@samster25 samster25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jaychia jaychia merged commit 52cbaba into jay/scan-task-iter Dec 5, 2023
2 checks passed
@jaychia jaychia deleted the jay/daft-config branch December 5, 2023 20:50
jaychia added a commit that referenced this pull request Dec 5, 2023
…izes (#1692)

Refactors/changes required on ScanTask itself:

1. Added a `ScanTask::merge`
2. Added a `ScanTask::partition_spec()`
3. Added some validation in `ScanTask::new` to assert that all the
underlying sources have the same partition spec

I then added a new module `daft_scan::scan_task_iterators` which
contains functions that perform transformations on a `Box<dyn
Iterator<item = DaftResult<ScanTaskRef>>>`.

TODO:

- [x] Make the file_size configurable (as an environment
variable/context flag) so that our unit-tests still run correctly when
we do multi-file tests for multi-partition dataframes (see: #1700 )

---------

Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants