[data] Refactor planner.py #45706

Merged on Jun 4, 2024 (6 commits)

raulchen commented Jun 4, 2024

Why are these changes needed?

planner.py has too many if-else branches. This PR refactors it to make the code cleaner and easier to extend in the future.
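
To illustrate the general idea, here is a minimal sketch of replacing an if-else chain with a registry of plan functions. The classes `Read` and `Limit` and the string return values are hypothetical stand-ins, not Ray's actual operators; only the name `PLAN_LOGICAL_OP_FNS` is taken from this PR's diff, and the lookup loop is an assumption about how such a registry might be consulted.

```python
from typing import Callable, List, Tuple, Type


class LogicalOp:
    """Hypothetical stand-in for a logical operator base class."""


class Read(LogicalOp):
    pass


class Limit(LogicalOp):
    pass


def plan_with_branches(op: LogicalOp) -> str:
    # Before: every new operator type needs another branch here.
    if isinstance(op, Read):
        return "physical-read"
    elif isinstance(op, Limit):
        return "physical-limit"
    raise ValueError(f"no plan function for {type(op).__name__}")


# After: a list of (operator type, plan function) pairs.
PLAN_LOGICAL_OP_FNS: List[Tuple[Type[LogicalOp], Callable[[LogicalOp], str]]] = [
    (Read, lambda op: "physical-read"),
    (Limit, lambda op: "physical-limit"),
]


def plan_with_registry(op: LogicalOp) -> str:
    # Walk the registry and use the first entry whose type matches.
    for op_type, plan_fn in PLAN_LOGICAL_OP_FNS:
        if isinstance(op, op_type):
            return plan_fn(op)
    raise ValueError(f"no plan function for {type(op).__name__}")
```

With this shape, supporting a new operator means adding one entry to the list rather than editing the dispatch function.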

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Hao Chen <[email protected]>
Comment on lines +52 to +61
def plan_input_data_op(
    logical_op: InputData, physical_children: List[PhysicalOperator]
) -> PhysicalOperator:
    """Get the corresponding DAG of physical operators for InputData."""
    assert len(physical_children) == 0

    return InputDataBuffer(
        input_data=logical_op.input_data,
        input_data_factory=logical_op.input_data_factory,
    )
Member

Nit: IMO this'll be easier to read if we consistently define plan functions in separate modules. I think mixing the two approaches (function definitions in separate modules and in this function) hurts readability.

Not heavily opinionated on this, so am okay to keep as is.

Contributor Author

Some plan functions are just one line, so I feel it's simpler and cleaner to define them here. Complex plan functions are better placed in separate files.

    PLAN_LOGICAL_OP_FNS.append((logical_op_type, plan_fn))


def register_plan_logical_op_fns():
Member

Nit: Maybe something like this to disambiguate from register_plan_logical_op_fn?

Suggested change
- def register_plan_logical_op_fns():
+ def register_default_plan_logical_op_fns():

From the name, I'd expect something like this:

register_plan_logical_op_fns(plan_fns: List[PlanLogicalOpFn])

Contributor Author

makes sense
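
The naming distinction discussed in this thread could look roughly like the following. This is a sketch, not the merged code: the function bodies, the `PlanFn` alias, the toy operator classes, and the string results are assumptions. Only the function names and `PLAN_LOGICAL_OP_FNS` come from the thread.

```python
from typing import Callable, List, Tuple, Type


class LogicalOperator:
    """Hypothetical stand-in for the real base class."""


class InputData(LogicalOperator):
    pass


class Limit(LogicalOperator):
    pass


# Hypothetical alias: a plan function maps a logical op to some physical plan.
PlanFn = Callable[[LogicalOperator], str]

PLAN_LOGICAL_OP_FNS: List[Tuple[Type[LogicalOperator], PlanFn]] = []


def register_plan_logical_op_fn(
    logical_op_type: Type[LogicalOperator], plan_fn: PlanFn
) -> None:
    """Register a single (operator type, plan function) pair."""
    PLAN_LOGICAL_OP_FNS.append((logical_op_type, plan_fn))


def register_default_plan_logical_op_fns() -> None:
    """Register the built-in pairs; it takes no arguments, hence 'default'."""
    register_plan_logical_op_fn(InputData, lambda op: "input-data-buffer")
    register_plan_logical_op_fn(Limit, lambda op: "limit-operator")


register_default_plan_logical_op_fns()
```

The `default` in the name signals that the function installs a fixed built-in set, which avoids the reading that it accepts a caller-supplied list of plan functions.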

from ray.data._internal.planner.plan_write_op import plan_write_op
from ray.util.annotations import DeveloperAPI

LogicalOperatorType = TypeVar("LogicalOperatorType", bound=LogicalOperator)
Member

What're the advantages of using this over Type[LogicalOperator]?

Contributor Author

The plan function's argument should be a subclass of LogicalOperator. With Type[LogicalOperator], it would be required to be exactly LogicalOperator.
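
A small sketch of the typing difference, using hypothetical classes and simplified signatures (the plan functions in this PR also take `physical_children`; `register`, `register_without_typevar`, and `REGISTRY` are illustrative names). With a bound TypeVar, a type checker can link the registered class to the parameter type of its plan function. With a callable typed as taking the base class, a function annotated for a subclass would be flagged, because callable parameters are contravariant. At runtime both variants behave the same.

```python
from typing import Callable, List, Tuple, Type, TypeVar


class LogicalOperator:
    """Hypothetical stand-in for the real base class."""


class InputData(LogicalOperator):
    def __init__(self, input_data: List[int]):
        self.input_data = input_data


LogicalOperatorType = TypeVar("LogicalOperatorType", bound=LogicalOperator)

REGISTRY: List[Tuple[type, Callable]] = []


def register(
    logical_op_type: Type[LogicalOperatorType],
    plan_fn: Callable[[LogicalOperatorType], int],
) -> None:
    # The TypeVar ties plan_fn's parameter to the same subclass as
    # logical_op_type, so plan_fn may be annotated with InputData.
    REGISTRY.append((logical_op_type, plan_fn))


def register_without_typevar(
    logical_op_type: Type[LogicalOperator],
    plan_fn: Callable[[LogicalOperator], int],
) -> None:
    # Without the TypeVar, a checker such as mypy expects plan_fn to accept
    # any LogicalOperator, so a function taking InputData would be flagged.
    REGISTRY.append((logical_op_type, plan_fn))


def plan_input_data(op: InputData) -> int:
    # Toy plan function: returns the number of input items.
    return len(op.input_data)


register(InputData, plan_input_data)  # type-checks with the bound TypeVar
```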

Signed-off-by: Hao Chen <[email protected]>
raulchen enabled auto-merge (squash) June 4, 2024 19:09
github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) Jun 4, 2024
raulchen merged commit a7345ab into ray-project:master Jun 4, 2024
7 checks passed
raulchen deleted the refactor-planner branch June 5, 2024 02:32
richardsliu pushed a commit to richardsliu/ray that referenced this pull request Jun 12, 2024
`planner.py` has too many if-else branches. refactor code to make the
code cleaner and easier to extend in the future.

---------

Signed-off-by: Hao Chen <[email protected]>
Signed-off-by: Richard Liu <[email protected]>