[attempt 2] Eager import pandas in ray data for python >= 3.7 #33103

ericl · 2023-03-07T06:57:26Z

Why are these changes needed?

The original fix was reverted: #32533

Closes #32435

Signed-off-by: Eric Liang <[email protected]>

jianoaix · 2023-03-07T16:51:34Z

There seems a new failure of python/ray/data/tests/test_dataset_image.py.

ericl · 2023-03-07T19:13:10Z

It seems to be broken in master for me.

c21 · 2023-03-07T19:16:06Z

Weird, I didn't see the failure on https://flakey-tests.ray.io/, or on my split PR - #33101 . Could we rebase to latest master to be sure?

jianoaix · 2023-03-07T19:18:14Z

Yeah the one with python/ray/data/tests/test_dataset_image.py looks new:

ray_start_regular_shared = <python.ray.data.tests.test_dataset_image.TestReadImages object at 0x7fc69d1dca90>

    def test_dynamic_block_split(ray_start_regular_shared):
        ctx = ray.data.context.DatasetContext.get_current()
        target_max_block_size = ctx.target_max_block_size
        block_splitting_enabled = ctx.block_splitting_enabled
        # Reduce target max block size to trigger block splitting on small input.
        # Otherwise we have to generate big input files, which is unnecessary.
        ctx.target_max_block_size = 1
        ctx.block_splitting_enabled = True
        try:
            root = "example://image-datasets/simple"
            ds = ray.data.read_images(root, parallelism=1)
            assert ds.num_blocks() == 1
            ds.fully_executed()
            # Verify dynamic block splitting taking effect to generate more blocks.
            assert ds.num_blocks() == 3
    
            # NOTE: Need to wait for 1 second before checking stats, because we report
            # stats to stats actors asynchronously when returning the blocks metadata.
            # TODO(chengsu): clean it up after refactoring lazy block list.
            time.sleep(1)
>           assert "3 blocks executed" in ds.stats()
E           AssertionError: assert '3 blocks executed' in 'Stage 0 read: 1/1 blocks split from parent in 0s\n* Output num rows: 3 min, 3 max, 3 mean, 3 total\n* Output size bytes: 2724 min, 2724 max, 2724 mean, 2724 total\n'
E            +  where 'Stage 0 read: 1/1 blocks split from parent in 0s\n* Output num rows: 3 min, 3 max, 3 mean, 3 total\n* Output size bytes: 2724 min, 2724 max, 2724 mean, 2724 total\n' = <bound method Dataset.stats of Dataset(num_blocks=3, num_rows=3, schema={image: ArrowTensorType(shape=(32, 32, 3), dtype=uint8)})>()
E            +    where <bound method Dataset.stats of Dataset(num_blocks=3, num_rows=3, schema={image: ArrowTensorType(shape=(32, 32, 3), dtype=uint8)})> = Dataset(num_blocks=3, num_rows=3, schema={image: ArrowTensorType(shape=(32, 32, 3), dtype=uint8)}).stats

python/ray/data/tests/test_dataset_image.py:201: AssertionError

c21 · 2023-03-07T19:20:36Z

ah I guess in this case, the eager import causes longer execution time for test, so sleep for 1 second is not enough:

# NOTE: Need to wait for 1 second before checking stats, because we report
# stats to stats actors asynchronously when returning the blocks metadata.
# TODO(chengsu): clean it up after refactoring lazy block list.
time.sleep(1)

@ericl - could you try increasing the sleep time, e.g. 5 seconds?

Signed-off-by: Eric Liang <[email protected]>

ericl · 2023-03-07T19:41:12Z

I changed it to a wait for condition, but I think the failure is not related. I can reliably reproduce on master

c21 · 2023-03-07T20:29:49Z

I changed it to a wait for condition, but I think the failure is not related. I can reliably reproduce on master

I guess one possible reason might be due to different laptop environment, so some of us can reproduce locally but not all.

…oject#33103) Signed-off-by: Jack He <[email protected]>

…oject#33103) Signed-off-by: Edward Oakes <[email protected]>

…oject#33103)

…oject#33103) Signed-off-by: elliottower <[email protected]>

…oject#33103) Signed-off-by: Jack He <[email protected]>

restore

ec6dd33

Signed-off-by: Eric Liang <[email protected]>

ericl assigned clarkzinzow Mar 7, 2023

ericl requested review from scv119, clarkzinzow, jjyao and jianoaix as code owners March 7, 2023 06:57

ericl assigned jianoaix Mar 7, 2023

ericl requested a review from c21 as a code owner March 7, 2023 06:57

Merge remote-tracking branch 'upstream/master' into pandas-import-revert

31a6ef6

ericl added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Mar 7, 2023

c21 approved these changes Mar 7, 2023

View reviewed changes

use wait for condition

49183cc

Signed-off-by: Eric Liang <[email protected]>

ericl merged commit 1ba4729 into ray-project:master Mar 7, 2023

ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request Mar 21, 2023

[attempt 2] Eager import pandas in ray data for python >= 3.7 (ray-pr…

e11302e

…oject#33103) Signed-off-by: Jack He <[email protected]>

edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023

[attempt 2] Eager import pandas in ray data for python >= 3.7 (ray-pr…

81608f6

…oject#33103) Signed-off-by: Edward Oakes <[email protected]>

peytondmurray pushed a commit to peytondmurray/ray that referenced this pull request Mar 22, 2023

[attempt 2] Eager import pandas in ray data for python >= 3.7 (ray-pr…

5e404a6

…oject#33103)

elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023

[attempt 2] Eager import pandas in ray data for python >= 3.7 (ray-pr…

27c7d3b

…oject#33103) Signed-off-by: elliottower <[email protected]>

ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023

[attempt 2] Eager import pandas in ray data for python >= 3.7 (ray-pr…

956aad1

…oject#33103) Signed-off-by: Jack He <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[attempt 2] Eager import pandas in ray data for python >= 3.7 #33103

[attempt 2] Eager import pandas in ray data for python >= 3.7 #33103

ericl commented Mar 7, 2023

jianoaix commented Mar 7, 2023

ericl commented Mar 7, 2023

c21 commented Mar 7, 2023

jianoaix commented Mar 7, 2023

c21 commented Mar 7, 2023

ericl commented Mar 7, 2023

c21 commented Mar 7, 2023

[attempt 2] Eager import pandas in ray data for python >= 3.7 #33103

[attempt 2] Eager import pandas in ray data for python >= 3.7 #33103

Conversation

ericl commented Mar 7, 2023

Why are these changes needed?

jianoaix commented Mar 7, 2023

ericl commented Mar 7, 2023

c21 commented Mar 7, 2023

jianoaix commented Mar 7, 2023

c21 commented Mar 7, 2023

ericl commented Mar 7, 2023

c21 commented Mar 7, 2023