
[SPARK-48260][SQL] Disable output committer coordination in one test of ParquetIOSuite #46562

Closed
wants to merge 2 commits into from


gengliangwang
Member

What changes were proposed in this pull request?

A test from ParquetIOSuite is flaky: SPARK-7837 Do not close output writer twice when commitTask() fails

The flakiness turns out to be a race condition. The test injects an error into the task commit step, and the job can then fail in two ways:

  1. The task obtained the driver's permission to commit, but the commit failed and so the task failed. This triggers a stage failure because it indicates possible data duplication; see [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status #36564
  2. The test disables task retries, so TaskSetManager aborts the stage.

Both failures are reported by sending an event to DAGScheduler, so the final job failure depends on which event is processed first. This is harmless in general, but this test in ParquetIOSuite checks the error class of the failure. This PR fixes the flaky test by moving the test case into a new test suite with output committer coordination disabled.
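To make the race concrete, here is a toy Python model of the two failure paths above. It is not Spark code: `run_job`, the in-memory queue, and the failure-reason strings are hypothetical stand-ins for the DAGScheduler event loop, TaskSetManager, and OutputCommitCoordinator. It assumes, as described above, that only the first failure event processed determines the reported job failure, and it randomizes arrival order to mimic nondeterministic scheduling.

```python
import random
from collections import deque

# Hypothetical failure reasons; Spark's real error classes and messages differ.
COMMIT_COORDINATOR_ABORT = "stage aborted: committed task failed (OutputCommitCoordinator)"
TASK_SET_ABORT = "stage aborted: task failed and retries are disabled (TaskSetManager)"


def run_job(coordination_enabled: bool, seed: int) -> str:
    """Toy model: return the failure reason reported for the failed job."""
    rng = random.Random(seed)
    pending = []

    # Path 1: the driver authorized the commit, then the commit failed.
    # In this model the event only exists when commit coordination is on.
    if coordination_enabled:
        pending.append(COMMIT_COORDINATOR_ABORT)

    # Path 2: the task failed and retries are disabled, so the task set aborts.
    pending.append(TASK_SET_ABORT)

    # The two events are posted from different places, so arrival order varies.
    rng.shuffle(pending)
    event_queue = deque(pending)

    # Events are handled in arrival order; once the job has failed, later
    # failure events for the same stage do not change the reported reason.
    job_failure = None
    while event_queue:
        reason = event_queue.popleft()
        if job_failure is None:
            job_failure = reason
    return job_failure


if __name__ == "__main__":
    with_coordination = {run_job(True, seed) for seed in range(20)}
    without_coordination = {run_job(False, seed) for seed in range(20)}
    print(len(with_coordination), len(without_coordination))
```

In this model, enabling coordination lets either reason win depending on arrival order, which is what makes an assertion on the error class flaky; disabling it leaves only the TaskSetManager path, so the reported failure is deterministic.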

Why are the changes needed?

To fix a flaky test.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

GA tests + manual test locally.

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the SQL label May 13, 2024
@gengliangwang
Member Author

This is an alternative approach to #46560. Here I am trying to avoid changing production code just to fix a flaky test.

Member

viirya left a comment

Looks like a better approach.

@gengliangwang
Member Author

@viirya @cloud-fan thanks for the review. Merging to master.

cloud-fan pushed a commit that referenced this pull request May 14, 2024
### What changes were proposed in this pull request?
The PR aims to remove the workaround for ParquetIOSuite.

### Why are the changes needed?
Now that #46562 has been merged, the UT `SPARK-7837 Do not close output writer twice when commitTask() fails` no longer fails because of the varying order in which failure events are processed, so we remove the previous workaround here.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually tested.
- Passed GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46577 from panbingkun/SPARK-47301_FOLLOWUP.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@dongjoon-hyun
Member

Hi, @gengliangwang and all.

Do you think we can have this in the release branches too?

@gengliangwang
Member Author

@dongjoon-hyun Based on recent test runs on branch-3.5 (https://github.com/apache/spark/commits/branch-3.5/), ParquetIOSuite looks fine for now. I am OK with either backporting or not backporting.

4 participants