[SPARK-48370][CONNECT][FOLLOW-UP] Use JDK's Cleaner instead #46726

Closed
wants to merge 1 commit into from

Conversation

HyukjinKwon
Member

What changes were proposed in this pull request?

This PR is a follow-up of #46683 that replaces our custom cleaner with the JDK's Cleaner.
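For readers unfamiliar with `java.lang.ref.Cleaner`, here is a minimal sketch of how a cleanup action is registered against an object. It is not the code from this PR: `CleanerSketch`, `TrackedResource`, `ReleaseAction`, and the `released` counter are hypothetical names used only for illustration. Only `Cleaner.create()`, `Cleaner.register`, and `Cleanable.clean()` are actual JDK calls.

```scala
import java.lang.ref.Cleaner
import java.util.concurrent.atomic.AtomicInteger

object CleanerSketch {
  // A Cleaner manages its own daemon thread that runs registered actions.
  private val cleaner: Cleaner = Cleaner.create()

  // Hypothetical counter standing in for "release the server-side resource".
  val released = new AtomicInteger(0)

  // Hypothetical resource whose lifetime we want to track.
  final class TrackedResource(val id: String)

  // The action holds only the id, never the tracked object itself;
  // otherwise the object could never become phantom reachable.
  final class ReleaseAction(id: String) extends Runnable {
    override def run(): Unit = {
      released.incrementAndGet()
      ()
    }
  }

  // Registers the resource; the action runs after the resource is
  // garbage collected, or earlier if clean() is called explicitly.
  def register(resource: TrackedResource): Cleaner.Cleanable =
    cleaner.register(resource, new ReleaseAction(resource.id))
}
```

Calling `clean()` on the returned `Cleanable` runs the action at most once, so an explicit release and a later GC-triggered cleanup do not double-release.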

Why are the changes needed?

To reuse the standard built-in library instead of maintaining a custom cleaner.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I manually tested by re-enabling CheckpointSuite.checkpoint gc derived DataFrame.

Was this patch authored or co-authored using generative AI tooling?

No.

}
} catch {
case e: Throwable => logError("Error in cleaning thread", e)
Member Author

The error is swallowed, but I think it's better to log it explicitly.
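A hedged sketch of the idea, assuming the comment refers to the documented behavior that `java.lang.ref.Cleaner` ignores exceptions thrown by cleaning actions: wrapping the action body in a try/catch keeps failures visible in the logs. `LoggedCleanup` and `logged` are hypothetical names, and `java.util.logging` stands in for Spark's own `logError` so that the snippet is self-contained.

```scala
import java.util.logging.{Level, Logger}

object LoggedCleanup {
  // Stand-in logger; the PR's snippet uses logError instead.
  private val logger = Logger.getLogger("LoggedCleanup")

  // Wraps a cleanup body so that a failure is logged rather than
  // silently dropped by the thread running the action.
  def logged(body: () => Unit): Runnable = new Runnable {
    override def run(): Unit =
      try body()
      catch {
        case e: Throwable =>
          logger.log(Level.SEVERE, "Error in cleaning thread", e)
      }
  }
}
```

The `body` passed in should not capture the object being tracked, for the same reason as any other cleaning action.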

@HyukjinKwon
Member Author

cc @hvanhovell

}
cleaningThread.join()
}
private val cleaner = Cleaner.create()
Contributor

It seems that the lifecycle of SessionCleaner is the same as that of SparkSession, so when a client holds multiple SparkSessions, multiple instances of java.lang.ref.Cleaner will be created. If cleaner were defined in the companion object of SessionCleaner, multiple SessionCleaner instances could share one java.lang.ref.Cleaner instance. Would that meet the requirements?
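The suggestion can be sketched roughly as follows. This is an illustration under assumptions, not the actual `SessionCleaner`: the class name `SessionCleanerSketch`, its constructor, and the `register` signature are hypothetical.

```scala
import java.lang.ref.Cleaner

// Hypothetical stand-in for SessionCleaner, one instance per session.
class SessionCleanerSketch(val sessionId: String) {
  // Delegates to the single Cleaner held by the companion object.
  def register(referent: AnyRef, action: Runnable): Cleaner.Cleanable =
    SessionCleanerSketch.sharedCleaner.register(referent, action)
}

object SessionCleanerSketch {
  // Created once when the companion is initialized, so every
  // session-level instance shares one Cleaner and its single thread.
  private val sharedCleaner: Cleaner = Cleaner.create()
}
```

The trade-off is that a shared `Cleaner` runs every session's actions on one thread, whereas a `private val cleaner = Cleaner.create()` inside the class gives each session its own cleaner and thread.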

Member Author

We could have one and share it across sessions, but I wanted to scope the cleaning to a single session so it doesn't affect other sessions. I'm fine either way, though.

Contributor

We can do this in a follow-up? @LuciferYang is there any concrete concern here? Or are you just being tidy?

Contributor

It's fine, we can do a follow-up if this really becomes an issue :)

@HyukjinKwon
Member Author

Merged to master.

riyaverm-db pushed a commit to riyaverm-db/spark that referenced this pull request Jun 7, 2024
Closes apache#46726 from HyukjinKwon/SPARK-48370-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
3 participants