Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kernel] [CC Refactor #3] Add ConfigurationProvider and AbstractCommitCoordinatorBuilder APIs #3798

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

scottsand-db
Copy link
Collaborator

@scottsand-db scottsand-db commented Oct 23, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Adds new ConfigurationProvider and AbstractCommitCoordinatorBuilder APIs

How was this patch tested?

Tests are added in CC Refactor PR 5: #3821

Does this PR introduce any user-facing changes?

Yes see the above

@@ -0,0 +1,85 @@
/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have CoordinatedCommitUtils.java

Would be nice to either merge them in the same file eventually or have better names for them

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, everything Coordinated-Commits-related in delta-storage will be refactored and moved to delta-kernel, and then be deleted.

@scottsand-db scottsand-db changed the title [Kernel] [CC Refactor #3] Add ConfigurationProvider, CommitCoordinatorBuilder, CommitCoordinatorUtils public APIs [Kernel] [CC Refactor #3] Add ConfigurationProvider and AbstractCommitCoordinatorBuilder Nov 1, 2024
@scottsand-db scottsand-db changed the title [Kernel] [CC Refactor #3] Add ConfigurationProvider and AbstractCommitCoordinatorBuilder [Kernel] [CC Refactor #3] Add ConfigurationProvider and AbstractCommitCoordinatorBuilder APIs Nov 1, 2024
*
* @param commitCoordinatorName the commit coordinator name
* @param sessionConfig The session-level configuration that may represent environment-specific
* configurations such as {@code HadoopConf} or {@code SparkConf}. This sessionConfig is used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make them implement ConfigurationProvider or how can we, e.g., pass a SparkConf as a ConfigurationProvider?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They need to pass in something that is a ConfigurationProvider (i.e. that implements the interface). You can easily create a wrapper around a SparkConf or a HadoopConf to implemeent the ConfigurationProvider interface

@scottsand-db
Copy link
Collaborator Author

@LukasRupprecht and @sumeet-db -- is it every possible or intended/desired that AbstractMetadata and AbstractProtocol would be null? or are these required params? I see null being passed in in some existing tests today ...

*
* @param commitCoordinatorName the commit coordinator name
* @param sessionConfig The session-level configuration that may represent environment-specific
* configurations such as {@code HadoopConf} or {@code SparkConf}. This sessionConfig is used
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you need both hadoopConf and sparkConf?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. You can union them to implement the get interface.

When we go to implement the UC CCC using this API, we can adjust methods as needed.

* configuration properties for describing the Delta table to the commit-coordinator.
* @return the {@link CommitCoordinatorClient} corresponding to the given commit coordinator name
*/
public static CommitCoordinatorClient buildCommitCoordinatorClient(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the dev experience for using this method will be:

CommitCoordinatorClient client = AbstractCommitCoordinatorBuilder
                .buildCommitCoordinatorClient(
                    "dynamodb",
                    sessionConfig,
                    coordinatorConf
                );

Somehow, it doesn't feel intuitive that this code will look up the builder in the conf and do stuff

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's this method different than CommitCoordinatorProvider.getBuilder(... /* load builder dynamically from conf */).build(...)? And why do you prefer this one? 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow, it doesn't feel intuitive that this code will look up the builder in the conf and do stuff

Really? I quite like it. Feel free to let me know what doesn't feel right about this. This is what we agreed on in the sync on Tuesday Oct 22 @ 1:00pm

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further, this API feels very similar to the previous API of

CommitCoordinatorClient client = CommitCoordinatorProvider
  .getCommitCoordinatorClient(
    hadoopConf,
    name,
    commitCoordinatorConf);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping we could prevent patterns like the following by moving buildCommitCoordinatorClient to a separate DefaultCommitCoordinatorClientBuilder that DynamoDBCCClientBuilder doesn't need to extend:

CommitCoordinatorClient client = DynamoDBCCClientBuilder.buildCommitCoordinatorClient(
    "dynamodb", 
    config, 
    coordConf);

* @since 3.3.0
*/
@Evolving
public interface ConfigurationProvider {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottsand-db This class doesn't provide any method to iterate all configs. Does that work for UC Commit Coordinator? Don't we need to iterate over all configs in that? Will let @sumeet-db confirm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prakharjain09 -- Yup, great callout. We can update APIs when we get there. IMO there's no rush or requirement to add it now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need a getAll / getAllByPrefix API for UC Commit Coordinator Builder when we get there.

@LukasRupprecht
Copy link
Contributor

@LukasRupprecht and @sumeet-db -- is it every possible or intended/desired that AbstractMetadata and AbstractProtocol would be null? or are these required params? I see null being passed in in some existing tests today ...

They shouldn't be null as client implementations may rely on it (test code is an exception if we can be sure that the underlying client doesn't require it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants