Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kernel] [CC Refactor #1] Add ConfigurationProvider, TableIdentifier, TableBuilder APIs #3781

33 changes: 17 additions & 16 deletions kernel/kernel-api/src/main/java/io/delta/kernel/Table.java
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
import io.delta.kernel.exceptions.CheckpointAlreadyExistsException;
import io.delta.kernel.exceptions.KernelException;
import io.delta.kernel.exceptions.TableNotFoundException;
import io.delta.kernel.internal.TableImpl;
import io.delta.kernel.internal.TableBuilderImpl;
import java.io.IOException;

/**
Expand All @@ -33,28 +33,29 @@ public interface Table {
/**
* Instantiate a table object for the Delta Lake table at the given path.
*
* <ul>
* <li>Behavior when the table location doesn't exist:
* <ul>
* <li>Reads will fail with a {@link TableNotFoundException}
* <li>Writes will create the location
* </ul>
* <li>Behavior when the table location exists (with contents or not) but not a Delta table:
* <ul>
* <li>Reads will fail with a {@link TableNotFoundException}
* <li>Writes will create a Delta table at the given location. If there are any existing
* files in the location that are not already part of the Delta table, they will
* remain excluded from the Delta table.
* </ul>
* </ul>
* <p>For detailed behavior when creating, reading from, or writing to the table, see {@link
* TableBuilder#build()}.
*
* @param engine {@link Engine} instance to use in Delta Kernel.
* @param path location of the table. Path is resolved to fully qualified path using the given
* {@code engine}.
* @return an instance of {@link Table} representing the Delta table at the given path
*/
static Table forPath(Engine engine, String path) {
return TableImpl.forPath(engine, path);
return builder(engine, path).build();
}

/**
* Create a {@link TableBuilder} to build a {@link Table} instance.
*
* @param engine {@link Engine} instance to use in Delta Kernel.
* @param path location of the table. Path is resolved to fully qualified path using the given
* {@code engine}.
* @return a {@link TableBuilder} instance
* @since 3.3.0
*/
static TableBuilder builder(Engine engine, String path) {
return new TableBuilderImpl(engine, path);
}

/**
Expand Down
76 changes: 76 additions & 0 deletions kernel/kernel-api/src/main/java/io/delta/kernel/TableBuilder.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
/*
* Copyright (2024) The Delta Lake Project Authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package io.delta.kernel;

import io.delta.kernel.annotation.Evolving;
import io.delta.kernel.config.ConfigurationProvider;
import io.delta.kernel.engine.Engine;
import io.delta.kernel.exceptions.TableNotFoundException;

/**
* Builder for creating a {@link Table} instance.
*
* <p>Required parameters for building a {@link TableBuilder} instance include:
*
* <ul>
* <li>{@code Engine}: The {@link Engine} instance required to interact with Delta Kernel.
* <li>{@code Path}: The location of the table, resolved by the provided {@code Engine}.
* </ul>
*
* @since 3.3.0
*/
@Evolving
public interface TableBuilder {

/**
* Set the {@link ConfigurationProvider} to use for the table.
*
* @param configProvider {@link ConfigurationProvider} to use for the table
* @return this updated {@link TableBuilder} instance
*/
TableBuilder withConfigurationProvider(ConfigurationProvider configProvider);

/**
* Set the {@link TableIdentifier} for the table.
*
* @param tableId {@link TableIdentifier} for the table
* @return this updated {@link TableBuilder} instance
*/
TableBuilder withTableId(TableIdentifier tableId);

/**
* Builds the table object for the Delta Lake table at the given path.
*
* <ul>
* <li>Behavior when the table location doesn't exist:
* <ul>
* <li>Reads will fail with a {@link TableNotFoundException}
* <li>Writes will create the location
* </ul>
* <li>Behavior when the table location exists (with contents or not) but is not a Delta table:
* <ul>
* <li>Reads will fail with a {@link TableNotFoundException}
* <li>Writes will create a Delta table at the given location. If there are any existing
* files in the location that are not already part of the Delta table, they will
* remain excluded from the Delta table.
* </ul>
* </ul>
*
* @return the new {@link Table} instance
*/
Table build();
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
/*
* Copyright (2024) The Delta Lake Project Authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package io.delta.kernel;

/** Identifier for a table. e.g. <catalog> / <schema> / <tableName> */
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@since 3.3.0

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Evolving

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[error] /home/runner/work/delta/delta/kernel/kernel-api/src/main/java/io/delta/kernel/TableIdentifier.java:19:1:  error: unknown tag: catalog
[error] /** Identifier for a table. e.g. <catalog> / <schema> / <tableName> */

public class TableIdentifier {
/** The name of the table. */
private final String name;

/** The namespace of the table. e.g. <catalog> / <schema> */
private final String[] namespace;
Comment on lines +21 to +25
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The order of constructor arguments & getters is namespacename. I would recommend using the same order in fields.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Will do! I'm still working on some more PRs to fully flesh out this design. Will get to this later.


public TableIdentifier(String[] namespace, String name) {
this.namespace = namespace;
this.name = name;
Comment on lines +28 to +29
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requireNonNull seems missing. Also, it would be nice to ensure non-empty in my opinion.

}

/** Returns the namespace of the table. */
public String[] getNamespace() {
return namespace;
}

/** Returns the name of the table. */
public String getName() {
return name;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toString (for printing into logs). equals, hashCode (may add if needed later, but I do see this is useful in near future).

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
/*
* Copyright (2024) The Delta Lake Project Authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package io.delta.kernel.config;

import io.delta.kernel.annotation.Evolving;
import java.util.NoSuchElementException;
import java.util.Optional;

/**
* Provides per-session configuration values for Delta Kernel, specific to the current execution.
*
* <p>These values are not Delta table properties. Delta table properties are stored in the Delta
* log metadata and apply globally to all readers and writers interacting with the table.
Comment on lines +26 to +27
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give an example of where this gets used or what sort of configuration can it contain?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First example usage

The implementation of

  CommitCoordinatorClient Engine::getCommitCoordinatorClient(
      String commitCoordinatorName,
      ConfigurationProvider sessionConfig,
      Map<String, String> commitCoordinatorConf);

will use it to lookup which builder to use for the given commitCoordinatorName.

It will lookup a value like io.delta.kernel.commitCoordinatorBuilder.unity-catalog.impl

Second example usage

Then, the actual builder referenced above will lookup various keys / configurations needed to instantiate the client.

*
* <p>Instead, the configuration values provided by this interface are scoped to a single instance
* of Delta Kernel during its execution.
*
* @since 3.3.0
*/
@Evolving
public interface ConfigurationProvider {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the lifecycle of this object? Can Kernel maintain a reference to it? expected to release it after some event? can Engine update it while Kernel is using it (i.e return different values at different times for the same key)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UC Commit Coordinator Client needs to be able to iterate through all configs

/**
* Retrieves the configuration value for the given key.
*
* @param key the configuration key
* @return the configuration value associated with the key
* @throws NoSuchElementException if the key is not found
*/
String get(String key);

/**
* Retrieves the configuration value for the given key. If it doesn't exist, returns {@link
* Optional#empty}.
*
* @param key the configuration key
* @return the configuration value associated with the key, if it exists
*/
Optional<String> getOptional(String key);

/**
* Checks if the configuration provider contains the given key.
*
* @param key the configuration key
* @return {@code true} if the key exists, else {@code false}
*/
boolean contains(String key);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contains is tricky to define for conf. There is always some default conf and what does it mean when call contains.

Is it better to just have Optional<String> get(String key) and avoid the multiple APIs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Thanks!

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
/*
* Copyright (2024) The Delta Lake Project Authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package io.delta.kernel.internal;

import io.delta.kernel.config.ConfigurationProvider;
import java.util.NoSuchElementException;
import java.util.Optional;

/**
* An implementation of {@link ConfigurationProvider} that always returns empty results or throws
* exceptions for configuration values. This class is intended to be used in cases where no
* configuration is provided or expected.
*/
public class EmptyConfigurationProvider implements ConfigurationProvider {
@Override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can define just one static instance and use it everywhere rather than creating a new instance everytime.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that could be even in ConfigurationProvider (avoids a new file).

public String get(String key) {
throw new NoSuchElementException();
}

@Override
public Optional<String> getOptional(String key) {
return Optional.empty();
}

@Override
public boolean contains(String key) {
return false;
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
/*
* Copyright (2024) The Delta Lake Project Authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package io.delta.kernel.internal;

import static java.util.Objects.requireNonNull;

import io.delta.kernel.Table;
import io.delta.kernel.TableBuilder;
import io.delta.kernel.TableIdentifier;
import io.delta.kernel.config.ConfigurationProvider;
import io.delta.kernel.engine.Engine;
import io.delta.kernel.internal.util.Clock;
import java.util.Optional;

public class TableBuilderImpl implements TableBuilder {
private final Engine engine;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we avoid this kind of pattern where Engine is stored in class member variables.

Copy link
Collaborator Author

@scottsand-db scottsand-db Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. We will have to think about how to work around this, then

// Table.java
static TableBuilder builder() { return new TableBuilderImpl(); }

// TableBuilder.java
public Table build(Engine engine, String path)

private final String path;

private Optional<ConfigurationProvider> configProviderOpt = Optional.empty();
private Optional<TableIdentifier> tableIdOpt = Optional.empty();

/** Note: This is not for a public API. It's for internal testing use only. */
private Optional<Clock> clockOpt = Optional.empty();
Comment on lines +33 to +37
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to add opt suffix in my opinion. It's obvious from the type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear that. This is certainly one of my personal preferences. I like variables being fully self-describing. The value isn't a Clock, but distinctly a Clock Optional. With this current naming convention, if we did a .map, I could refer to the inner value as clock, but if I had the outer variable also named clock, this would get confusing.


public TableBuilderImpl(Engine engine, String path) {
requireNonNull(engine, "Expected non-null value for 'engine'");
requireNonNull(path, "Expected non-null value for 'path'");
Comment on lines +40 to +41
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing message pattern of requireNonNull is {variable} is null. I would recommend using the same pattern for consistency.


this.engine = engine;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.engine = engine;
this.engine = requireNonNull(engine, "blah blah");

this.path = path;
}

@Override
public TableBuilder withConfigurationProvider(ConfigurationProvider configProvider) {
requireNonNull(configProvider, "Expected non-null value for 'configProvider'");
this.configProviderOpt = Optional.of(configProvider);
return this;
}

@Override
public TableBuilder withTableId(TableIdentifier tableId) {
requireNonNull(tableId, "Expected non-null value for 'tableId'");
this.tableIdOpt = Optional.of(tableId);
return null;
}

/** Note: This is not a public API. It's for internal testing use only. */
public TableBuilder withClock(Clock clock) {
requireNonNull(clock, "Expected non-null value for 'clock'");
this.clockOpt = Optional.of(clock);
return this;
}

@Override
public Table build() {
return TableImpl.create(
engine,
path,
clockOpt.orElse(System::currentTimeMillis),
configProviderOpt.orElse(new EmptyConfigurationProvider()),
tableIdOpt);
}
}
Loading
Loading