Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Move UpdateRequirement out of rest package #7750

Merged
merged 2 commits into from
Jun 16, 2023

Conversation

nastra
Copy link
Contributor

@nastra nastra commented Jun 1, 2023

This currently depends on #7741.

The goal of this PR is to make UpdateRequirement available outside of the rest package layer, so that we can use it for #7569.

The following things have been done in this PR:

  • create org.apache.iceberg.UpdateRequirement (copy of org.apache.iceberg.rest.requests.UpdateTableRequest.UpdateRequirement) and org.apache.iceberg.UpdateRequirementParser (copy of org.apache.iceberg.rest.requests.UpdateRequirementParser)
  • deprecate org.apache.iceberg.rest.requests.UpdateTableRequest.UpdateRequirement and org.apache.iceberg.rest.requests.UpdateTableRequest
  • replace org.apache.iceberg.rest.requests.UpdateTableRequest with org.apache.iceberg.rest.requests.CommitTableRequest which has the exact same structure, but takes org.apache.iceberg.UpdateRequirement
  • switch existing usage of org.apache.iceberg.rest.requests.UpdateTableRequest.UpdateRequirement to org.apache.iceberg.UpdateRequirement
  • switch existing usage of org.apache.iceberg.rest.requests.UpdateTableRequest to org.apache.iceberg.CommitTableRequest

}
}

public List<UpdateRequirement> build() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the builder makes much sense when these classes are public. The builder was originally creating the UpdateTableRequest, which needed a list of both changes and requirements. Now, this is only producing a list of requirements from a list of changes so the builder pattern is unnecessary.

I think the cleanest path forward is to try to match the MetadataUpdate API here. Let's keep the constructors for these classes public like MetadataUpdate classes and add logic for converting a collection of updates into a collection of requirements elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made all the constructors public in UpdateRequirement and added UpdateRequirements with static methods that would then produce a List<UpdateRequirement from a List<MetadataUpdate>.
We don't expose a builder anymore in the API, but UpdateRequirements still uses one internally, since it needs to maintain some state when producing requirements from a given list of updates.

import org.immutables.value.Value;

@Value.Immutable
public interface CommitTableRequest extends RESTRequest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we need a new CommitTableRequest or if it would be fine to continue using UpdateTableRequest. The breaking change is going to be removing the List<UpdateTableRequest.UpdateRequirement> requirements() method, but the UpdateRequirement interface is also deprecated and will be removed.

Since source compatibility is already going to break, I don't see a problem with moving directly from List<UpdateTableRequest.UpdateRequirement> requirements() to List<UpdateRequirement> requirements(). The requirements that are returned will implement both interfaces until the older UpdateTableRequest.UpdateRequirement is removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we 're ok breaking source compability, then we can update UpdateTableRequest directly and don't need CommitTableRequest

@nastra nastra force-pushed the update-requirements-refactoring branch from 7fecb46 to cb859f6 Compare June 12, 2023 15:40
old: "method java.util.List<org.apache.iceberg.rest.requests.UpdateTableRequest.UpdateRequirement>\
\ org.apache.iceberg.rest.requests.UpdateTableRequest::requirements()"
new: "method java.util.List<org.apache.iceberg.UpdateRequirement> org.apache.iceberg.rest.requests.UpdateTableRequest::requirements()"
justification: "Accepted src API break by moving UpdateTableRequest out of REST"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only an API breaking change if we return a different implementation. Since we are still returning UpdateTableRequest.UpdateRequirement, I think it is not going to be a binary-breaking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, this is not binary-breaking: SOURCE: BREAKING, BINARY: NON_BREAKING

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think it would just be better if the justification were a little more clear that we don't expect it to break anything.

.addAll(createChanges)
.addAll(metadata.changes())
.build();
requirements = UpdateRequirements.forCreateTable(updates);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

baseChanges.forEach(requestBuilder::update);
metadata.changes().forEach(requestBuilder::update);
UpdateTableRequest request = requestBuilder.build();
UpdateTableRequest request = new UpdateTableRequest(requirements, updates);
Copy link
Contributor

@rdblue rdblue Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other request classes use a builder. I think that the intent here is to use the public constructor because the builder changed substantially and can't be updated to work for both. It would be nice to use a builder later, but I can see why this needs to be public and used at the moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes exactly, the builder changed substantially. In the long run I think we can add a builder again

.hasSize(2)
.hasOnlyElementsOfTypes(
UpdateRequirement.AssertTableUUID.class,
UpdateRequirement.AssertLastAssignedFieldId.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that these tests are sufficient. The assertions must be based on the base metadata's state. So the AssertTableUUID must contain the base metadata's UUID. And the last assigned field ID assertion should contain the base metadata's last column ID. I think the implementation is correct, but this would not catch some bugs introduced by updating the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense and I've updated all tests to check values. I've also added more tests that check the failure conditions. Now we have full test coverage of all code branches in UpdateRequirement via TestUpdateRequirements

@rdblue
Copy link
Contributor

rdblue commented Jun 13, 2023

@nastra, this is looking really close. I think the only issue is validating that the assertions are created correctly from the base table state, not just that the correct type of requirements are created. I guess this was being tested before by the REST catalog tests since these were embedded, but now that this is stand-alone logic I'd prefer to finish the tests. Also, we should make sure that there are tests for all of the paths in UpdateRequirements including when requirements are omitted because the table state has already changed (for example, two SetCurrentSchema updates).

@nastra nastra force-pushed the update-requirements-refactoring branch from cb859f6 to b617f0c Compare June 13, 2023 13:16
@nastra nastra force-pushed the update-requirements-refactoring branch from b617f0c to 7482842 Compare June 14, 2023 06:56
@nastra nastra force-pushed the update-requirements-refactoring branch from 7482842 to 7822a1f Compare June 14, 2023 07:58
@nastra nastra requested a review from rdblue June 16, 2023 07:08
@rdblue rdblue merged commit c85ba93 into apache:master Jun 16, 2023
@rdblue
Copy link
Contributor

rdblue commented Jun 16, 2023

Thanks, @nastra!

@nastra nastra deleted the update-requirements-refactoring branch June 17, 2023 05:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants