Core: View metadata implementation #6559
private final List<ViewHistoryEntry> versionLog;
private final List<Schema> schemas;
private final Map<Integer, Schema> schemasById;
private final String metadataFileLocation;
We shouldn't store metadataFileLocation in the metadata file; I'll remove this.
@RunWith(Parameterized.class)
public class TestViewMetadataParser extends ParserTestBase<ViewMetadata> {
  private static ViewVersion version1 =
      BaseViewVersion.builder()
          .versionId(1)
          .timestampMillis(4353L)
          .addRepresentation(
              BaseSQLViewRepresentation.builder().query("select 'foo' foo").schemaId(1).build())
          .build();

  private static ViewHistoryEntry historyEntry1 = BaseViewHistoryEntry.of(4353L, 1);

  private static final Schema TEST_SCHEMA =
      new Schema(
          1,
          Types.NestedField.required(1, "x", Types.LongType.get()),
          Types.NestedField.required(2, "y", Types.LongType.get(), "comment"),
          Types.NestedField.required(3, "z", Types.LongType.get()));

  private static ViewVersion version2 =
      BaseViewVersion.builder()
          .versionId(2)
          .timestampMillis(5555L)
I'll add more tests for view metadata parsing. One thing I did a bit differently from the original PR was to move the JSON to a separate file in resources.
if (node.has(SCHEMAS)) {
  JsonNode schemaArray = node.get(SCHEMAS);
  Schema currentSchema = null;
  Preconditions.checkArgument(
      schemaArray.isArray(), "Cannot parse schemas from non-array: %s", schemaArray);
  // current schema ID is required when the schema array is present
  currentSchemaId = JsonUtil.getInt(CURRENT_SCHEMA_ID, node);
  // parse the schema array
  ImmutableList.Builder<Schema> builder = ImmutableList.builder();
  for (JsonNode schemaNode : schemaArray) {
    Schema schema = SchemaParser.fromJson(schemaNode);
    if (schema.schemaId() == currentSchemaId) {
      currentSchema = schema;
    }
    builder.add(schema);
  }
The doubt I still have: I think schemas and the current schema should always be set, and the spec should reflect that by making them required. Maybe there's a case where we don't want a schema for the entire view and it's left to the individual representations. In that case, for any SQL representation in the view, the schema-id should be required (which in turn means there must be a schema list, since we only support SQL representations). Right now it's marked as optional in the spec.
@@ -41,8 +40,8 @@ default Type type() {
  /** The default namespace when the view is created. */
  Namespace defaultNamespace();

  /** The query output schema at version create time, without aliases. */
  Schema schema();
Why change it to a schema ID? If we know the schema ID, we should also be able to retrieve the Schema object.
I am trying to see if we can break this down further so that it's easier to review for a broader audience. There are some implementations of the basic interface, such as
Agreed @jackye1995, we can break this down further for easier review. I'll raise the version, representation and history entry PRs separately, and this one can focus on the view metadata as a whole.
@amogh-jahagirdar thanks for working on this. I've suggested a bunch of improvements that I think will make the code much shorter, easier to read, and easier to maintain in the long run. I also think we need to add a few more tests around nullability in the parsers.
Thanks @nastra, I'll be taking these suggestions in all the split PRs I'm raising. Agreed, more tests on nullability/missing fields are needed. Now that we use the Immutables dependency in the metrics implementation, we have a good precedent for using it here as well, which will simplify a lot of the boilerplate code.
if (o == null || getClass() != o.getClass()) {
  return false;
}
BaseSQLViewRepresentation that = (BaseSQLViewRepresentation) o;
Style: this method needs to have line spacing applied.
@Override
public String toString() {
  return "BaseViewDefinition{"
Can you use a helper rather than building this by hand? It's hard to read this way. We usually base these on MoreObjects.
Co-authored-by: John Zhuge <[email protected]>
  internalWrite(metadata, outputFile, false);
}

public static void toJson(ViewMetadata metadata, JsonGenerator generator) throws IOException {
I think we should check for null on metadata and add a test.
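A minimal sketch of the null check suggested above, together with the kind of test that would cover it. `Metadata` and the JSON body are hypothetical stand-ins for `ViewMetadata` and the real serializer, a plain `IllegalArgumentException` stands in for `Preconditions.checkArgument`, and the error message wording is an assumption.

```java
final class ToJsonNullCheckSketch {
  // Hypothetical stand-in for ViewMetadata; it carries only a format version.
  static final class Metadata {
    final int formatVersion;

    Metadata(int formatVersion) {
      this.formatVersion = formatVersion;
    }
  }

  // Fails fast on null input instead of surfacing a NullPointerException later.
  static String toJson(Metadata metadata) {
    if (metadata == null) {
      throw new IllegalArgumentException("Invalid view metadata: null");
    }
    return "{\"format-version\":" + metadata.formatVersion + "}";
  }

  public static void main(String[] args) {
    System.out.println(toJson(new Metadata(1)));
    try {
      toJson(null);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

A test for this would assert both the happy path and that a null argument is rejected with the expected message.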
}

public static ViewMetadata fromJson(JsonNode node) {
  Preconditions.checkArgument(node != null, "Cannot parse view metadata from null json");
would be good to add a test for this
static final int DEFAULT_VIEW_FORMAT_VERSION = 1;
static final int SUPPORTED_VIEW_FORMAT_VERSION = 1;

public static ImmutableViewMetadata.Builder buildFrom(ViewMetadata metadata) {
is this used anywhere? Also we can probably just use ImmutableViewMetadata.builder().from(metadata).xyz().build() here
or is the goal here to just create a copy? In that case we can also just do ImmutableViewMetadata.copyOf(metadata)
I guess this is trying to follow TableMetadata.buildFrom()?
where is ImmutableViewMetadata defined? I couldn't find it in this PR or existing code
@stevenzwu ImmutableViewMetadata is a generated class from the @Value.Immutable annotation. See also https://immutables.github.io/ for some additional details
  return schemasById().get(currentSchemaId());
}

private static Map<Integer, ViewVersion> indexVersions(List<ViewVersion> versions) {
nit: I think indexVersions and indexSchemas don't need to be static
Isn't static preferred? If someone makes a change in the future, I'd rather they be static since that is more strict. I could see changing from private to a more open visibility, in which case we would prefer static.
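To illustrate the point about strictness, here is a rough sketch of an index helper written as a static method. `Version` is a hypothetical stand-in for `ViewVersion`, and the body is an assumption about what such a helper might do, not the PR's code. Because the method is static, the compiler rejects any accidental use of instance state, so it depends only on its argument.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IndexVersionsSketch {
  // Hypothetical stand-in for ViewVersion.
  static final class Version {
    final int versionId;

    Version(int versionId) {
      this.versionId = versionId;
    }
  }

  // Static: the result is a pure function of the input list.
  static Map<Integer, Version> indexVersions(List<Version> versions) {
    Map<Integer, Version> byId = new LinkedHashMap<>();
    for (Version version : versions) {
      byId.put(version.versionId, version);
    }
    // Read-only view so callers cannot mutate the index.
    return Collections.unmodifiableMap(byId);
  }

  public static void main(String[] args) {
    Map<Integer, Version> index = indexVersions(Arrays.asList(new Version(1), new Version(2)));
    System.out.println(index.keySet());
  }
}
```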
  }
}

private ViewMetadataParser() {}
nit: I think this should be at the top
import org.junit.Test;
import org.junit.runners.Parameterized;

public class TestViewMetadata {
should this be called TestViewMetadataParser instead?
import org.apache.iceberg.types.Types;
import org.assertj.core.api.Assertions;
import org.junit.Assert;
import org.junit.Test;
I think it's worth changing this new test to use JUnit5
    .currentVersionId(2)
    .formatVersion(1)
    .build();
assertSameViewMetadata(expectedViewMetadata, ViewMetadataParser.fromJson(json));
nit: assertSame usually suggests that there's some sort of identity comparison between two objects.
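The naming concern can be shown with a small sketch that separates identity from equality. The class and method names here are hypothetical; the only claim is the standard Java distinction between `==` on references (what JUnit's `assertSame` checks) and `equals` (what a field-by-field metadata comparison is closer to).

```java
import java.util.Objects;

final class IdentityVsEqualitySketch {
  // Identity: both references point at the very same object.
  static boolean sameInstance(Object a, Object b) {
    return a == b;
  }

  // Equality: the objects may be distinct instances with equal contents.
  static boolean equalValue(Object a, Object b) {
    return Objects.equals(a, b);
  }

  public static void main(String[] args) {
    String first = new String("view");
    String second = new String("view");
    System.out.println(sameInstance(first, second)); // prints false
    System.out.println(equalValue(first, second)); // prints true
  }
}
```

A helper that compares parsed metadata field by field is performing the second kind of check, so a name that does not start with `assertSame` avoids the confusion.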
  generator.writeEndObject();
}

static String toJson(ViewMetadata viewMetadata) {
nit: maybe move this before the other toJson(..) method
Why is this not public?
Jack makes a good point. A few of these should be reviewed. In addition to this, which reasonably should be public, I think that toJson(ViewMetadata, JsonGenerator) should not be public until we need it in another package.
        ViewProperties.VERSION_HISTORY_SIZE_DEFAULT);

Preconditions.checkArgument(
    numVersionsToKeep >= 1, "%s must be positive", ViewProperties.VERSION_HISTORY_SIZE);
nit: add details of the received value of numVersionsToKeep in the error message
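A sketch of what including the received value could look like. The exact wording of the message and the property name `version-history-size` are placeholders, not the value of `ViewProperties.VERSION_HISTORY_SIZE`, and a plain `IllegalArgumentException` stands in for `Preconditions.checkArgument`.

```java
final class PositiveCheckSketch {
  // Returns the value unchanged when valid; otherwise reports both the property and the bad value.
  static int checkPositive(String property, int value) {
    if (value < 1) {
      throw new IllegalArgumentException(
          String.format("%s must be positive but was %d", property, value));
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(checkPositive("version-history-size", 10)); // prints 10
    try {
      checkPositive("version-history-size", 0);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```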
      .build();
}

static ViewMetadata fromJson(String json) {
Why is this not public?
@@ -0,0 +1,80 @@
{
Looks like puffin files are in path org/apache/iceberg/puffin/v1, we should follow the same pattern
import org.immutables.value.Value;

@Value.Immutable
public abstract class ViewMetadata implements Serializable {
The other view objects are interfaces. Why does this use an abstract class? Moving between interfaces and classes is a binary incompatible change, so I think that this should be an interface instead. Is that not possible for some reason?
My guess is that this is an abstract class rather than an interface because indexVersions() and indexSchemas() can't be made private if we use an interface here, meaning that they would become part of the API, which I don't think we want
Preconditions.checkArgument(
    formatVersion <= ViewMetadata.SUPPORTED_VIEW_FORMAT_VERSION,
    "Cannot read unsupported version %s",
    formatVersion);
Minor: what is the value of adding this here rather than in check()?
I agree that this should go into check(), because otherwise you could create an invalid ViewMetadata object
Agreed, @nastra fixed this in https://github.com/apache/iceberg/pull/7759/files#diff-037c118271eb576698f0829d876d5c49919eea4bd20e6b2f98b6072cfd0d4b08R133-R135 where all the validation is done in the special check method he referred to.
  versions.add(version);
}

Preconditions.checkArgument(
Seems like a lot of these checks should be in the ViewMetadata class rather than here. Otherwise, you can construct invalid ViewMetadata instances.
+1
Agreed, @nastra fixed this in https://github.com/apache/iceberg/pull/7759/files#diff-037c118271eb576698f0829d876d5c49919eea4bd20e6b2f98b6072cfd0d4b08R133-R135 where all the validation is done in the special check method he referred to.
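The reasoning in this thread, that validation belongs with the object rather than the parser, can be sketched in plain Java. This is not the code from the linked PR: the class below is hypothetical, a constructor-invoked `check()` stands in for a method annotated with Immutables' `@Value.Check`, and the rule that the current version ID must appear in the version list is an assumed example of a second invariant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CheckedMetadataSketch {
  static final int SUPPORTED_VIEW_FORMAT_VERSION = 1;

  final int formatVersion;
  final int currentVersionId;
  final List<Integer> versionIds;

  CheckedMetadataSketch(int formatVersion, int currentVersionId, List<Integer> versionIds) {
    this.formatVersion = formatVersion;
    this.currentVersionId = currentVersionId;
    this.versionIds = Collections.unmodifiableList(new ArrayList<>(versionIds));
    // Every construction path runs the same validation, not just the parser.
    check();
  }

  private void check() {
    if (formatVersion > SUPPORTED_VIEW_FORMAT_VERSION) {
      throw new IllegalArgumentException("Cannot read unsupported version " + formatVersion);
    }
    // Assumed invariant for illustration: the current version must be a known version.
    if (!versionIds.contains(currentVersionId)) {
      throw new IllegalArgumentException("Cannot find current version " + currentVersionId);
    }
  }

  public static void main(String[] args) {
    CheckedMetadataSketch valid = new CheckedMetadataSketch(1, 2, Arrays.asList(1, 2));
    System.out.println(valid.currentVersionId); // prints 2
  }
}
```

With the checks inside the class, code that builds metadata without going through the parser gets the same guarantees.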
Preconditions.checkArgument(
    numVersionsToKeep >= 1, "%s must be positive", ViewProperties.VERSION_HISTORY_SIZE);

if (versions.size() > numVersionsToKeep) {
Why do this here? I don't think it belongs in the parser.
I agree that this shouldn't be here. The better option would be to put this into a special version of a @Check method that allows normalization. This would allow retaining the desired number of versions in ViewMetadata directly
Agreed, @nastra fixed this in https://github.com/apache/iceberg/pull/7759/files#diff-037c118271eb576698f0829d876d5c49919eea4bd20e6b2f98b6072cfd0d4b08R133-R135 where all the validation is done in the special check method he referred to.
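As a rough illustration of the normalization step discussed here, the helper below trims a version list to at most `numVersionsToKeep` entries. It assumes the list is ordered oldest to newest and simply keeps the tail; that ordering, the method name `retainLatest`, and the retention policy are assumptions for this sketch, not what the linked PR does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RetainVersionsSketch {
  // Keeps at most numVersionsToKeep entries, dropping from the front of the list.
  // Assumes the input is ordered oldest to newest.
  static <T> List<T> retainLatest(List<T> versions, int numVersionsToKeep) {
    if (numVersionsToKeep < 1) {
      throw new IllegalArgumentException(
          "numVersionsToKeep must be positive but was " + numVersionsToKeep);
    }
    int from = Math.max(0, versions.size() - numVersionsToKeep);
    // Copy so the result does not alias the caller's list.
    return new ArrayList<>(versions.subList(from, versions.size()));
  }

  public static void main(String[] args) {
    System.out.println(retainLatest(Arrays.asList(1, 2, 3, 4), 2)); // prints [3, 4]
  }
}
```

Placing this kind of step in the metadata object's own normalization hook, rather than in the parser, means every code path that builds metadata gets the same trimming.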
}

private static Map<Integer, Schema> indexSchemas(List<Schema> schemas) {
  if (schemas == null) {
Could schemas be null? The spec says it is optional, but it also says "A list of schemas, the same as the ‘schemas’ field from Iceberg table spec." Is it optional in order to support both v1 and v2 tables?
Co-authored-by: John Zhuge [email protected]
Discussed offline with @jzhuge, builds on the core parser changes already done on #4657 but based off the latest API changes.