Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: View metadata implementation #7759

Merged
merged 3 commits into from
Jun 16, 2023
Merged

Conversation

nastra
Copy link
Contributor

@nastra nastra commented Jun 3, 2023

This is a continuation PR of #6559 and I've addressed all remaining comments.
@stevenzwu @jackye1995 @rdblue since you guys were reviewing the original PR, I've added you here as well. Please take a look when you can.

@ConeyLiu
Copy link
Contributor

ConeyLiu commented Jun 4, 2023

@nastra Is it possible to abstract a common interface for table and view because they have very similar methods?

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great to me just some minor remaining comments, and should be good to merge. Thanks a ton for carrying this forward @nastra! I'll reply on the original PR and we can probably close that one.

@nastra
Copy link
Contributor Author

nastra commented Jun 6, 2023

@nastra Is it possible to abstract a common interface for table and view because they have very similar methods?

@ConeyLiu I don't think this is desirable, because ViewMetadata and TableMetadata only overlap in a few fields, so adding an abstract common interface might add unnecessary complexity

@ConeyLiu
Copy link
Contributor

ConeyLiu commented Jun 7, 2023

@nastra Thanks for your response, it indeed makes the code more complex.

return versionsById().get(versionId);
}

@Value.Lazy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be Derived rather than Lazy. Lazy makes it more complicated and there's no real benefit to not just calculating this at construction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally I agree with you that @Value.Derived is the better option.
However, given that we're deriving from a collection, it makes it difficult/impossible to do validation on the collection before eager validation on the derived field kicks in.
See also my comment in #7759 (comment) about default collection behavior and why it's better to have @Value.Lazy here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, does it make sense to make this Lazy at all? They're not lazy or derived in TableMetadata:

  public Schema schema() {
    return schemasById.get(currentSchemaId);
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok removing @Value.Lazy here, since the only thing it buys us is that it doesn't re-compute the value from the underlying hash map, which is O(1) anyway

versions().subList(versions().size() - versionHistorySizeToKeep(), versions().size());
List<ViewHistoryEntry> history =
history().subList(history().size() - versionHistorySizeToKeep(), history().size());
return ImmutableViewMetadata.builder().from(this).versions(versions).history(history).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is confusing. The builder should enforce the maximum number of versions, not the check method. It's awkward that this returns a copy of the view metadata.

Copy link
Contributor Author

@nastra nastra Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be the "official" way of normalizing an object: http://immutables.github.io/immutable.html#normalization

Not sure it's worth adding our own builder class for this particular case

Copy link
Contributor

@rdblue rdblue Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is strange to me and I'm not a big fan of having a checkAndNormalize in the interface, but okay I guess? It doesn't seem concerning enough to block using Immutables but it is annoying that the pattern requires exposing additional methods that don't have a clear contract (like validate).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree here completely.
It would be great if there could be a better & shorter alternative to achieve this.
The (longer) alternative would be to have our own builder that internally uses ImmutableViewMetadata.builder() and does the "normalization" but that doesn't prevent anyone from using ImmutableViewMetadata.builder() directly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but if we're using our own builder then what's the point of immutables?

For now, I think we should just get this in to unblock the next PR. We should follow up with the API that adds a new view representation and version. If that goes smoothly then it will be fine. If it doesn't fit then we can remove immutables and go with a direct implementation.

assertThatThrownBy(() -> ImmutableViewMetadata.builder().build())
.isInstanceOf(IllegalStateException.class)
.hasMessage(
"Cannot build ViewMetadata, some of required attributes are not set [formatVersion, location, currentSchemaId, currentVersionId]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't schemas and versions required to be non-null and non-empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collections are empty by default if not set and typically they are being checked in a @Value.Check method. See also immutables/immutables#1429 (comment) for some historical reasons why Collections are being empty by default if not set.

The other alternative would be to use the Java Bean Validation API, which would allow expressing validations on Collections via annotations, but I don't think we'd want to do that here.

That's the main reason why I'd like to keep currentVersion() and schema() being @Value.Lazy so that we don't have misleading error messaging.
What I mean here is that for example, if currentVersionId() is set but versions() is empty, we would see java.lang.NullPointerException: currentVersion (because currentVersion cannot be derived from versions() and thus ends up being null) rather than java.lang.IllegalArgumentException: Cannot find current version 23 in view versions: [1]

@nastra nastra force-pushed the view-metadata branch 2 times, most recently from 69d38ba to 29dd333 Compare June 12, 2023 15:07
properties(),
ViewProperties.VERSION_HISTORY_SIZE,
ViewProperties.VERSION_HISTORY_SIZE_DEFAULT);
Preconditions.checkArgument(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this validation fails, won't it cause the view to be broken and unfixable? We won't be able to construct the ViewMetadata so there would be no way to fix this if another implementation sets it to -1 or something for unlimited history. We would either need to document this requirement in the spec or fail more gracefully.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that is true, one wouldn't be able to construct the ViewMetadata object`, because it's technically failing validation.
For keeping unlimited history I think we have the following options:

  1. allow -1 to indicate unlimited (not sure if we have any other properties that carry such semantics).
  2. let users use a large value with Integer.MAX_VALUE being the maximum. This seems to be more appropriate IMO, since the view spec mentions The number of versions to retain is controlled by the table property: version.history.num-entries.. However, I can also see the argument to "which value is large enough?".

Whatever we think carries the right semantics to indicate unlimited history should go into the validation logic + the spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would rather not fail construction because of this. If the value is invalid, then let's just ignore it, warn, and not change the versions list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable, I've updated this to issue a WARN and not modify the history

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks about ready, although there are some minor comments.

Co-authored-by: John Zhuge <[email protected]>
Co-authored-by: Eduard Tudenhoefner <[email protected]>
@rdblue rdblue merged commit f2b01f8 into apache:master Jun 16, 2023
@rdblue
Copy link
Contributor

rdblue commented Jun 16, 2023

Thanks, @nastra! And thanks to @amogh-jahagirdar and @jzhuge for major parts of this as well!

@nastra nastra deleted the view-metadata branch June 17, 2023 05:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants