-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] REST Based Catalog #3424
[WIP] REST Based Catalog #3424
Conversation
7376d76
to
95f916e
Compare
api/src/main/java/org/apache/iceberg/exceptions/AuthorizationDeniedException.java
Outdated
Show resolved
Hide resolved
|
||
} | ||
|
||
private CreateNamespaceRequest(String namespaceName, Map<String, String> properties) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Catalog
API allows a namespace that is any number of levels. Should this be Namespace
, List<String>
, or String[]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to use Namespace
to me, but in the JSON, I'd prefer it to be a List<String>
or String[]
so users can specify as many levels as they'd like.
Will update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will come back and update these things after we discuss the proposed specification 👍
*/ | ||
public class CreateNamespaceRequest implements Serializable { | ||
|
||
// TODO - Use protected so users can extend this for their own impls. Or an interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are user implementations to be supported? I was thinking that we would treat the REST API as a spec so it wouldn't necessarily make sense to allow customization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I actually meant more what you said (about it as a spec). Bad phrasing on my part. I have been working through various versions and different code pathways, and ultimately decided to introduce this initially with CreateNamespaceRequest
to open discussion.
I'll update this to be more clear that this designed to be a spec.
*/ | ||
public class CreateNamespaceResponse { | ||
private Namespace namespace; | ||
private Map<String, String> properties; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these the properties that the namespace has after create? For example, this may include a "created-at" property that was added by the server automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that was my thinking (about server side additions). I was thinking that this would be either:
- Properties the server added, such as
created-at
. - The entire properties of the namespace, including the ones the user provided, to give a full picture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I meant #2. All properties of the namespace, including the ones the user sent and the ones that were added automatically. We should document this in the class Javadoc.
} | ||
|
||
void setNamespaceName(String name) { | ||
this.namespaceName = name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why include setters and a builder? Should this be immutable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included setters and getters so that it would be a proper bean for serialization purposes.
I admittedly don't love them. Going through, I see that the Aliyun is using springboot and annotations instead directly in their code.
Open to suggestions / ideas around that. And possibly this ultimately won't wind up needing to be a bean at all (not a huge fan of beans in general though I recognize they have their time and place - though that might not be now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, looks like this is probably needed because there is no serializer and it needs to adhere to the bean interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this depends on how we want to specify the protocol. I prefer documenting the JSON objects that will be sent, which probably means going through and creating serializers/deserializers for each object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I also prefer documenting the JSON objects. I had thought that the getters and setters were still needed for the serializers though.
But I'm relatively green with the Apache HTTP library that I chose for the reference calls, so maybe I can get rid of this.
I'll play around with it, but definitely agreed that I prefer to specify the protocol and to keep it language agnostic (e.g. so people can use another language on the server side in addition to the clarity that documenting the JSON objects brings over the Java classes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Admittedly, as I come from more of a Scala background (partcularly when implementing REST API clients), in some places I'm not sure what needs to be a bean and what doesn't for serialization. I've typically relied on case classes, which can pretty effortlessly be converted to and from JSON with much work, or more modern over the network serialization standards such as protobuf, thrift, or even Avro.
So some things might presently be beans that don't need to be. I'm happy to clean that up as I go along and come to be more familiar with some of these java bean related things =)
} | ||
|
||
@Override | ||
protected String defaultWarehouseLocation(TableIdentifier tableIdentifier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that we want to implement BaseMetastoreCatalog
, since it has conventions that we don't need, like this method. This method is a way for the catalog to determine the default location for a table. But shouldn't this be done on the server side for a REST-based catalog? That makes the most sense to me because we want to keep the client as simple as possible and do all of the customization in the server that is easier to change. Deployed client Jars are hard to update!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me. For the REST catalog, especially as a spec, the server should have a bit more control over what's done. I had a version that didn't use BaseMetastoreCatalog
(and still do), but decided to forgo it initially given the deviation from the norm.
I'll work on switching it back as I'm fleshing out ideas for the spec.
.withProperties(metadata) | ||
.build(); | ||
|
||
String path = properties.getOrDefault("create-namespace-path", "databases"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the default here need a starting /
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I don't think it does, but I will add in a check.
Even if it doesn't, it seems a bit weird to have a path that doesn't start with a /
. So I'll add some normalization code (or use some from elsewhere) to allow for that.
import java.io.Closeable; | ||
|
||
/** | ||
* Eventual Interface for pluggable RestClients. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to allow customizing the HTTP client implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's what I mean. People might have a preference on the HTTP client implementation. Particularly if they use HTTP2 or HTTP3 even. So long as it conforms to the spec, it should in theory be fine.
But that was my thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can actually learn a bit from the HTTP client in AWS SDK v2, it supports plugging in any HTTP client implementation. We can borrow the same design pattern here.
* Exception thrown by the REST client and catalog, to wrap a checked exception | ||
* in an unchecked exception. | ||
*/ | ||
public class UncheckedRestException extends RuntimeException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't the other rest exception handle checked exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. Good catch. I had a few different branches where I was working out some possible ideas, and somehow I managed to leave both of those in.
I'll remove this one. 👍
}; | ||
} | ||
|
||
static String getResponseBody(CloseableHttpResponse response) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be private? Also, try not to use get
in names. If you can't replace it with something more specific (like find
, fetch
, create
, etc.) then it's likely that it is just a filler word that can be omitted.
icebergException = response.getHeader("X-Iceberg-Exception").getValue(); | ||
} catch (Exception e) { | ||
// TODO - Better error message and handling. Will be refactoring this anyway. | ||
LOG.error("Encountered an error when getting the X-Iceberg-Exception header", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using a header, what about having a way to encode an exception in the JSON payload? That seems to be the approach outlined here: https://www.baeldung.com/rest-api-error-handling-best-practices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. That's more in line with how I've previously done it as well. The Facebook example is most in line with what I've previously used, where there's a dedicted error
payload in the response body and then possibly even further error codes beyond just the normal 4xx, and those error codes map to what happened.
{
"error": {
"message": "Missing Bearer header.",
"type": "OAuthException",
"code": 40101, // We document what 401xx error codes correspond to as part of the spec
"trace_id": "AWswcVwbcqfgrSgjG80MtqJ"
},
data: {
// Successful response payload
}
}
I will refactor to use an approach like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. I'm not sure that we need error codes, but we can always add them if people find them useful.
public static class NamespaceSerializer extends JsonSerializer<Namespace> { | ||
@Override | ||
public void serialize(Namespace namespace, JsonGenerator gen, SerializerProvider serializers) throws IOException { | ||
gen.writeString(namespace.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should probably be a list of strings to avoid losing the hierarchy when .
is in one of the namespace identifiers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah agreed. Let's sort it out in the spec for a bit and then make the necessary updates to the sereializers.
As mentioned in the community sync up, given that Trino needs some kind of JSON representation, I'm also going to meet with Jack to discuss and get his expertise on it too. 🙂
|
||
public static class NamespaceDeserializer extends JsonDeserializer<Namespace> { | ||
@Override | ||
public Namespace deserialize(JsonParser p, DeserializationContext ctxt) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update these to use full word names? It's generally more readable to use parser
rather than p
. Also, I don't think that it is any easier to type ctxt
than to type context
, and it is definitely less readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% agreed. After rebasing, I encountered a lot of HiddenField checkstyle errors and probably went a little overboard on changing the names.
public static class SchemaSerializer extends JsonSerializer<Schema> { | ||
@Override | ||
public void serialize(Schema schema, JsonGenerator gen, SerializerProvider serializers) | ||
throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this actually throw IOException
? If not, can we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this does throw IOException
. Specifically, SchemaParser.toJson
does.
I can wrap it with the UncheckedIOException
or something else though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, my mistake. The serialize
method itself declares the IOException
. It looks like there's a delegate that can be used, so either I'll figure out the existing delegate or wrap with my own to avoid all of the checked exceptions if possible.
public SortOrder deserialize(JsonParser p, DeserializationContext context) | ||
throws IOException { | ||
JsonNode jsonNode = p.getCodec().readTree(p); | ||
Schema schema = (Schema) context.getAttribute("schema"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What writes the schema into the sort order? It looks like these are going to require some context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SortOrderParser.fromJson(Schema schema, JsonNode node)
call that returns the final SortOrder
is what writes the schema into it.
Though I do feel it could be made much easier to grok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That method constructs a SortOrder
for a given schema that must be passed in. It is on the deserialization side, not the serialization side. For the write side, SortOrderParser.toJson
produces JSON but doesn't write the schema into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback Ryan. I will use it to iterate on this. As mentioned, I had a few different ideas for the spec that I was working on, so I will clean up and then add in some of those as well.
String getNamespaceName() { | ||
return namespaceName; | ||
} | ||
|
||
void setNamespaceName(String name) { | ||
this.namespaceName = name; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are beans for serialization purposes (typically we avoid the word get
in methods as usually there's a more descriptive verb).
There's code gen tools, i.e. lombock
, but for something that is to be used as part of an AP / spec, it might be wise to avoid that if possible. Very open to suggestions on that idea.
*/ | ||
public class CreateNamespaceRequest implements Serializable { | ||
|
||
// TODO - Use protected so users can extend this for their own impls. Or an interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I actually meant more what you said (about it as a spec). Bad phrasing on my part. I have been working through various versions and different code pathways, and ultimately decided to introduce this initially with CreateNamespaceRequest
to open discussion.
I'll update this to be more clear that this designed to be a spec.
} | ||
|
||
void setNamespaceName(String name) { | ||
this.namespaceName = name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included setters and getters so that it would be a proper bean for serialization purposes.
I admittedly don't love them. Going through, I see that the Aliyun is using springboot and annotations instead directly in their code.
Open to suggestions / ideas around that. And possibly this ultimately won't wind up needing to be a bean at all (not a huge fan of beans in general though I recognize they have their time and place - though that might not be now).
*/ | ||
public class CreateNamespaceResponse { | ||
private Namespace namespace; | ||
private Map<String, String> properties; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that was my thinking (about server side additions). I was thinking that this would be either:
- Properties the server added, such as
created-at
. - The entire properties of the namespace, including the ones the user provided, to give a full picture.
} | ||
|
||
@Override | ||
protected String defaultWarehouseLocation(TableIdentifier tableIdentifier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me. For the REST catalog, especially as a spec, the server should have a bit more control over what's done. I had a version that didn't use BaseMetastoreCatalog
(and still do), but decided to forgo it initially given the deviation from the norm.
I'll work on switching it back as I'm fleshing out ideas for the spec.
.withProperties(metadata) | ||
.build(); | ||
|
||
String path = properties.getOrDefault("create-namespace-path", "databases"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I don't think it does, but I will add in a check.
Even if it doesn't, it seems a bit weird to have a path that doesn't start with a /
. So I'll add some normalization code (or use some from elsewhere) to allow for that.
import java.io.Closeable; | ||
|
||
/** | ||
* Eventual Interface for pluggable RestClients. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's what I mean. People might have a preference on the HTTP client implementation. Particularly if they use HTTP2 or HTTP3 even. So long as it conforms to the spec, it should in theory be fine.
But that was my thinking.
* Exception thrown by the REST client and catalog, to wrap a checked exception | ||
* in an unchecked exception. | ||
*/ | ||
public class UncheckedRestException extends RuntimeException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. Good catch. I had a few different branches where I was working out some possible ideas, and somehow I managed to leave both of those in.
I'll remove this one. 👍
} | ||
} | ||
|
||
public static class TableMetadataSerializer extends JsonSerializer<TableMetadata> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need a deserializer as well. Currently that depends on having a metadata file. I have a few ideas around this and will push something during my next round of clean up on this. Possibly placing it in a separate PR as well.
Eventually, I plan on breaking this into smaller PRs to be more easily digested, but this is to provide a larger picture overview.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there's no need to only read from a metadata file.
public <T> T execute( | ||
Method method, String path, Object body, Class<T> responseType, | ||
Consumer<CloseableHttpResponse> errorHandler) { | ||
HttpUriRequestBase request = new HttpUriRequestBase(method.name(), URI.create(String.format("%s/%s", baseUrl, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: Prefer line wrapping as high in the parse tree as possible. Here, wrapping within the String.format
within URI.create
within new HttpUriRequestBase
makes it hard to distinguish what level path
is an argument to. Instead, wrapping each arg to a new line or just adding a newline for URI.create(...)
is more readable.
try { | ||
request.setEntity(new StringEntity(mapper.writeValueAsString(body))); | ||
} catch (JsonProcessingException e) { | ||
throw new UncheckedRestException(e, "Failed to write request body"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to have an exception that distinguishes problems before sending the request, like this one.
|
||
try (CloseableHttpResponse response = httpClient.execute(request)) { | ||
if (response.getCode() != HttpStatus.SC_OK) { | ||
errorHandler.accept(response); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this return after handling the response in case it doesn't throw an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. I decided to use the Apache HTTP library, as we're in an Apache project. But I'm more familiar with other HTTP libraries. I will return here and then consider more about libraries etc. I'm a big fan of OkHTTP, but I recognize that might not be in-line with the fact that this is an Apache repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use whatever we want as long as the license is compatible. That said, if the library is already included with Spark and Flink, we generally prefer to use it so we don't need to shade and relocate it. That's why Apache HTTP is most often used.
HttpEntity entity = response.getEntity(); | ||
return mapper.readValue(entity.getContent(), responseType); | ||
} catch (IOException e) { | ||
throw new UncheckedIOException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What throws the IOException
? Can we add context about what failed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I can add context. Currently, I believe it's either mapper.readValue
or response.getEntity
that throws it.
I'll double check and then add a note. I can also wrap and throw individually if multiple things are throwing it.
return mapper.readValue(entity.getContent(), responseType); | ||
} catch (IOException e) { | ||
throw new UncheckedIOException(e); | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something throws Exception
?
private Consumer<CloseableHttpResponse> defaultErrorHandler = ErrorHandlers.databaseErrorHandler(); | ||
|
||
private Builder() { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: no blank line in empty methods.
private Configuration hadoopConf; | ||
private FileIO fileIO; | ||
|
||
// TODO - Refactor to use interface, which doesn't currently have POST etc embedded into it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it seems like you already know enough about the methods in HttpClient
to fill out the interface.
/** | ||
* Represents a REST request to create a namespace / database. | ||
*/ | ||
public class CreateNamespaceRequest implements Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Serializable
? Is this needed for the JSON bean serialization?
afc2862
to
bcffe95
Compare
* Represents a REST request to create a namespace / database. | ||
* | ||
* Possible example POST body (outer format object possibly only used on Response): | ||
* { "data": { "namespace": ["prod", "accounting"], properties: {} }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it's actually as follows, without using an array for Namespace
.
{ "namespace": "prod.accounting", "properties": { "key1": "value1" } }
Now that I have the mock server going, I'm going to enter an OpenAPI spec and then run the tests over the spec (as I've done for a create namespace based one).
Will put the spec in this PR as a separate document.
4d05922
to
89d3cec
Compare
…uld be closed by the catalog
… from other branches
1e0d30f
to
5b74cb5
Compare
* structure, including the possibility of richer metadata on errors, such as tracing telemetry or follw | ||
* up user instruction. | ||
* | ||
* { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need <pre>
HTML tag for these code blocks to show properly in javadoc.
|
||
// TODO - Possibly authenticate with the server initially and then have the server return some of this information | ||
Preconditions.checkNotNull( | ||
properties.getOrDefault("baseUrl", null), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to reuse CatalogProperties.URI
for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's a good call. Will update.
this.baseUrl = properties.get("baseUrl"); | ||
this.fileIO = initializeFileIO(properties); | ||
|
||
this.mapper = new ObjectMapper(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is recommended to use object mapper as a singleton. We can reuse JsonUtil.mapper()
and add util methods around that to register modules.
And actually why not directly register the serdes in that mapper directly, instead of only registering it when initializing the REST catalog? I think it does not hurt to do that even for people not using REST catalog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would certainly make it simpler. And I don't see the harm in registering the serdes directly, which would make a lot of things simpler. Good suggestion. 👍
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
public class RestCatalog extends BaseMetastoreCatalog implements Closeable, SupportsNamespaces, Configurable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably better to directly implement it from interface instead of extending the base implementation. All the calls are delegated to the server anyway, there is not much benefit I see in inheriting the base logic. Some things like default location are really just util methods but not formalized idea that should not be inherited for REST catalog implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I am in agreement with that, BaseMetastoreCatalog
doesn't provide much value as mentioned.
I stepped away from this PR for a bit to just focus on fleshing out the spec in the other PR. I will refactor to remove BaseMetastoreCatalog
from here.
.build(); | ||
|
||
// TODO - This should come from the server side. | ||
String path = properties.getOrDefault("create-namespace-path", "namespace"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is left over from a previous idea I was toying with.
I will remove this and others like it. It would be pretty hard to deal with it from the iceberg library if users were rewriting the path, given that it contains path variables and things at times. This is just an artifact I forgot to remove. Good catch.
* Use true immutable objects that are type-safe, thread-safe, null-safe * Get builder classes for free * Get JSON serialization/deserialization for free (this will also be helpful for the REST-based Catalog apache#3424) This is relying on https://immutables.github.io/ (Apache License 2.0), which allows generating immutable objects and builders via annotation processing. * Immutable objects are serialization ready (including JSON and its binary forms) * Supports lazy, derived and optional attributes * Immutable objects are constructed once, in a consistent state, and can be safely shared * Will fail if mandatory attributes are missing * Cannot be sneakily modified when passed to other code * Immutable objects are naturally thread-safe and can therefore be safely shared among threads * No excessive copying * No excessive synchronization * Object definitions are pleasant to write and read * No boilerplate setter and getters * No ugly IDE-generated hashCode, equals and toString methods that end up being stored in source control.
* Use true immutable objects that are type-safe, thread-safe, null-safe * Get builder classes for free * Get JSON serialization/deserialization for free (this will also be helpful for the REST-based Catalog apache#3424) This is relying on https://immutables.github.io/ (Apache License 2.0), which allows generating immutable objects and builders via annotation processing. * Immutable objects are serialization ready (including JSON and its binary forms) * Supports lazy, derived and optional attributes * Immutable objects are constructed once, in a consistent state, and can be safely shared * Will fail if mandatory attributes are missing * Cannot be sneakily modified when passed to other code * Immutable objects are naturally thread-safe and can therefore be safely shared among threads * No excessive copying * No excessive synchronization * Object definitions are pleasant to write and read * No boilerplate setter and getters * No ugly IDE-generated hashCode, equals and toString methods that end up being stored in source control.
* Use true immutable objects that are type-safe, thread-safe, null-safe * Get builder classes for free * Get JSON serialization/deserialization for free (this will also be helpful for the REST-based Catalog apache#3424) This is relying on https://immutables.github.io/ (Apache License 2.0), which allows generating immutable objects and builders via annotation processing. * Immutable objects are serialization ready (including JSON and its binary forms) * Supports lazy, derived and optional attributes * Immutable objects are constructed once, in a consistent state, and can be safely shared * Will fail if mandatory attributes are missing * Cannot be sneakily modified when passed to other code * Immutable objects are naturally thread-safe and can therefore be safely shared among threads * No excessive copying * No excessive synchronization * Object definitions are pleasant to write and read * No boilerplate setter and getters * No ugly IDE-generated hashCode, equals and toString methods that end up being stored in source control.
* Use true immutable objects that are type-safe, thread-safe, null-safe * Get builder classes for free * Get JSON serialization/deserialization for free (this will also be helpful for the REST-based Catalog apache#3424) This is relying on https://immutables.github.io/ (Apache License 2.0), which allows generating immutable objects and builders via annotation processing. * Immutable objects are serialization ready (including JSON and its binary forms) * Supports lazy, derived and optional attributes * Immutable objects are constructed once, in a consistent state, and can be safely shared * Will fail if mandatory attributes are missing * Cannot be sneakily modified when passed to other code * Immutable objects are naturally thread-safe and can therefore be safely shared among threads * No excessive copying * No excessive synchronization * Object definitions are pleasant to write and read * No boilerplate setter and getters * No ugly IDE-generated hashCode, equals and toString methods that end up being stored in source control.
I'm closing this as:
I'll be opening a new PR (and ideally splitting it off into sub-PRs for easier digestion) ASAP. Was out sick for a week or so after the initial spec was reviewed. |
* Use true immutable objects that are type-safe, thread-safe, null-safe * Get builder classes for free * Get JSON serialization/deserialization for free (this will also be helpful for the REST-based Catalog apache#3424) This is relying on https://immutables.github.io/ (Apache License 2.0), which allows generating immutable objects and builders via annotation processing. * Immutable objects are serialization ready (including JSON and its binary forms) * Supports lazy, derived and optional attributes * Immutable objects are constructed once, in a consistent state, and can be safely shared * Will fail if mandatory attributes are missing * Cannot be sneakily modified when passed to other code * Immutable objects are naturally thread-safe and can therefore be safely shared among threads * No excessive copying * No excessive synchronization * Object definitions are pleasant to write and read * No boilerplate setter and getters * No ugly IDE-generated hashCode, equals and toString methods that end up being stored in source control.
There are many users who are coming to the data space for the first time, or otherwise aren't used to used HMS.
For many of them, implementing a REST server and then choosing how to handle things on the server side is likely their preferred method. This would also allow people to plug in their own implementation around a somewhat known interface.
I'm opening this up for comment as I've begun some work on a REST catalog API, as well as an example implementation.
I'm seeking feedback and will also have a proposal eventually after it's been discussed in the community a bit. 🙂