Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(client): Java client with push + pull query support #5200

Merged
merged 20 commits into from
May 9, 2020

Conversation

vcrfxia
Copy link
Contributor

@vcrfxia vcrfxia commented Apr 28, 2020

Description

Still in the process of adding more test coverage but the bulk of the changes and functional tests are here.

To review, start with Client.java and the associated interfaces. Then look at the implementations, starting with ClientImpl.java. Unit/functional tests with example usage are in ClientTest.java.

Additional test coverage to come (potentially in follow-up PRs) include tests for:

  • TLS / mutual auth / basic auth
  • push query with limit clause
  • decimal and complex types in result schema

Implementation for the insert stream methods will be a separate PR. Docs will come later as well.

Testing done

Manual + unit tests.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@vcrfxia vcrfxia requested a review from a team as a code owner April 28, 2020 07:43
* @param columnName name of column.
* @return column value.
*/
Boolean getBoolean(String columnName);
Copy link
Contributor Author

@vcrfxia vcrfxia Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd originally wanted to add similar methods for getting decimals but that requires either:

  • parsing the schema to find the precision and scale
  • asking the caller to specify precision and scale in the getter

and neither seems great. Is it worth updating the endpoint to return the schema in a more structured form? Feels maybe like overkill.

I also considered adding getter methods for arrays (lists) and maps but I'm not sure how valuable those methods would be without parsing the schema for specific subtypes.

Copy link
Contributor

@purplefox purplefox Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think most users care too muchabout the precision of a decimal. I think we should just return a BigDecimal for a DECIMAL column with whatever scale and precision is appropriate for the value. This is what JDBC does btw https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getBigDecimal(int)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a struct - we can just return that as a JsonObject.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I don't think many users will care about schema either. In the vast majority of cases the back end schema will be well known and fixed, and the developer will know this when issuing queries and doing stuff with the results. Very rarely will be a front end be coding against a completely dynamic and unknown back end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a struct - we can just return that as a JsonObject.

To continue an offline discussion: we previously thought it made sense to use vanilla Java types (List, Map) instead of Vert.x types (JsonArray, JsonObject) in the client interfaces so that apps using the client don't need Vert.x as a dependency, but more recently you said maybe it makes sense to use the Vert.x types for better type safety.

This still feels strange to me, though. It's one thing to give users the option to provide their own Vert.x instance and/or Vert.x HttpClientOptions, but requiring the use of Vert.x types in order to use the client at all feels odd. IMO it feels fine to expose the fact that the client uses Vert.x under the hood in order to benefit users seeking more advanced use cases, but requiring the use of Vert.x for all client use cases seems unwarranted.

A third option could be to wrap the Vert.x types but that seems like overkill.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think it's ok to expose the Vert.x types, or if you prefer to wrap them in our own type and delegate internally. Either way really. But I think it would be a shame to lose the functionality that those classes have which I think would be useful to users. E.g. type safe getters, easy conversion to JSON string, conversion to buffers etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in KLIP-26, I plan to introduce types that wrap the Vert.x types. Stand by for a follow-up PR with this change.

Copy link
Contributor

@purplefox purplefox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! The client certainly seems to be taking shape :)
A few comments, nothing major, the overall shape looks good and moving in the right direction.


<dependencies>
<dependency>
<groupId>io.confluent.ksql</groupId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ok for now. But before we ship I think it's important that we don't depend on any ksqlDB server side stuff - otherwise a whole load of dependencies will be pulled into the client jar, which will it hard to use by users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a look. The main dependencies right now are:

  • QueryResponseMetadata, which is used by QueryResponseHandler in order to deserialize the object
  • BufferedPublisher, which is extended by QueryResultImpl
  • BaseSubscriber, which is extended by PollableSubscriber

What's your recommendation for removing these dependencies? I see the value in not having the client depend on any of the server modules but I also don't think it makes sense to duplicate these classes. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will need to factor out the reactive streams base classes into their own module - e.g. "reactive-common" and have both the server and client depend on that.
For QueryResponseMetaData we could follow the pattern of the old REST API and put the shared classes in their own package (i.e. like rest-model). Or perhaps we should use rest-model?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got'cha, this makes a lot of sense. Will do in a follow-up PR.


void close();

static Client create(ClientOptions clientOptions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice this method is not used in tests. I think it's better to use the interface method to create a client than directly instantiating ClientImpl. This enables us to change the implementation more easily without breaking clients. We should consider hiding the constructor of ClientImpl (e.g. making package protected or private and indirecting through a factory)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also add a version of create that takes a Vertx instance. This allows the client to use any existing Vert.x the user might already be using in their app, thus alllowing thread pools to be reused etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on the motivation for using the interface method over directly instantiating ClientImpl. I've updated the tests, and also applied the analogous change to ClientOptionsImpl.

I'm not seeing a way to make the constructor for ClientImpl package private, though. Client.java is in a different package from ClientImpl.java so if Client.java is able to instantiate ClientImpl, then ClientImpl must have a public method for instantiation, whether that's a constructor or a factory method. How do people usually work around this?

I'd also add a version of create that takes a Vertx instance.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look how it's done in Vert.x https://github.com/eclipse-vertx/vert.x/blob/master/src/main/java/io/vertx/core/Vertx.java#L86

Basically the interface uses a static factory instance to actually create the implemention. The ServiceHelper is used to load the factory at run-time by scanning the classpath for implementations of the factory. The factory itself is in the same package as VertxImpl so the VertxImpl constructor can be package protected. It's a bit convoluted and may be overkill for us right now, might be sufficient to not worry about hiding the constructor but perhaps adding javadoc to it saying it should not be instantiated directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm interesting, the VertxImpl constructor is package protected but the factory implementation (in the same package) is still public: https://github.com/eclipse-vertx/vert.x/blob/3.8/src/main/java/io/vertx/core/impl/VertxFactoryImpl.java#L23
so if a user wanted to circumvent the intent of the VertxImpl constructor being private they could still do so by calling new VertxImplFactory().vertx(), right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, but the intention here isn't to protect against malicious users - it's pretty trivial to construct any object, even if it has an inaccessible private/protected/package protected constructor, using reflection. The idea is to nudge users to the right API :)


package io.confluent.ksql.api.client;

public interface ClientOptions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also expose the Vert.x HttpClientOptions? There are probably other settings (e.g. pool size) that users might want to tweak.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand the expected behavior if a user provides HttpClienttOptions: will ClientOptionsImpl update the provided HttpClienttOptions according to the other fields, or will the user-provided HttpClientOptions be used directly?

The latter doesn't seem very user-friendly since then the user would be responsible for duplicating the work of ClientImpl in populating HttpClientOptions based on ClientOptions, but the former also seems confusing since the user would have to know which HttpClientOptions properties will be overridden and which won't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably expose the HttpClientOptions directly and not have similar methods on ClientOptions at all. I.e. only have options on ClientOptions if they're not related to HTTP. If you're not comfortable exposing the HttpClientOptions directly you could wrap them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. only have options on ClientOptions if they're not related to HTTP.

What counts as "not related to HTTP"? Of the options exposed so far (host, port, useTls, trustStore, keyStore, and basicAuth), all of them have equivalents in the Vert.x HttpClientOptions besides basicAuth. If we were to only expose HttpClientOptions and not have similar options on ClientOptions then ClientOptions would become

public interface ClientOptions {

  ClientOptions setBasicAuthCredentials(String username, String password);

  boolean isUseBasicAuth();

  String getBasicAuthUsername();

  String getBasicAuthPassword();

  ClientOptions copy();

  static ClientOptions create(HttpClientOptions httpClientOptions) {
    return new ClientOptionsImpl(httpClientOptions);
  }

Is this what you're proposing? I feel like I've misunderstood.

If you're not comfortable exposing the HttpClientOptions directly you could wrap them.

I assume this means creating a wrapper type around HttpClientOptions, rather than wrapping the individual methods of HttpClientOptions into ClientOptions (as exposing too many options in ClientOptions feels like it'll overwhelm the user). If we were to create a wrapper type around HttpClientOptions, would it not be better to leave the more commonly used methods in ClientOptions itself (as is currently the case) and only wrap the other options in HttpClientOptions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to change my mind on this one. I think it's ok to just use our own ClientOptions and not to expose HttpClientOptions. If we find we need to expose more properties over time on ClientOptions we can do that.

(Aside: BTW I think we should support token auth on the client too, not just basic auth)

* @param columnName name of column.
* @return column value.
*/
Boolean getBoolean(String columnName);
Copy link
Contributor

@purplefox purplefox Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think most users care too muchabout the precision of a decimal. I think we should just return a BigDecimal for a DECIMAL column with whatever scale and precision is appropriate for the value. This is what JDBC does btw https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getBigDecimal(int)

}

@Test
public void shouldNotSubscribeIfPolling() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho I would like to see these kinds of tests conducted using the actual API and no mocks.

As you know I'm not a fan of fine grained unit tests and mocks as they can constrain the implementation, and often what you're testing doesn't really correspond to what the system really does thus resulting in bugs slipping through and a false sense of security.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add equivalent tests to ClientTest as part of revamping / improving test coverage. My vote is to leave these unit tests in place, though, until we see them become brittle. I think it's useful to be able to scan a test file and understand the key pieces of functionality for a class without having to dig through integration tests. Though I guess Java docs on the class/interface itself should serve this purpose in most cases, so maybe that's not a good reason...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as there's equivalent test coverage on the actual API I think that's fine. And as long as you won't be upset if I end up deleting them after spending an hour trying to refactor them if the implementation changes ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: going to add these additional tests in a follow-up PR.

And as long as you won't be upset if I end up deleting them after spending an hour trying to refactor them if the implementation changes ;)

Fine by me ;)

import org.junit.Before;
import org.junit.Test;

public class RowImplTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, not a fan of fine grained unit tests. I'd prefer to see the behaviour of a row tested on instance of Row interface returned from the actual API rather than the particular implementation RowImpl. If we later change the implementation these kinds of tests get very brittle and hard to refactor whereas tests that test against the interface don't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

Copy link
Contributor

@purplefox purplefox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as overall approach looks sound, on basis that review comments are addressed (or not addressed if my comment does not make sense - let's discuss ;) )

Copy link
Contributor Author

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @purplefox -- super helpful comments and suggestions! Responded inline.


<dependencies>
<dependency>
<groupId>io.confluent.ksql</groupId>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a look. The main dependencies right now are:

  • QueryResponseMetadata, which is used by QueryResponseHandler in order to deserialize the object
  • BufferedPublisher, which is extended by QueryResultImpl
  • BaseSubscriber, which is extended by PollableSubscriber

What's your recommendation for removing these dependencies? I see the value in not having the client depend on any of the server modules but I also don't think it makes sense to duplicate these classes. What do you think?


void close();

static Client create(ClientOptions clientOptions) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on the motivation for using the interface method over directly instantiating ClientImpl. I've updated the tests, and also applied the analogous change to ClientOptionsImpl.

I'm not seeing a way to make the constructor for ClientImpl package private, though. Client.java is in a different package from ClientImpl.java so if Client.java is able to instantiate ClientImpl, then ClientImpl must have a public method for instantiation, whether that's a constructor or a factory method. How do people usually work around this?

I'd also add a version of create that takes a Vertx instance.

Done.


package io.confluent.ksql.api.client;

public interface ClientOptions {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand the expected behavior if a user provides HttpClienttOptions: will ClientOptionsImpl update the provided HttpClienttOptions according to the other fields, or will the user-provided HttpClientOptions be used directly?

The latter doesn't seem very user-friendly since then the user would be responsible for duplicating the work of ClientImpl in populating HttpClientOptions based on ClientOptions, but the former also seems confusing since the user would have to know which HttpClientOptions properties will be overridden and which won't.

}

@Test
public void shouldNotSubscribeIfPolling() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add equivalent tests to ClientTest as part of revamping / improving test coverage. My vote is to leave these unit tests in place, though, until we see them become brittle. I think it's useful to be able to scan a test file and understand the key pieces of functionality for a class without having to dig through integration tests. Though I guess Java docs on the class/interface itself should serve this purpose in most cases, so maybe that's not a good reason...

import org.junit.Before;
import org.junit.Test;

public class RowImplTest {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

Copy link
Contributor Author

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the reviews (and patient explanations) @purplefox !

I've addressed the majority of comments and am planning to merge this PR once the build passes. Follow-up PRs will contain:

  • additional test coverage: error cases, functionality currently only tested via unit tests, TLS mutual auth, result schemas with decimal and complex types, QueryResultImpl#isComplete()
  • remove dependency on ksqlDB server module
  • introduction of structured types (to wrap Vert.x JsonObject and JsonArray), and Row#getDecimal() methods
  • expose Vert.x HttpClientOptions, pending discussion in feat(client): Java client with push + pull query support #5200 (comment)
  • other interface changes resulting from the discussion in KLIP-26

* @param columnName name of column.
* @return column value.
*/
Boolean getBoolean(String columnName);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in KLIP-26, I plan to introduce types that wrap the Vert.x types. Stand by for a follow-up PR with this change.

}

@Test
public void shouldNotSubscribeIfPolling() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: going to add these additional tests in a follow-up PR.

And as long as you won't be upset if I end up deleting them after spending an hour trying to refactor them if the implementation changes ;)

Fine by me ;)


@Override
public boolean isComplete() {
return false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got'cha. I've renamed a couple internal variables in BufferedPublisher to better reflect this. Thanks for the clarification!


ClientOptions setUseClientAuth(boolean useClientAuth);

ClientOptions setVerifyHost(boolean verifyHost);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These additional TLS options were needed to get the tests working. In a follow-up PR they'll either be removed in favor of exposing Vert.x HttpClientOptions (pending discussion in #5200 (comment)) or I'll refactor all the TLS options into a separate interface in order to clean up this one.

@vcrfxia vcrfxia changed the title feat: Java client with push + pull query support feat(client): Java client with push + pull query support May 9, 2020
@vcrfxia vcrfxia merged commit 280ef0c into confluentinc:master May 9, 2020
@vcrfxia vcrfxia deleted the java-client branch May 9, 2020 06:40
* @param timeUnit unit for timeout param.
* @return the row, if available; else, null.
*/
Row poll(long timeout, TimeUnit timeUnit);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vcrfxia As mentioned on the KLIP: why do we not use Duration instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the bump. I've updated the KLIP and will update the code in a future PR (along with a multitude of other feedback from the KLIP).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants