feat(client): Java client with push + pull query support #5200

vcrfxia · 2020-04-28T07:43:29Z

Description

Still in the process of adding more test coverage but the bulk of the changes and functional tests are here.

To review, start with Client.java and the associated interfaces. Then look at the implementations, starting with ClientImpl.java. Unit/functional tests with example usage are in ClientTest.java.

Additional test coverage to come (potentially in follow-up PRs) includes tests for:

TLS / mutual auth / basic auth
push query with limit clause
decimal and complex types in result schema

Implementation for the insert stream methods will be a separate PR. Docs will come later as well.

Testing done

Manual + unit tests.

Reviewer checklist

Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
Ensure relevant issues are linked (description should include text like "Fixes #")

vcrfxia · 2020-04-28T07:48:10Z

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/Row.java

+   * @param columnName name of column.
+   * @return column value.
+   */
+  Boolean getBoolean(String columnName);


I'd originally wanted to add similar methods for getting decimals but that requires either:

parsing the schema to find the precision and scale

asking the caller to specify precision and scale in the getter

and neither seems great. Is it worth updating the endpoint to return the schema in a more structured form? Feels maybe like overkill.

I also considered adding getter methods for arrays (lists) and maps but I'm not sure how valuable those methods would be without parsing the schema for specific subtypes.

I don't think most users care too much about the precision of a decimal. I think we should just return a BigDecimal for a DECIMAL column with whatever scale and precision is appropriate for the value. This is what JDBC does btw https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getBigDecimal(int)

For a struct - we can just return that as a JsonObject.

In general, I don't think many users will care about schema either. In the vast majority of cases the back end schema will be well known and fixed, and the developer will know this when issuing queries and doing stuff with the results. Very rarely will a front end be coding against a completely dynamic and unknown back end.
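To illustrate the BigDecimal point, here is a minimal sketch. It assumes the decimal value is available in textual form; the class and method names are hypothetical and not part of the client. It shows that a BigDecimal carries its own scale and precision, so a getter would not need either supplied by the caller or parsed from the schema.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration only; not part of the ksqlDB client API.
final class DecimalSketch {

  // Parses a DECIMAL value from its textual form. Scale and precision are
  // inferred from the digits of the value, not from the column schema.
  static BigDecimal parseDecimal(String text) {
    return new BigDecimal(text);
  }

  public static void main(String[] args) {
    BigDecimal value = parseDecimal("12.340");
    System.out.println(value.scale());     // 3
    System.out.println(value.precision()); // 5
  }
}
```

Note that the scale recovered this way reflects the digits in the value as written, which may differ from the scale declared on the column.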

For a struct - we can just return that as a JsonObject.

To continue an offline discussion: we previously thought it made sense to use vanilla Java types (List, Map) instead of Vert.x types (JsonArray, JsonObject) in the client interfaces so that apps using the client don't need Vert.x as a dependency, but more recently you said maybe it makes sense to use the Vert.x types for better type safety.

This still feels strange to me, though. It's one thing to give users the option to provide their own Vert.x instance and/or Vert.x HttpClientOptions, but requiring the use of Vert.x types in order to use the client at all feels odd. IMO it feels fine to expose the fact that the client uses Vert.x under the hood in order to benefit users seeking more advanced use cases, but requiring the use of Vert.x for all client use cases seems unwarranted.

A third option could be to wrap the Vert.x types but that seems like overkill.

Personally I think it's ok to expose the Vert.x types, or if you prefer to wrap them in our own type and delegate internally. Either way really. But I think it would be a shame to lose the functionality that those classes have which I think would be useful to users. E.g. type safe getters, easy conversion to JSON string, conversion to buffers etc.

As discussed in KLIP-26, I plan to introduce types that wrap the Vert.x types. Stand by for a follow-up PR with this change.
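A rough sketch of what "wrap and delegate" could look like. Everything here is an assumption for illustration: the class name KsqlObjectSketch is hypothetical, and a plain Map stands in for the wrapped Vert.x JsonObject so the example compiles without Vert.x on the classpath. The actual follow-up PR may look quite different.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical wrapper type. A Map stands in for the Vert.x JsonObject that
// a real implementation would delegate to.
final class KsqlObjectSketch {
  private final Map<String, Object> delegate;

  KsqlObjectSketch(Map<String, Object> fields) {
    // Defensive copy so later changes to the caller's map are not visible here.
    this.delegate = new LinkedHashMap<>(fields);
  }

  // Typed getter: throws ClassCastException if the stored value is not a String.
  String getString(String key) {
    return (String) delegate.get(key);
  }

  // Typed getter: accepts any Number and narrows it to Integer; null if absent.
  Integer getInteger(String key) {
    Number n = (Number) delegate.get(key);
    return n == null ? null : n.intValue();
  }

  boolean containsKey(String key) {
    return delegate.containsKey(key);
  }
}
```

The point of the design is that the public signatures mention only the wrapper and JDK types, so applications using the client would not need Vert.x types in their own code.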

purplefox

Looks great! The client certainly seems to be taking shape :)
A few comments, nothing major, the overall shape looks good and moving in the right direction.

purplefox · 2020-04-28T11:27:21Z

ksqldb-api-client/pom.xml

+
+    <dependencies>
+        <dependency>
+            <groupId>io.confluent.ksql</groupId>


This is ok for now. But before we ship I think it's important that we don't depend on any ksqlDB server side stuff - otherwise a whole load of dependencies will be pulled into the client jar, which will make it hard for users to use.

Had a look. The main dependencies right now are:

QueryResponseMetadata, which is used by QueryResponseHandler in order to deserialize the object

BufferedPublisher, which is extended by QueryResultImpl

BaseSubscriber, which is extended by PollableSubscriber

What's your recommendation for removing these dependencies? I see the value in not having the client depend on any of the server modules but I also don't think it makes sense to duplicate these classes. What do you think?

I think we will need to factor out the reactive streams base classes into their own module - e.g. "reactive-common" and have both the server and client depend on that.
For QueryResponseMetadata we could follow the pattern of the old REST API and put the shared classes in their own package (i.e. like rest-model). Or perhaps we should use rest-model?

Got'cha, this makes a lot of sense. Will do in a follow-up PR.

purplefox · 2020-04-28T11:29:25Z

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/Client.java

+
+  void close();
+
+  static Client create(ClientOptions clientOptions) {


I notice this method is not used in tests. I think it's better to use the interface method to create a client than directly instantiating ClientImpl. This enables us to change the implementation more easily without breaking clients. We should consider hiding the constructor of ClientImpl (e.g. making package protected or private and indirecting through a factory)

I'd also add a version of create that takes a Vertx instance. This allows the client to use any existing Vert.x the user might already be using in their app, thus allowing thread pools to be reused etc.
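For reference, a minimal sketch of the static-factory-on-the-interface pattern suggested here. The names GreeterSketch and GreeterSketchImpl are hypothetical and stand in for Client and ClientImpl; the sketch assumes the interface and implementation share a package, which is what lets the implementation stay package-private.

```java
// Hypothetical names; a sketch of the pattern, not the actual Client code.
interface GreeterSketch {
  String greet(String name);

  // Callers obtain instances through this factory, so the implementing
  // class can be swapped later without breaking them.
  static GreeterSketch create(String greeting) {
    return new GreeterSketchImpl(greeting);
  }
}

// Package-private implementation, living in the same package as the interface.
final class GreeterSketchImpl implements GreeterSketch {
  private final String greeting;

  // Package-private constructor: code outside the package cannot call it.
  GreeterSketchImpl(String greeting) {
    this.greeting = greeting;
  }

  @Override
  public String greet(String name) {
    return greeting + ", " + name;
  }
}
```

Usage would be `GreeterSketch.create("Hello").greet("ksqlDB")`, with callers never naming the implementation class.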

Good call on the motivation for using the interface method over directly instantiating ClientImpl. I've updated the tests, and also applied the analogous change to ClientOptionsImpl.

I'm not seeing a way to make the constructor for ClientImpl package private, though. Client.java is in a different package from ClientImpl.java so if Client.java is able to instantiate ClientImpl, then ClientImpl must have a public method for instantiation, whether that's a constructor or a factory method. How do people usually work around this?

I'd also add a version of create that takes a Vertx instance.

Done.

Take a look how it's done in Vert.x https://github.com/eclipse-vertx/vert.x/blob/master/src/main/java/io/vertx/core/Vertx.java#L86

Basically the interface uses a static factory instance to actually create the implementation. The ServiceHelper is used to load the factory at run-time by scanning the classpath for implementations of the factory. The factory itself is in the same package as VertxImpl so the VertxImpl constructor can be package protected. It's a bit convoluted and may be overkill for us right now, might be sufficient to not worry about hiding the constructor but perhaps adding javadoc to it saying it should not be instantiated directly.

Hm interesting, the VertxImpl constructor is package protected but the factory implementation (in the same package) is still public: https://github.com/eclipse-vertx/vert.x/blob/3.8/src/main/java/io/vertx/core/impl/VertxFactoryImpl.java#L23
so if a user wanted to circumvent the intent of the VertxImpl constructor being package protected they could still do so by calling new VertxFactoryImpl().vertx(), right?

Probably, but the intention here isn't to protect against malicious users - it's pretty trivial to construct any object, even if it has an inaccessible private/protected/package protected constructor, using reflection. The idea is to nudge users to the right API :)
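A small sketch of the reflection point, using only java.lang.reflect. The class names are hypothetical. It assumes no security manager or module boundary blocks setAccessible, which holds for classes on the ordinary classpath.

```java
import java.lang.reflect.Constructor;

// Hypothetical class whose only constructor is private.
final class HiddenSketch {
  private final String label;

  private HiddenSketch(String label) {
    this.label = label;
  }

  String label() {
    return label;
  }
}

final class ReflectionSketch {

  // Builds a HiddenSketch despite its private constructor.
  static HiddenSketch instantiate(String label) {
    try {
      Constructor<HiddenSketch> ctor =
          HiddenSketch.class.getDeclaredConstructor(String.class);
      ctor.setAccessible(true); // lifts the access check on this constructor
      return ctor.newInstance(label);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("reflection failed", e);
    }
  }
}
```

So access modifiers here work as guidance toward the intended API, not as a security boundary.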

purplefox · 2020-04-28T11:31:34Z

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/ClientOptions.java

+
+package io.confluent.ksql.api.client;
+
+public interface ClientOptions {


Maybe also expose the Vert.x HttpClientOptions? There are probably other settings (e.g. pool size) that users might want to tweak.

I'm trying to understand the expected behavior if a user provides HttpClientOptions: will ClientOptionsImpl update the provided HttpClientOptions according to the other fields, or will the user-provided HttpClientOptions be used directly?

The latter doesn't seem very user-friendly since then the user would be responsible for duplicating the work of ClientImpl in populating HttpClientOptions based on ClientOptions, but the former also seems confusing since the user would have to know which HttpClientOptions properties will be overridden and which won't.

I would probably expose the HttpClientOptions directly and not have similar methods on ClientOptions at all. I.e. only have options on ClientOptions if they're not related to HTTP. If you're not comfortable exposing the HttpClientOptions directly you could wrap them.

I.e. only have options on ClientOptions if they're not related to HTTP.

What counts as "not related to HTTP"? Of the options exposed so far (host, port, useTls, trustStore, keyStore, and basicAuth), all of them have equivalents in the Vert.x HttpClientOptions besides basicAuth. If we were to only expose HttpClientOptions and not have similar options on ClientOptions then ClientOptions would become

public interface ClientOptions {
  ClientOptions setBasicAuthCredentials(String username, String password);
  boolean isUseBasicAuth();
  String getBasicAuthUsername();
  String getBasicAuthPassword();
  ClientOptions copy();

  static ClientOptions create(HttpClientOptions httpClientOptions) {
    return new ClientOptionsImpl(httpClientOptions);
  }
}

Is this what you're proposing? I feel like I've misunderstood.

If you're not comfortable exposing the HttpClientOptions directly you could wrap them.

I assume this means creating a wrapper type around HttpClientOptions, rather than wrapping the individual methods of HttpClientOptions into ClientOptions (as exposing too many options in ClientOptions feels like it'll overwhelm the user). If we were to create a wrapper type around HttpClientOptions, would it not be better to leave the more commonly used methods in ClientOptions itself (as is currently the case) and only wrap the other options in HttpClientOptions?

I'm going to change my mind on this one. I think it's ok to just use our own ClientOptions and not to expose HttpClientOptions. If we find we need to expose more properties over time on ClientOptions we can do that.

(Aside: BTW I think we should support token auth on the client too, not just basic auth)

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/QueryResult.java

purplefox · 2020-04-28T11:35:00Z

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/Row.java

+   * @param columnName name of column.
+   * @return column value.
+   */
+  Boolean getBoolean(String columnName);


I don't think most users care too much about the precision of a decimal. I think we should just return a BigDecimal for a DECIMAL column with whatever scale and precision is appropriate for the value. This is what JDBC does btw https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getBigDecimal(int)

ksqldb-api-client/src/test/java/io/confluent/ksql/api/client/ClientTest.java

purplefox · 2020-04-28T12:25:53Z

ksqldb-api-client/src/test/java/io/confluent/ksql/api/client/impl/QueryResultImplTest.java

+  }
+
+  @Test
+  public void shouldNotSubscribeIfPolling() {


Imho I would like to see these kinds of tests conducted using the actual API and no mocks.

As you know I'm not a fan of fine grained unit tests and mocks as they can constrain the implementation, and often what you're testing doesn't really correspond to what the system really does thus resulting in bugs slipping through and a false sense of security.

Sure, I'll add equivalent tests to ClientTest as part of revamping / improving test coverage. My vote is to leave these unit tests in place, though, until we see them become brittle. I think it's useful to be able to scan a test file and understand the key pieces of functionality for a class without having to dig through integration tests. Though I guess Java docs on the class/interface itself should serve this purpose in most cases, so maybe that's not a good reason...

As long as there's equivalent test coverage on the actual API I think that's fine. And as long as you won't be upset if I end up deleting them after spending an hour trying to refactor them if the implementation changes ;)

Update: going to add these additional tests in a follow-up PR.

And as long as you won't be upset if I end up deleting them after spending an hour trying to refactor them if the implementation changes ;)

Fine by me ;)

purplefox · 2020-04-28T12:27:37Z

ksqldb-api-client/src/test/java/io/confluent/ksql/api/client/impl/RowImplTest.java

+import org.junit.Before;
+import org.junit.Test;
+
+public class RowImplTest {


Again, not a fan of fine grained unit tests. I'd prefer to see the behaviour of a row tested on an instance of the Row interface returned from the actual API rather than the particular implementation RowImpl. If we later change the implementation these kinds of tests get very brittle and hard to refactor whereas tests that test against the interface don't.

purplefox

Approving as overall approach looks sound, on basis that review comments are addressed (or not addressed if my comment does not make sense - let's discuss ;) )

vcrfxia

Thanks for the review @purplefox -- super helpful comments and suggestions! Responded inline.

vcrfxia · 2020-04-29T07:36:35Z

ksqldb-api-client/pom.xml

+
+    <dependencies>
+        <dependency>
+            <groupId>io.confluent.ksql</groupId>


Had a look. The main dependencies right now are:

QueryResponseMetadata, which is used by QueryResponseHandler in order to deserialize the object

BufferedPublisher, which is extended by QueryResultImpl

BaseSubscriber, which is extended by PollableSubscriber

What's your recommendation for removing these dependencies? I see the value in not having the client depend on any of the server modules but I also don't think it makes sense to duplicate these classes. What do you think?

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/Client.java

vcrfxia · 2020-04-29T07:41:14Z

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/Client.java

+
+  void close();
+
+  static Client create(ClientOptions clientOptions) {


Good call on the motivation for using the interface method over directly instantiating ClientImpl. I've updated the tests, and also applied the analogous change to ClientOptionsImpl.

I'm not seeing a way to make the constructor for ClientImpl package private, though. Client.java is in a different package from ClientImpl.java so if Client.java is able to instantiate ClientImpl, then ClientImpl must have a public method for instantiation, whether that's a constructor or a factory method. How do people usually work around this?

I'd also add a version of create that takes a Vertx instance.

Done.

vcrfxia · 2020-04-29T07:44:58Z

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/ClientOptions.java

+
+package io.confluent.ksql.api.client;
+
+public interface ClientOptions {


I'm trying to understand the expected behavior if a user provides HttpClientOptions: will ClientOptionsImpl update the provided HttpClientOptions according to the other fields, or will the user-provided HttpClientOptions be used directly?

The latter doesn't seem very user-friendly since then the user would be responsible for duplicating the work of ClientImpl in populating HttpClientOptions based on ClientOptions, but the former also seems confusing since the user would have to know which HttpClientOptions properties will be overridden and which won't.

ksqldb-api-client/src/test/java/io/confluent/ksql/api/client/ClientTest.java

vcrfxia · 2020-04-29T08:20:01Z

ksqldb-api-client/src/test/java/io/confluent/ksql/api/client/impl/QueryResultImplTest.java

+  }
+
+  @Test
+  public void shouldNotSubscribeIfPolling() {


Sure, I'll add equivalent tests to ClientTest as part of revamping / improving test coverage. My vote is to leave these unit tests in place, though, until we see them become brittle. I think it's useful to be able to scan a test file and understand the key pieces of functionality for a class without having to dig through integration tests. Though I guess Java docs on the class/interface itself should serve this purpose in most cases, so maybe that's not a good reason...

vcrfxia · 2020-04-29T08:20:10Z

ksqldb-api-client/src/test/java/io/confluent/ksql/api/client/impl/RowImplTest.java

+import org.junit.Before;
+import org.junit.Test;
+
+public class RowImplTest {


vcrfxia

Thanks again for the reviews (and patient explanations) @purplefox !

I've addressed the majority of comments and am planning to merge this PR once the build passes. Follow-up PRs will contain:

additional test coverage: error cases, functionality currently only tested via unit tests, TLS mutual auth, result schemas with decimal and complex types, QueryResultImpl#isComplete()
remove dependency on ksqlDB server module
introduction of structured types (to wrap Vert.x JsonObject and JsonArray), and Row#getDecimal() methods
expose Vert.x HttpClientOptions, pending discussion in #5200 (comment)
other interface changes resulting from the discussion in KLIP-26

vcrfxia · 2020-05-09T00:42:50Z

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/Row.java

+   * @param columnName name of column.
+   * @return column value.
+   */
+  Boolean getBoolean(String columnName);


As discussed in KLIP-26, I plan to introduce types that wrap the Vert.x types. Stand by for a follow-up PR with this change.

vcrfxia · 2020-05-09T00:52:10Z

ksqldb-api-client/src/test/java/io/confluent/ksql/api/client/impl/QueryResultImplTest.java

+  }
+
+  @Test
+  public void shouldNotSubscribeIfPolling() {


Update: going to add these additional tests in a follow-up PR.

And as long as you won't be upset if I end up deleting them after spending an hour trying to refactor them if the implementation changes ;)

Fine by me ;)

vcrfxia · 2020-05-09T00:53:17Z

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/impl/QueryResultImpl.java

+
+  @Override
+  public boolean isComplete() {
+    return false;


Got'cha. I've renamed a couple internal variables in BufferedPublisher to better reflect this. Thanks for the clarification!

vcrfxia · 2020-05-09T01:01:33Z

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/ClientOptions.java

+
+  ClientOptions setUseClientAuth(boolean useClientAuth);
+
+  ClientOptions setVerifyHost(boolean verifyHost);


These additional TLS options were needed to get the tests working. In a follow-up PR they'll either be removed in favor of exposing Vert.x HttpClientOptions (pending discussion in #5200 (comment)) or I'll refactor all the TLS options into a separate interface in order to clean up this one.

mjsax · 2020-05-10T20:34:23Z

ksqldb-api-client/src/main/java/io/confluent/ksql/api/client/QueryResult.java

+   * @param timeUnit unit for timeout param.
+   * @return the row, if available; else, null.
+   */
+  Row poll(long timeout, TimeUnit timeUnit);


@vcrfxia As mentioned on the KLIP: why do we not use Duration instead?

Thanks for the bump. I've updated the KLIP and will update the code in a future PR (along with a multitude of other feedback from the KLIP).
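A sketch of what a Duration-based poll could look like. This is an assumption about the eventual shape, not the code from the follow-up PR: PollableSketch is a hypothetical stand-in backed by a BlockingQueue, and String replaces Row to keep the example self-contained.

```java
import java.time.Duration;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a pollable query result; String replaces Row.
final class PollableSketch {
  private final BlockingQueue<String> rows = new LinkedBlockingQueue<>();

  // Makes a row available to pollers.
  void offer(String row) {
    rows.add(row);
  }

  // Duration-based variant of poll(long timeout, TimeUnit timeUnit).
  // Returns the next row, or null if none arrives within the timeout.
  String poll(Duration timeout) {
    try {
      return rows.poll(timeout.toNanos(), TimeUnit.NANOSECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return null;
    }
  }
}
```

A single Duration argument removes the chance of passing a timeout with the wrong unit, at the cost of an object allocation per call.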

feat: java client push/pull query support

461c00a

vcrfxia requested a review from a team as a code owner April 28, 2020 07:43

vcrfxia commented Apr 28, 2020

purplefox reviewed Apr 28, 2020

purplefox approved these changes Apr 28, 2020

vcrfxia added 10 commits April 28, 2020 14:42

chore: checkstyle

6a169a0

chore: allow push and pull queries via both streaming and exec

c2c2c55

chore: feedback

ec98963

chore: allow nulls in ClientOptions

5771d71

chore: static json mapper

02fbc87

chore: fix synchronization in PollableSubscriber

d4f7f41

test: clean up negative tests in ClientTest

21c59fd

test: more ClientTest cleanup

40e3582

chore: creator for ClientOptions

13722fe

chore: don't create basic auth header on each request

553ecf2

vcrfxia commented Apr 29, 2020

vcrfxia mentioned this pull request Apr 29, 2020

docs: intent for klip-26: Java client interfaces #5232

Merged

2 tasks

vcrfxia added 7 commits May 3, 2020 13:28

fix: basic auth

c58782c

chore: handle record parser exception

1073dbb

fix: synchronization in QueryResultImpl

0f990c9

feat: tls tests

9044767

chore: checkstyle

8ad66e8

fix: implement isComplete() on QueryResult

dff2ffb

chore: limit number of rows that may be returned from executeQuery()

1ac08f6

vcrfxia commented May 9, 2020

vcrfxia changed the title from "feat: Java client with push + pull query support" to "feat(client): Java client with push + pull query support" May 9, 2020

vcrfxia added 2 commits May 8, 2020 21:31

Merge branch 'master' into java-client

dbdfcec

chore: findbugs

bb5ac44

vcrfxia merged commit 280ef0c into confluentinc:master May 9, 2020

vcrfxia deleted the java-client branch May 9, 2020 06:40

mjsax reviewed May 10, 2020

vcrfxia mentioned this pull request May 11, 2020

test(client): additional test coverage for push+pull query support #5329

Merged

2 tasks

