
Adds support for Select, Scan and Search queries #85

Merged
merged 3 commits into ing-bank:master on Jan 18, 2020

Conversation

anskarl
Contributor

@anskarl anskarl commented Jan 1, 2020

  • Changes the internal query and response API in order to support the three new queries
  • Legacy mode for Scan queries is configurable from application.conf
  • Updates documentation and examples for the new queries
  • Updates unit tests to cover the new queries
  • All Druid queries have a toDebugString utility function that returns the corresponding native JSON representation of the query as a string (useful for debugging; see the sketch below)

GH-83
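
For illustration, a minimal sketch of how the new utility can be used (`q` is a hypothetical, already-constructed Druid query; only `toDebugString` itself comes from this PR):

```scala
// `q` stands for any constructed Druid query (hypothetical value).
// toDebugString renders the query as its native Druid JSON string,
// which is handy for comparing against hand-written native queries.
val json: String = q.toDebugString
println(json)
```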

@bjgbeelen
Collaborator

Thanks @anskarl. I’ll have a look at them this Friday!

Contributor

@barend barend left a comment


I think this adds a lot of value, and I'm in favour of merging.


Contributor

I would include the import ing.wbaa.druid.dql.DSL._ in this snippet.
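
For illustration, the shape such a snippet could take with the import included (the query below is a hypothetical example, not the documentation snippet under review, and its builder calls are illustrative assumptions):

```scala
// Brings the DQL query-building DSL into scope.
import ing.wbaa.druid.dql.DSL._

// Hypothetical query; the exact builder calls are assumptions.
val query = DQL
  .from("wikipedia")
  .interval("2015-09-12/2015-09-13")
  .scan()
  .build()
```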

```scala
def as[T](implicit decoder: Decoder[T]): T = decoder.decodeJson(this.event) match {
  case Left(e)      => throw e
  case Right(value) => value
}
```
Contributor

These matches could be replaced by .toTry.get, although I suppose it's debatable whether that's truly a readability improvement, as it obfuscates the throw.

(This comment applies to a couple of other places too)
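
A minimal sketch of the suggested form (this relies on io.circe.Error extending Throwable, which makes Either#toTry applicable; not necessarily the exact code that landed):

```scala
// Equivalent to the pattern match: toTry converts Either[io.circe.Error, T]
// into a Try[T], and .get rethrows the decoding error if one occurred.
def as[T](implicit decoder: Decoder[T]): T =
  decoder.decodeJson(this.event).toTry.get
```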

Collaborator

I'm fine with either as long as it is consistent

Contributor Author

I like both, but I will change to .toTry.get since it is shorter.

```scala
val requestSeries = query.streamSeriesAs[SelectResult].runWith(Sink.seq)

whenReady(requestSeries) { response =>
  response.size shouldBe numberOfResults
}
```
Contributor

I don't know the details of how the Akka streaming stuff works, so open question: is the test response large enough compared to the buffer sizes that all the streaming stuff is truly exercised (i.e. it's more than a single-chunk response)?

Contributor Author

Good point.

Regarding the test: the granularity of the test query is hourly and the threshold of events for each result is 10. In practice the datasource contains data only for a single day, 2015-09-12. Therefore, the select query results in an array of 24 JSON result structures (one for each hour), each containing 10 events (due to the threshold).

JSON streaming is performed over the top-level JSON result structures (24 in our test), so when the threshold is large we may not get any benefit from streaming. We only benefit when there are many top-level JSON structures (e.g., months of data at hourly granularity); the sketch below illustrates this top-level framing.

In order to also have streaming support for nested structures, we would need something like akka-stream-alpakka-json-streaming. In fact, this is something I tried to implement at the beginning, but I couldn't make it work with the nested structure of the Select response.
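
Not scruid's actual implementation, but a minimal Akka Streams sketch of what streaming over the top-level structures means here (responseBody is a hypothetical stand-in for the raw response entity):

```scala
import akka.NotUsed
import akka.stream.scaladsl.{ JsonFraming, Source }
import akka.util.ByteString

// Hypothetical stand-in for the raw HTTP body of a Select response.
val responseBody: Source[ByteString, NotUsed] =
  Source.single(ByteString("""[ {"result": 1}, {"result": 2} ]"""))

// objectScanner frames the bytes into one element per top-level JSON object,
// so each of the 24 hourly result structures in the test can be decoded as
// soon as it arrives; the 10 events nested inside each structure are not
// streamed individually.
val topLevelResults: Source[ByteString, NotUsed] =
  responseBody.via(JsonFraming.objectScanner(maximumObjectLength = 64 * 1024))
```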

Collaborator

@bjgbeelen bjgbeelen left a comment

Looks good. Nice improvements indeed!

One minor thing I noticed is the naming of DruidResponseScanImpl; I would probably have chosen DruidScanResponse. You already use DruidScanResult(s), so this would make it more consistent.

@anskarl
Contributor Author

anskarl commented Jan 17, 2020

> Looks good. Nice improvements indeed!
>
> One minor thing I noticed is the naming of DruidResponseScanImpl; I would probably have chosen DruidScanResponse. You already use DruidScanResult(s), so this would make it more consistent.

I agree, I will change the name to DruidScanResponse.

  - Simplify decoding functions by using `.toTry.get` instead of pattern matching.
  - Rename DruidResponseScanImpl to DruidScanResponse

@barend
Contributor

barend commented Jan 18, 2020

Thank you for updating the PR.

@krisgeus krisgeus merged commit a4317b6 into ing-bank:master Jan 18, 2020
@anskarl anskarl mentioned this pull request May 8, 2020