-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement EXPLODE(ARRAY) for single table function in SELECT #3589
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @purplefox
Lot to review here. Lot's of nits, but also some architectural / structural improvements requested to. Though, given you're new to the code base this is a blooming awesome first stab.
Feel free to suggest picking up any of my comments in follow up PRs where you think it makes sense.
Let me know when this is ready for another review.
ksql-common/src/main/java/io/confluent/ksql/function/FunctionRegistry.java
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/function/FunctionRegistry.java
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/function/FunctionRegistry.java
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/function/KsqlTableFunction.java
Outdated
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/function/KsqlTableFunction.java
Outdated
Show resolved
Hide resolved
ksql-functional-tests/src/test/resources/query-validation-tests/simple_agg.json
Outdated
Show resolved
Hide resolved
ksql-streams/src/main/java/io/confluent/ksql/execution/streams/ExecutionStepFactory.java
Outdated
Show resolved
Hide resolved
ksql-udf/src/main/java/io/confluent/ksql/function/udtf/Udtf.java
Outdated
Show resolved
Hide resolved
ksql-udf/src/main/java/io/confluent/ksql/function/udtf/UdtfDescription.java
Outdated
Show resolved
Hide resolved
ksql-udf/src/main/java/io/confluent/ksql/function/udtf/UdtfFactory.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass with mostly surface level comments - there's a lot of unused code that andy pointed out. It'd be much easier to review without that! Overall looks pretty solid (cool to see the functioning QTT tests for this!)
ksql-common/src/main/java/io/confluent/ksql/function/FunctionRegistry.java
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/udtf/array/ExplodeFunctionFactory.java
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/function/KsqlTableFunction.java
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/function/KsqlTableFunction.java
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/function/TableFunctionFactory.java
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/InternalFunctionRegistry.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/InternalFunctionRegistry.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/planner/LogicalPlanner.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/util/TableFunctionExpressionRewriter.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/test/java/io/confluent/ksql/function/InternalFunctionRegistryTest.java
Outdated
Show resolved
Hide resolved
This PR has 108 comments over several days. Would it make sense to have a meeting to quickly resolve the big talking points? I see at least one about the apparently leaky abstraction Essentially, it would be good to see if there is a way to make the review more efficient. |
I think there are four main types of comments in this review.
Imho 1, 2 and 3 detract from the focus of the review and probably make up a large majority of the traffic on this review. Once we subtract them we're left with a small number of totally valid, and interesting points that we can discuss and that would make the review way more targeted and manageable. Ideally we'd only have 4 type comments in reviews but we don't live in an ideal world yet, but I think that's what we should aim towards ;) |
Sorry @apurvam I accidentally edited your comment (clicked on the wrong comment while editing), but I think I put it back to how it was before! Actually I'm kind of surprised that GitHub lets you edit other users comments. |
I think there was general consensus to better tooling to capture more things. However, until that is in place there'll still need to be such things picked up in reviews. There are also likely some things we can't easily pick up with existing tooling, e.g. checking parameters to constructors are valid. I'm a strong believer in the matra don't allow your objects to be created in an invalid state, hence tending to submit nits on param null checks and other invalid parameter issue. I'm not aware of any existing tooling that will pick up on missing null checks, though we could look to write our own checkstyle rule or similar - and I agree there is worth in doing so as we grow. How ever there are other instances of parameter validation that a code analysis tool can't so readily pick up on, e.g. a string parameter that shouldn't have trailing or leading whitespace or similar. Such things will ultimately rely on the originating engineer defensively coding, or the reviews pointing this out.
Just because an existing piece of code does something does not implicitly make it something we want to emulate or a pattern we want to see replicated. We know we have areas in the code that need recactoring. May of these comments are simply requesting that we write new code that matches how we want the code to be and not replicate bad patterns that we know we want to fix. For minor things it sometimes makes sense to fix up the existing code as we go. Otherwise such things often never happen. That said, it's generally a value call. Too much 'tidy up' in a PR that's main job is to introduce new functionally, makes the PR harder to review. For example, I added a note that a new class in this PR was in a
IMHO we should never check todo's into the code base. Either do them, raise Github issues or some how track them on your own 'todo' list. In general, I think we should look to get beyond the WIP before we look to commit. This means those doing the reviewing can see how all the parts of the code/design hang together and make meaningful commons. If we commit WIP code, there is always the risk that it's never completed.
No arguments here ;) In summary, my thoughts are:
|
Thanks for taking the time to write up your thoughts on the PR review process it self, @big-andy-coates and @purplefox . I think this merits a face to face discussion. Some concrete thoughts on each of the points @purplefox raised:
IMO, both of these I think should be approached with moderation. Unless there is something egregious (major deviation from the documented coding style, obvious bugs, introduction of significant tech debt etc.), a rule of thumb I have used in the past is is to have a 1:1 ratio (or something close to it) of these types of comments relative to the comments raising questions on the architectural/design approach taken (category 4 in Tim's list). I think in general it should be acceptable to file followup issues for refactorings rather than bundling them into a PR adding functionality. I would also say that most of the contributors on this code base have the attitude of leaving things better than they found it. If most PRs (including planned followups) follow this ethos, then I think we should favor review efficiency at the cost of perfecting each PR. Numerous iterations on single PRs take a lot of energy and I think we have an opportunity to improve our efficiency with this process.
I am not quite sure what this means. Is it review of code that isn't meant to be checked in? Or is it review of code which has planned follow ups which is taking up time?
Cool! It would be good to get a sense of what percentage of the discussion in this case was of this class. That is the true metric of our efficiency here IMO. |
Meta comments of the PR review aside, I think it would be good to get a sense of how far this is from merging. It seems to me that the biggest point is whether to model UDTFs on UDFs or on UDAFs (current implementation). I think this is probably worth resolving in a face to face conversation. |
I think that's true but we should be transparent here. Whether not they are checked automatically I think we should state, clearly and unambiguously, (in a wiki page for example), exactly what the style rules are. These should be easy to catalogue - I already know 5 or 6 that I have inferred from reviews already (I will make a stab on this today). Being transparent means the contributors can make an effort to comply the rules before they submit PR so they don't receive any surprises during review which is a waste of everyone's time. This is especially important for external contributors where we risk alienating them and not getting a second PR if we're too critical (style wise) of their work after they've spent time on it. So, sure, some will slip through the net, and will be picked up in reviews, but making expectations clear up front should minimise these. Making clear what the style rules are also clarifies what are rules and what are just opinions. There's an important difference here - every contributor should be expected to conform to the rules whether they like them or not. Rules are enforceable in review, opinions are not, and the opinion of everyone on the team should be equally valued. |
Other than being transparent about what are style rules actually are, I think there is a whole other debate that should be had, at some point (no need to do it now) about whether these rules are actually adding value. Picking up briefly on two classic ones (and I am happy to go into much more detail about these and my real world experiences around them): Null parameter checking and final parameters.
My 2c - always question "best practice" and don't take it as read. I've been developing in Java since 1996 (and before that I was doing C++, amongst other things), and I've seen a lot of fads come and go. Be wary of "rules" that are applied without judgement. [Yes this is totally off topic, and I'm sorry ;) ] |
Apologies, I was unclear here! The overall epic task that tracks table functions is made up of several sub tasks. And each one of those subtasks will probably have its own PR. By WIP I don't mean this PR is incomplete, but the overall epic is incomplete. So all this work will be WIP until the entire epic is complete. So, when I have added TODOs to the code, they are referring to work that will be completed in later subtasks. BTW, I quite agree with Andy's point that TODOs should not exist in completed code. But in this case they are not a substitute to using issues, as all the work is already tracked by issues. They are merely giving a little bit of a hint to the reader of the code that the overall work is incomplete. As a concrete example, the code here only supports a single table function, but KLIP-9 says that multiple table functions will be supported. A reader who has read the KLIP and is now reading the code might be confused that the code only supports a single table function. A little TODO saying "TODO support multiple table functions in later PRs" goes a long way to helping the reader understand without embarking on a 15 min goose chase to figure out what is going on. Of course (and this goes for any style rules that we agree on). I am happy to follow any style rules the project has, whether or not I agree with them, as the team comes first. But this need to be made transparent. |
…ewriter to analyzer package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had time to digest my previous comment (#3589 (comment)), and I think what was bothering me there has matured into some concrete ideas. See inline comments below.
ksql-common/src/main/java/io/confluent/ksql/name/ColumnName.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/analyzer/TableFunctionExpressionRewriter.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/planner/plan/FlatMapNode.java
Outdated
Show resolved
Hide resolved
…able functions are now gathered in Analyzer and select expression rewriting occurs in FlatMapNode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've learned a ton from reviewing this PR and our discussions - thanks @purplefox for the quick iterations and the tons of changes you've made. I think this PR is nearly ready to go! A few of my comments inline are important (around the column naming/bad imports), most are follow-ups that can be done later.
ksql-engine/src/main/java/io/confluent/ksql/analyzer/QueryAnalyzer.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/BaseAggregateFunction.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/BaseTableFunction.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/planner/plan/FlatMapNode.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/planner/plan/FlatMapNode.java
Show resolved
Hide resolved
ksql-functional-tests/src/test/resources/query-validation-tests/table-functions.json
Show resolved
Hide resolved
…lumns to avoid confusion with select column aliases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super stoked for this to go through. let's wait on @big-andy-coates' stamp as well and maybe it even sneaks into 5.4, how awesome would that be?
ksql-engine/src/main/java/io/confluent/ksql/function/InternalFunctionRegistry.java
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/analyzer/Analysis.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/InternalFunctionRegistry.java
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/udtf/array/ExplodeTableFunction.java
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/udtf/array/ExplodeFunctionFactory.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for comment...
ksql-engine/src/main/java/io/confluent/ksql/planner/PlanSourceExtractorVisitor.java
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/planner/plan/FlatMapNode.java
Show resolved
Hide resolved
@@ -419,4 +354,125 @@ public void shouldNotAllowModificationViaListFunctions() { | |||
private void givenUdfFactoryRegistered() { | |||
functionRegistry.ensureFunctionFactory(UdfLoaderUtil.createTestUdfFactory(func)); | |||
} | |||
|
|||
private static AggregateFunctionFactory createAggregateFunctionFactory() { | |||
return new AggregateFunctionFactory("my_aggregate") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agavra suggested using a mock - is there any reason that won't work, it would certainly be a whole lot less code!
Description
Tracked by: #3505
This is the first task in the implementation of table functions for KSQL.
It uses the approach outlined in KLIP-9 (waiting for approval but we seem to have consensus)
Testing done
Reviewer checklist