feat: organize UDFs by category #5813

blueedgenick · 2020-07-11T01:44:44Z

A little friday afternoon, quality-of-life-improvement: i noticed that we now have quite some UDFs, which is cool, but the list you get when executing show functions is quite long and hard to navigate unless you really know what you are looking for. Hence this PR adds a new categorization of UDFs into STRING|AGGREGATE|ARRAY|... etc.

Also a small message at the end of the listing detailing how to get further information about a function of interest.

Looks like quite a lot of changes but most of it is just inserting the appropriate category into the source file of each individual UDF.

JimGalasyn · 2020-07-13T18:50:51Z

docs/concepts/functions.md

@@ -258,6 +258,7 @@ commands.
 |-------------|----------------------------------------------------------------------|----------|
 | name        | The case-insensitive name of the UDF(s) represented by this class.   | Yes      |
 | description | A string describing generally what the function(s) in this class do. | Yes      |
+| catgory     | For grouping similar functions in the output of SHOW FUNCTIONS.      | No       |


JimGalasyn

LGTM, with one typo fix.

agavra

Overall I think this makes sense to me and would be a nice usability boost, but see the first line comment

agavra · 2020-07-13T21:31:23Z

docs/concepts/functions.md

@@ -258,6 +258,7 @@ commands.
 |-------------|----------------------------------------------------------------------|----------|
 | name        | The case-insensitive name of the UDF(s) represented by this class.   | Yes      |
 | description | A string describing generally what the function(s) in this class do. | Yes      |
+| category    | For grouping similar functions in the output of SHOW FUNCTIONS.      | No       |


there's an approved KLIP (see #5269) that proposes to add group for configuration grouping. Having "category" and "group" both present for different purposes might be confusing. We might want "displayGroup" and "configGroup" maybe? Or do we want to reuse the grouping for both?

hmm, that's interesting - i had forgotten where that particular conversation landed. Seems like composing groups of functions you want to assign certain config values to is more likely to be useful in the arena of user-specific, custom-developed UDFs than it is in the general code-base ? Otherwise you would surely need a way to define the groups themselves in the properties file, not in the source code (and i see that this is mentioned in the klip as a possible future enhancement). Naming aside, I see that as an orthogonal concern to the purpose of this PR. Perhaps if that klip is implemented it would make most sense to explicitly call the new annotation attribute it proposes something like "configGroup" to be abundantly clear about what it implies ?
To be clear, my goal here is to provide a grouping or categorization which basically mirrors the way you always see functions grouped in database documentation.

Yup, I totally understand that the intention here is entirely different - but that should be clear from the naming of the attribute. Perhaps we can make it clear but just changing the name of configGroup and leaving this as category

agavra · 2020-07-13T21:33:50Z

ksqldb-udf/src/main/java/io/confluent/ksql/function/FunctionCategory.java

+
+package io.confluent.ksql.function;
+
+public enum FunctionCategory {


it might be better to have these just be strings so that users can categorize their own into whatever categories they want

i did wonder about that but decided i'd rather be restrictive so that we don't accidentally introduce a plethora of too-similar ones into the codebase over time. Perhaps a good compromise would be to have a "CUSTOM" enum value ? Maybe even set that as the default value so that if someone doesn't set a value for this in their own UDF then it just gets set correctly for them ? WDYT ?

I don't think that should really be a concern if we group all of the values in a constant file just like we have here as an enum - people follow examples that are already there, so they'll tend to look in that file before introducing new ones. In the worst case, we'll have some unnecessary categories - but it would be compatible to change those and clean them up.

My preference is to avoid enums in public APIs because they're difficult to extend and impossible to remove.

gotcha, i'll change that then.
Just for my own edification, why is it harder to extend or remove enum constants in an API than static strings ? i would naively suppose that removing existing things always has a similar backwards-compatibility problem, however they are defined ? And by "extending" here, do you mean adding new values ?

why is it harder to extend or remove enum constants in an API than static strings

A few reasons. One is logistically if external repos depend on a removed enum they can't just "add it in" and keep going (which you can if you depended on a removed string constant). Second is a little bit more philosophical, you shouldn't remove an enum because it changes the ordinals of everything after it - which I've been bit by before (though unlikely it'll fall into play here). I think JSON schema might actually fail compatibility checks if you removed an enum from a schema (not that we use json schema yet...)

And by "extending" here, do you mean adding new values ?

yeah

got it, that makes sense. thanks for taking the time to share!

agavra

lgtm :)

blueedgenick requested a review from a team as a code owner July 11, 2020 01:44

blueedgenick requested a review from JimGalasyn as a code owner July 13, 2020 17:56

JimGalasyn reviewed Jul 13, 2020

View reviewed changes

JimGalasyn approved these changes Jul 13, 2020

View reviewed changes

agavra reviewed Jul 13, 2020

View reviewed changes

blueedgenick added 10 commits July 14, 2020 11:29

initial version

50adb7f

docs

3452336

fix docs typo

8db9dc3

missed a couple of 'final's

5189085

retry integration test fix

34d9037

Almog's requested changes

1efb7a1

checkstyle

436397a

more rules

0efb17b

fix copy/paste error

351a6c1

fix fallout from switching to static strings

065e4dd

blueedgenick force-pushed the udf_categories branch from 266b164 to 065e4dd Compare July 14, 2020 19:57

agavra approved these changes Jul 14, 2020

View reviewed changes

missed one more typo

718e937

blueedgenick merged commit 7f5b843 into confluentinc:master Jul 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: organize UDFs by category #5813

feat: organize UDFs by category #5813

blueedgenick commented Jul 11, 2020

JimGalasyn Jul 13, 2020

JimGalasyn left a comment

agavra left a comment

agavra Jul 13, 2020

blueedgenick Jul 13, 2020 •

edited

Loading

agavra Jul 13, 2020

agavra Jul 13, 2020

blueedgenick Jul 13, 2020

agavra Jul 13, 2020

blueedgenick Jul 14, 2020

agavra Jul 14, 2020 •

edited

Loading

blueedgenick Jul 14, 2020

agavra left a comment

	\| catgory \| For grouping similar functions in the output of SHOW FUNCTIONS. \| No \|
	\| category \| For grouping similar functions in the output of SHOW FUNCTIONS. \| No \|


		package io.confluent.ksql.function;

		public enum FunctionCategory {

feat: organize UDFs by category #5813

feat: organize UDFs by category #5813

Conversation

blueedgenick commented Jul 11, 2020

Choose a reason for hiding this comment

JimGalasyn left a comment

Choose a reason for hiding this comment

agavra left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blueedgenick Jul 13, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

agavra Jul 14, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

agavra left a comment

Choose a reason for hiding this comment

blueedgenick Jul 13, 2020 •

edited

Loading

agavra Jul 14, 2020 •

edited

Loading