Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: organize UDFs by category #5813

Merged
merged 11 commits into from
Jul 14, 2020

Conversation

blueedgenick
Copy link
Contributor

A little friday afternoon, quality-of-life-improvement: i noticed that we now have quite some UDFs, which is cool, but the list you get when executing show functions is quite long and hard to navigate unless you really know what you are looking for. Hence this PR adds a new categorization of UDFs into STRING|AGGREGATE|ARRAY|... etc.

Also a small message at the end of the listing detailing how to get further information about a function of interest.

Looks like quite a lot of changes but most of it is just inserting the appropriate category into the source file of each individual UDF.

@blueedgenick blueedgenick requested a review from a team as a code owner July 11, 2020 01:44
@@ -258,6 +258,7 @@ commands.
|-------------|----------------------------------------------------------------------|----------|
| name | The case-insensitive name of the UDF(s) represented by this class. | Yes |
| description | A string describing generally what the function(s) in this class do. | Yes |
| catgory | For grouping similar functions in the output of SHOW FUNCTIONS. | No |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| catgory | For grouping similar functions in the output of SHOW FUNCTIONS. | No |
| category | For grouping similar functions in the output of SHOW FUNCTIONS. | No |

Copy link
Member

@JimGalasyn JimGalasyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with one typo fix.

Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this makes sense to me and would be a nice usability boost, but see the first line comment

@@ -258,6 +258,7 @@ commands.
|-------------|----------------------------------------------------------------------|----------|
| name | The case-insensitive name of the UDF(s) represented by this class. | Yes |
| description | A string describing generally what the function(s) in this class do. | Yes |
| category | For grouping similar functions in the output of SHOW FUNCTIONS. | No |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's an approved KLIP (see #5269) that proposes to add group for configuration grouping. Having "category" and "group" both present for different purposes might be confusing. We might want "displayGroup" and "configGroup" maybe? Or do we want to reuse the grouping for both?

Copy link
Contributor Author

@blueedgenick blueedgenick Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, that's interesting - i had forgotten where that particular conversation landed. Seems like composing groups of functions you want to assign certain config values to is more likely to be useful in the arena of user-specific, custom-developed UDFs than it is in the general code-base ? Otherwise you would surely need a way to define the groups themselves in the properties file, not in the source code (and i see that this is mentioned in the klip as a possible future enhancement). Naming aside, I see that as an orthogonal concern to the purpose of this PR. Perhaps if that klip is implemented it would make most sense to explicitly call the new annotation attribute it proposes something like "configGroup" to be abundantly clear about what it implies ?
To be clear, my goal here is to provide a grouping or categorization which basically mirrors the way you always see functions grouped in database documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I totally understand that the intention here is entirely different - but that should be clear from the naming of the attribute. Perhaps we can make it clear but just changing the name of configGroup and leaving this as category


package io.confluent.ksql.function;

public enum FunctionCategory {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be better to have these just be strings so that users can categorize their own into whatever categories they want

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did wonder about that but decided i'd rather be restrictive so that we don't accidentally introduce a plethora of too-similar ones into the codebase over time. Perhaps a good compromise would be to have a "CUSTOM" enum value ? Maybe even set that as the default value so that if someone doesn't set a value for this in their own UDF then it just gets set correctly for them ? WDYT ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that should really be a concern if we group all of the values in a constant file just like we have here as an enum - people follow examples that are already there, so they'll tend to look in that file before introducing new ones. In the worst case, we'll have some unnecessary categories - but it would be compatible to change those and clean them up.

My preference is to avoid enums in public APIs because they're difficult to extend and impossible to remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, i'll change that then.
Just for my own edification, why is it harder to extend or remove enum constants in an API than static strings ? i would naively suppose that removing existing things always has a similar backwards-compatibility problem, however they are defined ? And by "extending" here, do you mean adding new values ?

Copy link
Contributor

@agavra agavra Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it harder to extend or remove enum constants in an API than static strings

A few reasons. One is logistically if external repos depend on a removed enum they can't just "add it in" and keep going (which you can if you depended on a removed string constant). Second is a little bit more philosophical, you shouldn't remove an enum because it changes the ordinals of everything after it - which I've been bit by before (though unlikely it'll fall into play here). I think JSON schema might actually fail compatibility checks if you removed an enum from a schema (not that we use json schema yet...)

And by "extending" here, do you mean adding new values ?

yeah

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, that makes sense. thanks for taking the time to share!

Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm :)

@blueedgenick blueedgenick merged commit 7f5b843 into confluentinc:master Jul 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants