feat: add PARSE_TIME and FORMAT_TIME functions #7722

Merged
jzaralim merged 4 commits into confluentinc:master on Jun 30, 2021

jzaralim
Contributor

Description

Enables TIME data for UDFs and adds the PARSE_TIME and FORMAT_TIME functions, plus docs.

Testing done

QTT + unit tests

Reviewer checklist

  • Ensure docs are updated if necessary (e.g., if a user-visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@jzaralim jzaralim requested a review from spena June 23, 2021 23:25
@jzaralim jzaralim requested review from JimGalasyn and a team as code owners June 23, 2021 23:25
Member

@JimGalasyn JimGalasyn left a comment

Nice! This will need to be cherry-picked to the 0.20.x-ksqldb branch.

docs/developer-guide/ksqldb-reference/scalar-functions.md (outdated)
Comment on lines +41 to +44
private final LoadingCache<String, DateTimeFormatter> formatters =
    CacheBuilder.newBuilder()
        .maximumSize(1000)
        .build(CacheLoader.from(DateTimeFormatter::ofPattern));
Member

I've wondered about this cache for a long time but never asked: why do we need it in the time/date/timestamp functions? If a query calls a time UDF with a specific format, then the query will use only one format pattern for all rows, won't it? And if a query calls the UDF more than once (once per column) with different formats, doesn't each column have its own instance of FormatTime, which would end up with a single format pattern for all rows?

I haven't checked the above reasoning, but is that the right assumption?

Contributor Author

I think it's because this function gets called every time there's a new record, so having a cache prevents it from having to recreate the formatter each time.

Member

If it is instantiated once per record, then it probably makes sense. But that magic number of 1000 seems too big. We should dig more into this after 0.20. See if we can get rid of that cache or make it hold the exact # of formatters of the row.
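
To make the trade-off in this thread concrete, here is a minimal sketch of caching one formatter per pattern string. It is not the PR's code: it swaps the bounded Guava LoadingCache for an unbounded ConcurrentHashMap so that it runs on the JDK alone, and the class and method names are hypothetical.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the PR's code: one cached DateTimeFormatter per pattern.
class FormatterCacheSketch {

  // Unbounded map standing in for the size-limited Guava cache shown in the diff.
  private static final Map<String, DateTimeFormatter> FORMATTERS = new ConcurrentHashMap<>();

  // Compiles the pattern on first use and returns the same instance afterwards.
  static DateTimeFormatter formatterFor(final String pattern) {
    return FORMATTERS.computeIfAbsent(pattern, DateTimeFormatter::ofPattern);
  }

  public static void main(final String[] args) {
    final DateTimeFormatter first = formatterFor("HH:mm:ss");
    final DateTimeFormatter second = formatterFor("HH:mm:ss");
    // Both lookups resolve to the single cached formatter.
    System.out.println(first == second);
    System.out.println(LocalTime.of(5, 45).format(first));
  }
}
```

If each call site only ever sees one pattern, as the reviewer suspects, a map like this would hold a single entry, which is why the limit of 1000 looks generous.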

    return null;
  }
  try {
    final DateTimeFormatter formatter = formatters.get(formatPattern);
Member

If formatPattern contains date characters, such as days or months, would they be added to the resulting string?

Contributor Author

No, an exception gets thrown - I'll add a test for that.
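
As a rough illustration of the underlying java.time behavior (not the UDF itself), the sketch below formats a LocalTime with a pattern that includes a year field. LocalTime carries no date fields, so the formatter fails with a DateTimeException. The class and method names are hypothetical.

```java
import java.time.DateTimeException;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: shows that date letters in a pattern cannot be formatted from a LocalTime.
class FormatTimeDatePatternSketch {

  // Returns true if formatting a LocalTime with the pattern throws,
  // which happens when the pattern asks for a field LocalTime does not have.
  static boolean rejectsPattern(final String pattern) {
    try {
      LocalTime.of(5, 45).format(DateTimeFormatter.ofPattern(pattern));
      return false;
    } catch (final DateTimeException e) {
      return true;
    }
  }

  public static void main(final String[] args) {
    System.out.println(rejectsPattern("HH:mm"));      // time-only pattern
    System.out.println(rejectsPattern("yyyy HH:mm")); // pattern with a year field
  }
}
```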

  }
  try {
    final DateTimeFormatter formatter = formatters.get(formatPattern);
    return LocalTime.ofNanoOfDay(time.getTime() * 1000000).format(formatter);
Member

Suggested change
- return LocalTime.ofNanoOfDay(time.getTime() * 1000000).format(formatter);
+ return LocalTime.ofNanoOfDay(time.getTime() * 1_000_000).format(formatter);

For easier reading. Perhaps declaring a constant for this would be better? Is this a nanoseconds-per-second value?

Contributor Author

Used TimeUnit conversion functions instead.
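
A minimal sketch of what a TimeUnit-based conversion could look like, assuming `time` is a java.sql.Time whose getTime() returns milliseconds (so the factor of 1,000,000 in the diff would be nanoseconds per millisecond). The class and method names are hypothetical, and the merged code may differ.

```java
import java.sql.Time;
import java.time.LocalTime;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: replaces the literal 1000000 with TimeUnit conversions.
class TimeUnitConversionSketch {

  // Milliseconds since midnight -> LocalTime. Assumes the value is within one day,
  // otherwise LocalTime.ofNanoOfDay throws a DateTimeException.
  static LocalTime toLocalTime(final Time time) {
    return LocalTime.ofNanoOfDay(TimeUnit.MILLISECONDS.toNanos(time.getTime()));
  }

  // LocalTime -> milliseconds since midnight, truncating sub-millisecond precision.
  static Time toSqlTime(final LocalTime localTime) {
    return new Time(TimeUnit.NANOSECONDS.toMillis(localTime.toNanoOfDay()));
  }

  public static void main(final String[] args) {
    final Time time = new Time(20_700_000L); // 5 h 45 min expressed in milliseconds
    final LocalTime local = toLocalTime(time);
    System.out.println(local);
    System.out.println(toSqlTime(local).getTime());
  }
}
```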

Comment on lines 55 to 56
final DateTimeFormatter formatter = formatters.get(formatPattern);
return new Time(LocalTime.parse(formattedTime, formatter).toNanoOfDay() / 1000000);
Member

Same two questions from FormatTime.

  • Do we want to allow date characters in the format? I don't think we should.
  • Can we use a constant for the nanoseconds-per-second value?

Contributor Author

Currently, if we parse something like parse_time('2021 05:45', 'yyyy HH:mm'), it parses everything but returns only the time component (so in this case, it returns 05:45). It's weird that LocalTime.parse doesn't throw anything.

Contributor Author

Added a check to reject formats with non-time elements
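
One way such a check could be written is sketched below. This is an assumption about the approach, not the code that was merged: it parses the text into a generic TemporalAccessor, rejects the input if any date-based ChronoField was picked up, and only then extracts the LocalTime. The class name, method name, and exception type are hypothetical.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

// Hypothetical sketch of a "time fields only" check; not the PR's actual implementation.
class ParseTimeCheckSketch {

  static LocalTime parseTimeOnly(final String text, final String pattern) {
    final TemporalAccessor parsed = DateTimeFormatter.ofPattern(pattern).parse(text);
    // Reject the input if parsing produced any date-based field (year, month, day, ...).
    for (final ChronoField field : ChronoField.values()) {
      if (field.isDateBased() && parsed.isSupported(field)) {
        throw new IllegalArgumentException("Format contains a non-time field: " + field);
      }
    }
    return LocalTime.from(parsed);
  }

  public static void main(final String[] args) {
    System.out.println(parseTimeOnly("05:45", "HH:mm"));
    try {
      parseTimeOnly("2021 05:45", "yyyy HH:mm");
    } catch (final IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

This sketch only looks at date-based fields; a pattern with zone or offset letters would still pass and would need its own handling.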

Member

@spena spena left a comment

LGTM

@jzaralim jzaralim merged commit 9a381a8 into confluentinc:master Jun 30, 2021
@jzaralim jzaralim deleted the format-parse-time branch June 30, 2021 21:05