Function strftime prints imprecise fractional seconds #1152

Closed
derekmahar opened this issue Dec 16, 2022 · 19 comments

@derekmahar
Contributor

derekmahar commented Dec 16, 2022

In Miller 6.5.0, function strftime prints imprecise fractional seconds:

$ # Milliseconds precision 
$ echo unix_timestamp=1454176342.303 | mlr put '$date_and_time = strftime($unix_timestamp, "%FT%H:%M:%3S");'
unix_timestamp=1454176342.303,date_and_time=2016-01-30T17:52:22.302

Expected result: date_and_time=2016-01-30T17:52:22.303

$ # Microseconds precision
$ echo unix_timestamp=1454176342.303 | mlr put '$date_and_time = strftime($unix_timestamp, "%FT%H:%M:%6S");'
unix_timestamp=1454176342.303,date_and_time=2016-01-30T17:52:22.302999

Expected result: date_and_time=2016-01-30T17:52:22.303000

$ # Nanoseconds precision 
$ echo unix_timestamp=1454176342.303 | mlr put '$date_and_time = strftime($unix_timestamp, "%FT%H:%M:%9S");'
unix_timestamp=1454176342.303,date_and_time=2016-01-30T17:52:22.302999973

Expected result: date_and_time=2016-01-30T17:52:22.303000000

$ mlr --version
mlr 6.5.0

I might have reported this behaviour in a comment on an earlier issue or discussion related to strftime or strptime, but I can't find the comment. Anyway, I think this behaviour deserves its own separate issue.
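The printed digits are consistent with how 1454176342.303 is stored as an IEEE 754 double: the nearest representable value lies slightly below .303, so cutting it off at 3, 6, or 9 digits gives 302, 302999, and 302999973. Below is a minimal Python sketch of that effect. It is not Miller's code, and it assumes the fractional part is truncated rather than rounded, which is what the output above suggests. The helper name `truncated_fraction` is made up for illustration.

```python
from decimal import Decimal


def truncated_fraction(ts: float, digits: int) -> str:
    """Return the first `digits` fractional digits of the double `ts`, truncated.

    Illustrative helper (hypothetical name); intended for non-negative values.
    """
    exact = Decimal(ts)        # exact decimal expansion of the binary double
    frac = exact - int(exact)  # fractional part, still exact
    return str(int(frac * 10**digits)).zfill(digits)


for digits in (3, 6, 9):
    print(digits, truncated_fraction(1454176342.303, digits))
```

Rounding instead of truncating would hide the problem for this particular input, but the stored value would still not be exactly .303.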

@derekmahar
Contributor Author

Function sec2gmt suffers from the same precision problem:

$ echo unix_timestamp=1454176342.303 | mlr put '$date_and_time = sec2gmt($unix_timestamp,3);'
unix_timestamp=1454176342.303,date_and_time=2016-01-30T17:52:22.302Z
$ echo unix_timestamp=1454176342.303 | mlr put '$date_and_time = sec2gmt($unix_timestamp,6);'
unix_timestamp=1454176342.303,date_and_time=2016-01-30T17:52:22.302999Z
$ echo unix_timestamp=1454176342.303 | mlr put '$date_and_time = sec2gmt($unix_timestamp,9);'
unix_timestamp=1454176342.303,date_and_time=2016-01-30T17:52:22.302999973Z

@derekmahar
Contributor Author

derekmahar commented Dec 16, 2022

Here is a verbose workaround:

$ echo unix_timestamp=1454176342.303 | mlr put '
$unix_timestamp =~ "([0-9]+)([.][0-9]+)?";
$date_and_time = strftime(int("\1"), "%FT%T") . "\2" . "Z";
'
unix_timestamp=1454176342.303,date_and_time=2016-01-30T17:52:22.303
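The idea behind this workaround is to never pass the fractional digits through floating point: format the integer seconds, then append the fraction exactly as it appeared in the input text. Here is the same idea as a small Python sketch. The function name `format_timestamp_text` is hypothetical, and the sketch only handles non-negative decimal timestamps.

```python
import re
from datetime import datetime, timezone


def format_timestamp_text(text: str) -> str:
    """Format decimal epoch seconds given as text, keeping the fraction verbatim."""
    match = re.fullmatch(r"([0-9]+)([.][0-9]+)?", text)
    if match is None:
        raise ValueError(f"not a non-negative decimal timestamp: {text!r}")
    seconds = int(match.group(1))
    fraction = match.group(2) or ""  # includes the leading dot when present
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.strftime("%Y-%m-%dT%H:%M:%S") + fraction


print(format_timestamp_text("1454176342.303"))
```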

@mogando668

Even if we're upshifting all microsecond (10^-6, or 0.000001) epoch timestamps to signed integers (by way of double-precision floating point), it can easily cover all dates in the years 1700-2250, with room to spare at both ends:

gdate --date='@-9007199254.740991'
Thu Jul 27 19:16:23 LMT 1684

gdate --date='@9007199254.740991' 
Tue Jun  5 19:47:34 EDT 2255

even if one wants to play it safe and only use 52 bits instead of 53, that's still a very wide range of timestamps that could be fully represented and calculated internally as integers, only dividing them by 10^3 (for millisec) or 10^6 (for microsec) just prior to output.

gdate --date='@-4503599627.370495'
Sun Apr 15 19:10:10 LMT 1827

gdate --date='@4503599627.370495' 
Sat Sep 17 19:53:47 EDT 2112

for all practical purposes, a date-time library covering plus or minus 100 years more than suffices, especially when it's allowing for microsecond timestamps (and those who seek the full range could easily call date or gnu-date for nanosecs).

migrating them to integers internally basically requires zero changes to the rest of your logic flow: upshift them once at read-in, and downshift them once at write-out.

** few systems offer true and precise nanoseconds - microsec more than suffices for most usage scenarios (unless this is used for something very specialized like HF-trading platforms or tracing subatomic particle motion at LHC or something)
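The upshift/downshift proposal can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a proposal for Miller's internals, and the names `to_micros`, `from_micros`, and `year_of` are invented for the example. It parses the decimal text directly (via `Decimal`) so the value never becomes a binary float, and it checks the years reachable with 53-bit and 52-bit magnitudes of microseconds.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(text: str) -> int:
    """Upshift once at read-in: decimal seconds text -> integer microseconds.

    Digits beyond the sixth decimal place are truncated toward zero.
    """
    return int(Decimal(text) * 1_000_000)


def from_micros(micros: int) -> str:
    """Downshift once at write-out: integer microseconds -> decimal seconds text."""
    sign = "-" if micros < 0 else ""
    seconds, frac = divmod(abs(micros), 1_000_000)
    return f"{sign}{seconds}.{frac:06d}"


def year_of(micros: int) -> int:
    """UTC calendar year of an integer-microsecond epoch timestamp."""
    return (EPOCH + timedelta(microseconds=micros)).year


micros = to_micros("1454176342.303")
print(micros, from_micros(micros))

for bits in (53, 52):
    limit = 2**bits - 1
    print(bits, from_micros(limit), year_of(-limit), year_of(limit))
```

The limits printed for 53 and 52 bits are the same values passed to `gdate` above, so the years should line up with that output.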

@derekmahar
Contributor Author

derekmahar commented Dec 18, 2022

Are you recommending that Miller migrate floating point seconds with nanosecond precision to integer nanoseconds?

@johnkerl johnkerl changed the title from "Function strftime prints imprecise fractional seconds." to "Function strftime prints imprecise fractional seconds" Mar 1, 2023
@johnkerl
Owner

johnkerl commented Mar 6, 2023

@mogando668 @derekmahar the strptime and strftime functions are, unfortunately, tied intimately to floating-point representations.

The best way I see to implement the excellent advice on this issue is to make new strpntime and strfntime where the n indicates that the numeric representations are signed 64-bit integers containing nanoseconds since the epoch. And/or, strputime and strfutime where the u indicates microseconds.

@derekmahar
Contributor Author

@mogando668 @derekmahar the strptime and strftime functions are, unfortunately, tied intimately to floating-point representations.

Do you mean that the Go implementations of strptime and strftime represent instants using a floating-point number? Wouldn't this mean that all Go programs that use these functions would also suffer from imprecise fractional seconds?

The best way I see to implement the excellent advice on this issue is to make new strpntime and strfntime where the n indicates that the numeric representations are signed 64-bit integers containing nanoseconds since the epoch. And/or, strputime and strfutime where the u indicates microseconds.

Would there be any disadvantage to implementing strpntime and strfntime only instead of strputime and strfutime? If not, then were you to implement strpntime and strfntime, you could then easily implement strputime by dividing the nanosecond result of strpntime by 1000, and implement strfutime by invoking strfntime with the microsecond argument to strfutime multiplied by 1000.
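As a rough sketch of that layering in Python: given any nanosecond parser and formatter, the microsecond variants are thin wrappers. Everything here is hypothetical. `make_micro_variants`, `parse_ns`, and `format_ns` are stand-ins for illustration and do not correspond to Miller's actual implementation.

```python
from typing import Callable, Tuple

ParseFn = Callable[[str, str], int]   # (text, format) -> epoch integer
FormatFn = Callable[[int, str], str]  # (epoch integer, format) -> text


def make_micro_variants(parse_ns: ParseFn, format_ns: FormatFn) -> Tuple[ParseFn, FormatFn]:
    """Build microsecond parse/format functions on top of nanosecond ones."""

    def parse_us(text: str, fmt: str) -> int:
        # Nanoseconds -> microseconds: divide by 1000 (floor division,
        # so sub-microsecond digits are dropped).
        return parse_ns(text, fmt) // 1000

    def format_us(micros: int, fmt: str) -> str:
        # Microseconds -> nanoseconds: multiply by 1000, then reuse the formatter.
        return format_ns(micros * 1000, fmt)

    return parse_us, format_us


# Trivial stand-ins so the wrappers can be exercised without a real parser.
parse_us, format_us = make_micro_variants(
    lambda text, fmt: int(text),
    lambda nanos, fmt: str(nanos),
)
print(parse_us("1454176342303000000", ""), format_us(1454176342303000, ""))
```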

@johnkerl
Owner

johnkerl commented Mar 6, 2023

Do you mean that the Go implementations of strptime and strftime represent instants using a floating-point number? Wouldn't this mean that all Go programs that use these functions would also suffer from imprecise fractional seconds?

No. Go internally uses a format which is quite fine with regard to precision.

The issue is that the Miller functions use floating-point: strftime takes a floating-point number as argument, and strptime returns one. And the Miller DSL is untyped. I can't change them without breaking the API; hence the idea of new functions.

Would there be any disadvantage to implementing strpntime and strfntime only instead of strputime and strfutime?

Agreed, that would be quite nice :)

Signed 64-bit ints with nanoseconds still give us centuries after (or before) the epoch, which would suffice. :)
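For a sense of scale, 2^63 nanoseconds is roughly 292 years, so a signed 64-bit nanosecond count reaches from 1677 to 2262. Here is a short Python check of those bounds. The helper name `ns_to_datetime` is illustrative, and because Python's `datetime` only resolves microseconds, the sketch drops the last three digits.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def ns_to_datetime(nanos: int) -> datetime:
    """Convert epoch nanoseconds to a UTC datetime, flooring to microseconds."""
    return EPOCH + timedelta(microseconds=nanos // 1000)


print(ns_to_datetime(INT64_MIN).date(), ns_to_datetime(INT64_MAX).date())
```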

@johnkerl
Owner

@derekmahar @mogando668 can you take a look at head now that #1326 is merged?

@derekmahar
Contributor Author

derekmahar commented Jun 26, 2023

@derekmahar @mogando668 can you take a look at head now that #1326 is merged?

After building Miller at commit d72ef826fb5ecc41a4cde0d0fcb2402082b83ca1:

$ # Milliseconds precision
$ echo unix_timestamp=1454176342303000000 | ./mlr put '$date_and_time = strfntime($unix_timestamp, "%FT%H:%M:%3S");'
unix_timestamp=1454176342303000000,date_and_time=2016-01-30T17:52:22.303
$ # Microseconds precision
$ echo unix_timestamp=1454176342303000000 | ./mlr put '$date_and_time = strfntime($unix_timestamp, "%FT%H:%M:%6S");'
unix_timestamp=1454176342303000000,date_and_time=2016-01-30T17:52:22.303000
$ # Nanoseconds precision
$ echo unix_timestamp=1454176342303000000 | ./mlr put '$date_and_time = strfntime($unix_timestamp, "%FT%H:%M:%9S");'
unix_timestamp=1454176342303000000,date_and_time=2016-01-30T17:52:22.303000000

For all three cases, the actual result matches the expected result.
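The reason integer nanoseconds come out exact is that seconds and fraction are separated with integer division, so no binary rounding is involved. Here is a minimal Python sketch of that arithmetic. It does not reproduce Miller's `strfntime`. The function name `format_ns` and its `digits` parameter are assumptions made for the example, with `digits` playing the role of the 3, 6, or 9 in `%3S`, `%6S`, and `%9S`.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_ns(nanos: int, digits: int) -> str:
    """Format epoch nanoseconds as UTC with 0 to 9 fractional digits."""
    if not 0 <= digits <= 9:
        raise ValueError("digits must be between 0 and 9")
    seconds, frac_ns = divmod(nanos, 1_000_000_000)  # exact integer split
    stamp = (EPOCH + timedelta(seconds=seconds)).strftime("%Y-%m-%dT%H:%M:%S")
    if digits == 0:
        return stamp
    # Zero-pad to nine digits, then keep only the leading `digits` of them.
    return f"{stamp}.{frac_ns:09d}"[: len(stamp) + 1 + digits]


for digits in (3, 6, 9):
    print(format_ns(1454176342303000000, digits))
```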

@derekmahar
Contributor Author

Repeating the same tests for nsec2gmt at commit d72ef826fb5ecc41a4cde0d0fcb2402082b83ca1:

$ echo unix_timestamp=1454176342303000000 | ./mlr put '$date_and_time = nsec2gmt($unix_timestamp,3);'
unix_timestamp=1454176342303000000,date_and_time=2016-01-30T17:52:22.303Z
$ echo unix_timestamp=1454176342303000000 | ./mlr put '$date_and_time = nsec2gmt($unix_timestamp,6);'
unix_timestamp=1454176342303000000,date_and_time=2016-01-30T17:52:22.303000Z
$ echo unix_timestamp=1454176342303000000 | ./mlr put '$date_and_time = nsec2gmt($unix_timestamp,9);'
unix_timestamp=1454176342303000000,date_and_time=2016-01-30T17:52:22.303000000Z

@zmajeed

zmajeed commented Jun 29, 2023

Thanks for adding nanosecond resolution! This preserves nanoseconds in the timestamp:

$ mlr put 'begin { print strpntime("2023-06-29T22:03:30.937659412Z", "%FT%H:%M:%SZ") }' </dev/null
1688076210937659412

It would be nice to have %s and %N format specifiers for strfntime like GNU date to easily split seconds and nanoseconds

$ date +"{seconds: %s, nanos: %N}" -d @1688076210.937659412
{seconds: 1688076210, nanos: 937659412}

Then I could say

mlr put 'begin {
  epoch_ns = strpntime("2023-06-29T22:03:30.937659412Z", "%FT%H:%M:%SZ")
  print strfntime(epoch_ns, "{seconds: %s, nanos: %N}")
}' </dev/null

@johnkerl
Owner

johnkerl commented Jul 2, 2023

OK @zmajeed this is awesome feedback!

I think we can make this happen

@johnkerl
Owner

johnkerl commented Jul 2, 2023

@zmajeed

mlr put 'begin {
  epoch_ns = strpntime("2023-06-29T22:03:30.937659412Z", "%FT%H:%M:%SZ");
  print strfntime(epoch_ns, "{seconds: %S, nanos: %N}");
}' </dev/null
{seconds: 30, nanos: 937659412}

as of #1334

@johnkerl
Owner

johnkerl commented Jul 2, 2023

I think this is everything -- @derekmahar @zmajeed @mogando668 please let me know if I missed anything and we can reopen -- thank you!

@johnkerl johnkerl closed this as completed Jul 2, 2023
@johnkerl johnkerl removed the active label Jul 2, 2023
@zmajeed

zmajeed commented Jul 2, 2023

Amazing! Didn't expect you to jump on this - I should have given more details

The %s GNU extension (lowercase "s") is epoch seconds - not clock seconds

The %N specifier takes an optional resolution - so %3N is milliseconds fraction of epoch, %6N is microseconds fraction etc.
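To make the requested semantics concrete, here is a small Python sketch that expands only `%s` (epoch seconds) and `%N` with an optional single-digit width. It is an illustration of the behaviour described above, not GNU date's or Miller's implementation. The name `expand_epoch_specifiers` is hypothetical, and the sketch assumes that a width keeps the leading digits of the nanosecond field, ignores `%%` escaping, and expects a non-negative timestamp.

```python
import re


def expand_epoch_specifiers(fmt: str, nanos: int) -> str:
    """Expand %s and %N / %<width>N for a non-negative epoch-nanoseconds integer."""
    seconds, frac_ns = divmod(nanos, 1_000_000_000)
    digits = f"{frac_ns:09d}"  # always nine digits, zero-padded

    def expand_nanos(match: re.Match) -> str:
        width = match.group(1)
        return digits[: int(width)] if width else digits

    expanded = fmt.replace("%s", str(seconds))  # epoch seconds, not clock seconds
    return re.sub(r"%([1-9]?)N", expand_nanos, expanded)


epoch_ns = 1688076210937659412
print(expand_epoch_specifiers("{seconds: %s, nanos: %N}", epoch_ns))
print(expand_epoch_specifiers("%s.%3N", epoch_ns))
```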

@johnkerl johnkerl reopened this Jul 3, 2023
@johnkerl
Owner

johnkerl commented Jul 3, 2023

Thanks @zmajeed !

@zmajeed

zmajeed commented Jul 3, 2023

Some references

GNU date time format specifiers - https://www.gnu.org/software/coreutils/manual/html_node/Time-conversion-specifiers.html
Emacs time format specifiers - https://www.gnu.org/software/emacs/manual/html_node/elisp/Time-Parsing.html - %N was contributed by Paul Eggert who also authored it for GNU date - discussion thread - https://lists.gnu.org/archive/html/emacs-devel/2011-07/msg00007.html
gnulib date parser - https://github.com/coreutils/gnulib/blob/master/lib/parse-datetime.y
gnulib %N handler - https://github.com/coreutils/gnulib/blob/master/lib/nstrftime.c#L1115

@johnkerl
Owner

johnkerl commented Aug 20, 2023

@johnkerl
Owner

Thanks @derekmahar !
