-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
time: enforce strict parsing for RFC3339 for Time.Unmarshal #54580
Comments
I like the goal of having a strict parser for this format, but it feels surprising to me for I would find it more intuitive to have a separate function specifically for parsing RFC 3339 per its specification, without any implication that it's "just another format string". Perhaps that could be accompanied by a deprecation of the constant This would also avoid changing behavior for any existing users of the current API, allowing existing callers to adopt the new function when they are ready. |
The
I'm sympathetic to not changing the As for whether we are allowed to make this change, the Go 1 compatibility document says:
Since functionality that claims to be RFC 3339 do not operate according to RFC 3339, this is an issue of "legality" and the document gives the package the "right to address such issues". Also, this could be an interesting use of the GODEBUG flags discussed in #55090. You could imagine having a |
I don't typically like to get into the business of rules-lawyering based on exactly what the docstrings say, but this seems like a fiddly enough case that using the existing documentation as guidance might be relevant. The time.Parse function is documented as parsing a timestamp in a general way based on a format string with a particular syntax. The specific thing that would "feel surprising" to me is for that to retroactively gain an exception like "...unless the format string is exactly what's in On the other hand, the The Based purely on the contract described in the docs then, it seems reasonable to me that there be a new Any existing callers using I would agree that the existence of the While I do appreciate that it would be possible to use
|
It seems we're in agreement that the
That's true of all the GODEBUG flags. It is a global change of behavior that may be inconsistent in what a particular library wants. The use of GODEBUG will not fix all cases of changed behavior in the standard library but it provides users a knob that is better than nothing. |
Indeed, I think it's defensible that the current implementation of It is indeed true that all GODEBUG flags are global in nature, but that's a more appropriate solution for some situations than others. For example, using GODEBUG to choose between resolver implementations seems reasonable because looking up hostnames is typically (though certainly not always) a global concern that the entire program should agree on. I hesitate to consider it an appropriate solution for all possible changes to the standard library, since there are various situations where the decision is inherently more localized. My sense is that GODEBUG has traditionally been a last resort for situations where the new behavior is "obviously better" but where some small number of programs may need to retain the old behavior. I guess I'm not convinced that a special case in |
Change https://go.dev/cl/444277 mentions this issue: |
I sent out https://go.dev/cl/444277 which only special-cases strict RFC 3339 for the There is also existing precedence for strict checking since the In my opinion, modifying just |
We add strict checking to marshal and unmarshal methods, rather than Parse to maintain compatibility in Parse behavior. Also, the Time.Format method has no ability to report errors. The Time.Marshal{Text,JSON} and Time.Unmarshal{Time,JSON} methods are already documented as complying with RFC 3339, but have edge cases on both marshal and unmarshal where it is incorrect. The Marshal methods already have at least one check to comply with RFC 3339, so it seems sensible to expand this to cover all known violations of the specification. This commit fixes all known edge cases for full compliance. Two optimizations are folded into this change: 1. parseRFC3339 is made generic so that it can operate directly on a []byte as well as string. This avoids allocating or redundant copying when converting from string to []byte. 2. When marshaling, we verify for correctness based on the serialized output, rather than calling attribute methods on the Time type. For example, it is faster to check that the 5th byte is '-' rather than check that Time.Year is within [0,9999], since Year is a relatively expensive operation. Performance: name old time/op new time/op delta MarshalJSON 109ns ± 2% 99ns ± 1% -9.43% (p=0.000 n=10+10) UnmarshalText 158ns ± 4% 143ns ± 1% -9.17% (p=0.000 n=9+9) Updates #54580 Updates #54568 Updates #54571 Change-Id: I1222e45a7625d1ffd0629be5738670a84188d301 Reviewed-on: https://go-review.googlesource.com/c/go/+/444277 Reviewed-by: David Chase <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> Run-TryBot: Joseph Tsai <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
I'm withdrawing this proposal. I think fixing the marshal/unmarshal methods is sufficient. |
We add strict checking to marshal and unmarshal methods, rather than Parse to maintain compatibility in Parse behavior. Also, the Time.Format method has no ability to report errors. The Time.Marshal{Text,JSON} and Time.Unmarshal{Time,JSON} methods are already documented as complying with RFC 3339, but have edge cases on both marshal and unmarshal where it is incorrect. The Marshal methods already have at least one check to comply with RFC 3339, so it seems sensible to expand this to cover all known violations of the specification. This commit fixes all known edge cases for full compliance. Two optimizations are folded into this change: 1. parseRFC3339 is made generic so that it can operate directly on a []byte as well as string. This avoids allocating or redundant copying when converting from string to []byte. 2. When marshaling, we verify for correctness based on the serialized output, rather than calling attribute methods on the Time type. For example, it is faster to check that the 5th byte is '-' rather than check that Time.Year is within [0,9999], since Year is a relatively expensive operation. Performance: name old time/op new time/op delta MarshalJSON 109ns ± 2% 99ns ± 1% -9.43% (p=0.000 n=10+10) UnmarshalText 158ns ± 4% 143ns ± 1% -9.17% (p=0.000 n=9+9) Updates golang#54580 Updates golang#54568 Updates golang#54571 Change-Id: I1222e45a7625d1ffd0629be5738670a84188d301 Reviewed-on: https://go-review.googlesource.com/c/go/+/444277 Reviewed-by: David Chase <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> Run-TryBot: Joseph Tsai <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Just got here via #57897 I'm not sure why wire-serialization changes wouldn't need compatibility consideration. We want people to be able to update go versions with very high confidence, and "ensure nothing calling your API written in go is hitting these RFC3339 edge cases" is a pretty big obstacle to doing that. |
To be clear: @dsnet didn't say that it doesn't need compatibility consideration. He said that it doesn't require a proposal. As we do with other bug-fixes that impact compatibility, I think it would be a good idea to add a |
Oh, I didn't realize that was used in a formal sense. I was just confused why a change to time.Time#MarshalJSON/UnmarshalJSON would require a different level of consideration than a change to time.Parse.
that could be good, especially if paired with:
|
Given that this is a "visible behavior change in existing functionality" (quoting https://go.dev/s/proposal), I think this change should have gone through the proposal process to allow a discussion about the pros and cons of making a breaking change like this among anyone interested. That's the entire role of the proposal process. While we could patch it with a GODEBUG at this late stage, we could also revert for Go 1.20 and do the real proposal process for Go 1.21. Thoughts? |
If there's not a compelling reason this must be in 1.20, reverting for 1.20 and working through a proposal for how to improve RFC compliance in 1.21 (ideally for Time#MarshalJSON/UnmarshalJSON and time.Parse) makes sense to me. |
Strictly speaking, under that definition, every bug fix is a an observable change and would therefore need to go through the proposal process. This issue originally proposed changing the behavior of The purpose of specifications like RFC 3339 is so that different implementations (whether in Go, Rust, Python, or otherwise) can agree on what is a valid timestamp. Other protocols are built upon this and rely on the syntactic agreement of what is a valid timestamp. When one implementation accepts invalid input, it sets a bad precedence for other implementations to also accept invalid input, making specifications functionally unreliable (as is the case with HTTP). There ends up being the "what's specified" category and the "what's specified, but what's most empirically respected, but not really documented anywhere" category. I stand by my original position that this was an okay change without going through the proposal process, but also with the pragmatic understanding that this change could break users. Barring any data of what would break, it still seems like the right move to have submitted the change and for the Go beta and RC to produce any reports of problems. Once those reports come in, we can evaluate again as we are doing right now. I feel strongly we should eventually fix this, but I'm okay with it being in Go 1.20 or in a later released. I agree that we should have a GODEBUG for this (the mechanism was not quite standardized at the time). To be clear about the behavior change:
|
Re-opening and marking as a "release-blocker". |
The RFC3339 constant is documented to be |
I would prefer either... A) Add this as a non-default option to While I can accept it, if a majority decides so, having to set I see there is a value in strict checking, but in my case that is by far outweighed by the potential problems this can bring. See #57897 |
That's true; we have to predict what will and won't be observed and significant. We do spend a lot of time on wire formats in proposal review, and not just because they often involve new methods. In retrospect, stricter RFC3339 parsing is a similar concern and would have been good to treat as a proposal, much as stricter IP address parsing was (#30999). What I take from this is that parsing and printing external wire formats is important for us to remember to flag in the future. |
@klauspost, over on the other issue you wrote:
That's true but we do have three release candidates that have all had this change and only got our first report today. So that's some evidence it's closer to the 0 end of the scale. It would help to understand why this happens, which would let us estimate how likely it is. Where did the bad RFC3339 time stamps you found come from? Not Go, it seems. Same question to @liggitt. |
We do? The fact that Presently, we have evidence that it breaks things now, but I had no evidence of that when this change was made.
I'm opposed to any options in downstream packages as this is not sustainable. If
RFC 3339 is specified. A universal specification takes precedence over a particular implementation, especially if that implementation is non-compliant. |
The documentation of time does not reference the RFC3339 specification. It simply says that the RFC3339 constant is |
@rsc It is in a unit test written 5 years ago. I don't know where the author picked it up. I see the same date in AWS S3 documentation, but for XML data. Edit: Here it is in an (unrelated) aws XML response: https://github.com/aws/aws-sdk-go/blob/main/service/s3/statusok_error_test.go#L30 - not sure if that is the source. My worry is that there are clients out there that will pass these dates. There are hundreds of applications that connect through the S3 API. Similar, our S3 server provides serverside JSON queries, which can break. Our server is installed on-prem, so I cannot run any checks to see how many invalid timestamps exist. From our users perspective it will be a bug and we are pretty much forced to keep supporting it, even if it is technically invalid. I would love some hard numbers, but we don't control or have access to the data of our users. |
@aarzilli, the documentation of
@klauspost, you seem to be describing some magic timestamp that's being passed around in unit tests, what is the exact value? |
github code search shows many more references : 2009-11-23T0:00:00Z |
Discussed at proposal review. Given that we do not have #56986 landed in Go 1.20, creating a GODEBUG will not be sufficient to insulate users from getting the problem automatically when using a new toolchain (without go.mod edits). This is an unintended breakage that affects at least one known, widely used package (aws), so we should roll it back for the release. That will probably mean roll back on tip, cherry-pick to 1.20 branch, roll forward on tip with GODEBUG and the right defaults for older releases (which will require waiting for CL 453605). |
Change https://go.dev/cl/462286 mentions this issue: |
CL 444277 fixed Time.UnmarshalText and Time.UnmarshalJSON to properly unmarshal timestamps according to RFC 3339 instead of according to Go's bespoke time syntax that is a superset of RFC 3339. However, this change seems to have broken an AWS S3 unit test that relies on parsing timestamps with single digit hours. It is unclear whether S3 emits these timestamps in production or whether this is simply a testing artifact that has been cargo culted across many code bases. Either way, disable strict parsing for now and re-enable later with better GODEBUG support. Updates #54580 Change-Id: Icced2c7f9a6b2fc06bbd9c7e90f90edce24c2306 Reviewed-on: https://go-review.googlesource.com/c/go/+/462286 Reviewed-by: Bryan Mills <[email protected]> Run-TryBot: Joseph Tsai <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Auto-Submit: Russ Cox <[email protected]> Reviewed-by: Russ Cox <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]>
To respond to some of the discussion above about the propriety of making the change at all, we simply don't have accurate data one way or the other. What we have is anecdotes: inside Google, nothing was affected; in open source, we know at least some tests in the AWS API implementations fail. We don't know what will be affected in practice, we don't know which systems generate these non-conforming time stamps, and so on. There are good security and parser alignment reasons to reject non-standard RFC3339 times, and there are good compatibility reasons not to. Part of the motivation for #56986 (now accepted) is to give us a more graceful path forward in borderline cases like this. I am grepping the latest versions of all modules on the proxy for almost-RFC3339 times and will report back. |
Filed #57912 for proposing a roll-forward. Once the rollback is committed on the release branch I will close this issue. |
Change https://go.dev/cl/462675 mentions this issue: |
CL 444277 fixed Time.UnmarshalText and Time.UnmarshalJSON to properly unmarshal timestamps according to RFC 3339 instead of according to Go's bespoke time syntax that is a superset of RFC 3339. However, this change seems to have broken an AWS S3 unit test that relies on parsing timestamps with single digit hours. It is unclear whether S3 emits these timestamps in production or whether this is simply a testing artifact that has been cargo culted across many code bases. Either way, disable strict parsing for now and re-enable later with better GODEBUG support. Updates #54580 Change-Id: Icced2c7f9a6b2fc06bbd9c7e90f90edce24c2306 Reviewed-on: https://go-review.googlesource.com/c/go/+/462286 Reviewed-by: Bryan Mills <[email protected]> Run-TryBot: Joseph Tsai <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Auto-Submit: Russ Cox <[email protected]> Reviewed-by: Russ Cox <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> Reviewed-on: https://go-review.googlesource.com/c/go/+/462675 Reviewed-by: Joseph Tsai <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]> Run-TryBot: Russ Cox <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]>
Rolled back on tip and release branch. Roll forward is #57912. |
Hi, I am trying to figure out why the change in |
@mangalaman93, the revert was merged for To unblock your upgrade, you should be able to bypass the error using a custom type or custom |
@mangalaman93 If you're using |
We are using format |
When calling:
one would expect that this strictly parses
s
according to RFC 3339 as the constant suggests.This parses all valid RFC 3339 timestamps (except those with leap-seconds),
however it fails to reject certain inputs that are not valid RFC 3339.
Parse
differs from the RFC 3339 syntax in the following ways:Parse
permits the hour fields to only have one digit, while RFC 3339 requires that it always be two digits.0000-01-01T0:00:00Z
is considered valid, when it should not be.requesting that "15" in the format be always parsed as a two-digit field.
However, it is unlikely that this behavior will be changed due to it being a breaking change.
Parse
permits a comma separator between seconds and sub-seconds, while RFC 3339 only permits a period.0000-01-01T00:00:00,000Z
is considered valid, when it should not be.which modified the behavior of "." in the format to parse either "." or ",".
While this expands the set of ISO 8601 timestamps we can handle,
it further breaks compliance with RFC 3339.
Parse
permits time zone offsets with any range of values, while RFC 3339 limits hours to be less than 24 and minutes to be less than 60.0000-01-01T00:00:00+24:60
is considered valid, when it should not be.In all situations, the time format template provides no way to strictly parse RFC 3339.
Since we can't change the text template meaning of "15" and ".",
nor can we change the constant value of
RFC3339
orRFC3339Nano
,I propose that we special-case
Parse
to use strict parsing when handling a format templateidentical to
RFC3339
andRFC3339Nano
.Justification for this special-casing:
Being loose in what Go considers "valid" RFC 3339 is going against the goal of RFC 3339.
\cc @robpike
The text was updated successfully, but these errors were encountered: