Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix index read error #583

Merged
merged 7 commits into from
Sep 29, 2023
Merged

Fix index read error #583

merged 7 commits into from
Sep 29, 2023

Conversation

mpoffald
Copy link
Contributor

Fixes https://github.com/fluree/core/issues/31
Fixes #580

When creating flakes with time-typed values (xsd:dateTime, xsd:date, xsd:time), we coerce the values into java.time/js/Date objects. This index-reading bug was caused by not serializing those values correctly, resulting in unreadable index files. This PR introduces special serialization/deserialization logic for these types .

We were writing out flakes with unreadable tagged literals, eg:

[211106232533902,1038,#object[java.time.OffsetDateTime 0x3d76199b "2023-04-01T00:00Z"],4,-2,true,null]

Which was causing parsing errors:

Exception in thread "async-thread-macro-1" com.fasterxml.jackson.core.JsonParseException: Unexpected character ('#' (code 35)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false')

Custom formatters are used during serialization to ensure that our existing datatype/coerce code can be reused to deserialize these values upon load. This is because the default str/ .toString() behavior will trim off trailing zeros, which produces values that are not valid for the given datatype. This is a simple initial implementation, we can potentially optimize in the future with more clever formatters and/or a separate coercion path for deserialization.

For #580, I switched coercion for xsd:time and xsd:dateTime in clj to use the java parsing functions directly. Our hard-coded multiplication of nanoseconds was causing errors/incorrect values, whereas the built-in parsing fns will do the correct thing with those values. I'm not sure if there was a reason to not use these parsing fns in the first place, if there's a significant downside I wasn't aware of then we should discuss other alternatives.

Upon transaction, flakes with time types (`xsd:time`, `xsd:date`, `xsd:dateTime`) have their values coerced to `java.time`/`js/Date` objects. When writing index files to disk, we need to write out readable strings for these objects, and then re-coerce them into time objects upon load. Previously we were writing files with unreadable tagged literals in them.

Note: This writes out time strings with maximal precision, allowing us to reuse our existing datatype coercion code when loading (time strings with trailing zeros trimmed do not always satisfy our regexes). This makes for a simpler solution, but it does mean potential wasted space in our index files. We can likely optimize this in the future by having more clever formatters and/or having deserialization-specific coercion logic that recognizes truncate
d values.
Fixes #580

The built-in java parsers will do the correct thing for subsecond values. We don't need to be doing any manual string-inspecting apart from determining whether an offset is present.
Previously, this value was being written as an `xsd:dateTime`, but that is not a valid `xsd:dateTime` value: https://books.xmlschemata.org/relaxng/ch19-77049.html

Apart from just being incorrect, this was causing this one flake to be an anomaly among the `xsd:DateTime` flakes, breaking the assumptions needed for serialization/deserialization.
…d:dateTime` flakes

Now that the `commit:time` flake is no longer erroneously of type `xsd:dateTime`, any time-typed flakes we encounter should be standardized to time objects. Therefore, we don't need to check the type, and can just use the formatters directly.
@mpoffald
Copy link
Contributor Author

In the course of working on this, I discovered we have the same issue with commit files. I made a separate issue for it here: #584

Once this PR is accepted/merged, then the fix for that one can build off of it.


(if-let [offset (peek matches)]
(OffsetDateTime/parse s)
(LocalDateTime/parse s))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly a lot simpler! 😃

Copy link
Contributor

@dpetran dpetran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📇 I don't recall why I did the LocalDateTime/of approach instead of parse, but if they handle your test cases I think that's an improvement.

@mpoffald mpoffald merged commit 3b95d19 into main Sep 29, 2023
6 checks passed
@mpoffald mpoffald deleted the fix/index-read-error branch September 29, 2023 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

xsd:dateTime, xsd:time coercion is incorrect for fractional seconds
2 participants