Fix index read error #583

mpoffald · 2023-09-28T20:57:59Z

Fixes https://github.com/fluree/core/issues/31
Fixes #580

When creating flakes with time-typed values (`xsd:dateTime`, `xsd:date`, `xsd:time`), we coerce the values into `java.time`/`js/Date` objects. This index-reading bug was caused by not serializing those values correctly, resulting in unreadable index files. This PR introduces special serialization/deserialization logic for these types.

We were writing out flakes with unreadable tagged literals, e.g.:

[211106232533902,1038,#object[java.time.OffsetDateTime 0x3d76199b "2023-04-01T00:00Z"],4,-2,true,null]

Which was causing parsing errors:

Exception in thread "async-thread-macro-1" com.fasterxml.jackson.core.JsonParseException: Unexpected character ('#' (code 35)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false')

Custom formatters are used during serialization so that our existing datatype/coerce code can be reused to deserialize these values upon load. This is necessary because the default `str`/`.toString()` behavior trims trailing zeros, which produces values that are not valid for the given datatype. This is a simple initial implementation; we can potentially optimize in the future with more clever formatters and/or a separate coercion path for deserialization.
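To illustrate the trimming behavior described above, here is a minimal Java sketch. The class name `FullPrecisionFormat` and the exact pattern are assumptions made for illustration, not the formatters added in this PR. It shows that `OffsetDateTime.toString()` drops zero-valued seconds, while a fixed-width `DateTimeFormatter` pattern always emits them.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not the formatter used in this PR.
class FullPrecisionFormat {
    // Fixed-width pattern: always emits seconds and nine fractional digits.
    // XXX prints the offset as +HH:MM, or "Z" for UTC.
    static final DateTimeFormatter OFFSET_DATE_TIME =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSSXXX");

    static String serialize(OffsetDateTime value) {
        return OFFSET_DATE_TIME.format(value);
    }

    // Small demo comparing the default string form with the formatted one.
    static void demo() {
        OffsetDateTime v = OffsetDateTime.of(2023, 4, 1, 0, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(v);            // 2023-04-01T00:00Z (seconds trimmed)
        System.out.println(serialize(v)); // 2023-04-01T00:00:00.000000000Z
    }
}
```

The fixed-width output is longer than strictly needed, which is the space trade-off noted in this PR.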

For #580, I switched coercion for `xsd:time` and `xsd:dateTime` in clj to use the Java parsing functions directly. Our hard-coded multiplication of nanoseconds was causing errors/incorrect values, whereas the built-in parsing fns do the correct thing with those values. I'm not sure if there was a reason not to use these parsing fns in the first place; if there's a significant downside I wasn't aware of, then we should discuss other alternatives.
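As a rough sketch of the approach described above, written in Java rather than clj: the only string inspection is a check for an offset, and the built-in parsers handle the fractional seconds. The class name `DateTimeCoercion` and the offset regex are assumptions for illustration and may differ from the checks in `datatype.cljc`.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.temporal.Temporal;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the code in datatype.cljc.
class DateTimeCoercion {
    // Assumed offset check: a trailing "Z" or "+HH:MM" / "-HH:MM".
    private static final Pattern OFFSET = Pattern.compile("(Z|[+-]\\d{2}:\\d{2})$");

    static Temporal coerceDateTime(String s) {
        if (OFFSET.matcher(s).find()) {
            // Offset present: keep it with the value.
            return OffsetDateTime.parse(s);
        }
        // No offset: parse as a local date-time.
        return LocalDateTime.parse(s);
    }
}
```

With this shape, a fraction such as `.5` becomes 500,000,000 nanoseconds and `.05` becomes 50,000,000 without any manual scaling, because the parser interprets the digits as a decimal fraction of a second.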

Upon transaction, flakes with time types (`xsd:time`, `xsd:date`, `xsd:dateTime`) have their values coerced to `java.time`/`js/Date` objects. When writing index files to disk, we need to write out readable strings for these objects, and then re-coerce them into time objects upon load. Previously we were writing files with unreadable tagged literals in them. Note: This writes out time strings with maximal precision, allowing us to reuse our existing datatype coercion code when loading (time strings with trailing zeros trimmed do not always satisfy our regexes). This makes for a simpler solution, but it does mean potential wasted space in our index files. We can likely optimize this in the future by having more clever formatters and/or having deserialization-specific coercion logic that recognizes truncated values.
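The write-then-load round trip can be sketched as follows for an `xsd:time`-like value. This is an illustration under stated assumptions: the class `TimeRoundTrip`, the regex `XSD_TIME`, and the formatter pattern are hypothetical stand-ins, not the project's actual regexes or formatters. The point is that a string with trimmed zeros fails a validation regex that requires seconds, while the maximal-precision string passes and parses back to an equal value.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class TimeRoundTrip {
    // Assumed stand-in for a datatype regex that requires a seconds field.
    static final Pattern XSD_TIME = Pattern.compile("\\d{2}:\\d{2}:\\d{2}(\\.\\d+)?");

    // Maximal precision: seconds plus nine fractional digits, always present.
    static final DateTimeFormatter FULL = DateTimeFormatter.ofPattern("HH:mm:ss.SSSSSSSSS");

    // Serialization path: time object to string.
    static String write(LocalTime t) {
        return FULL.format(t);
    }

    // Load path: validate first, then reuse the ordinary parser.
    static LocalTime load(String s) {
        if (!XSD_TIME.matcher(s).matches()) {
            throw new IllegalArgumentException("not a valid time string: " + s);
        }
        return LocalTime.parse(s);
    }
}
```

Here `LocalTime.of(9, 30).toString()` yields `09:30`, which `load` rejects, whereas `write` yields `09:30:00.000000000`, which `load` accepts.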

Fixes #580. The built-in Java parsers will do the correct thing for subsecond values. We don't need to do any manual string inspection apart from determining whether an offset is present.

Previously, the `commit:time` value was being written as an `xsd:dateTime`, but it is not a valid `xsd:dateTime` value: https://books.xmlschemata.org/relaxng/ch19-77049.html. Apart from being incorrect, this made this one flake an anomaly among the `xsd:dateTime` flakes, breaking the assumptions needed for serialization/deserialization.

…d:dateTime` flakes

Now that the `commit:time` flake is no longer erroneously of type `xsd:dateTime`, any time-typed flakes we encounter should be standardized to time objects. Therefore, we don't need to check the type, and can just use the formatters directly.

mpoffald · 2023-09-28T21:04:14Z

In the course of working on this, I discovered we have the same issue with commit files. I made a separate issue for it here: #584

Once this PR is accepted/merged, then the fix for that one can build off of it.

dpetran · 2023-09-29T14:23:53Z

src/fluree/db/datatype.cljc

-
+       (if-let [offset         (peek matches)]
+         (OffsetDateTime/parse s)
+         (LocalDateTime/parse s))


This is certainly a lot simpler! 😃

dpetran

📇 I don't recall why I did the LocalDateTime/of approach instead of parse, but if they handle your test cases I think that's an improvement.

mpoffald added 7 commits September 28, 2023 15:18

Fix coercion for xsd:dateTime, xsd:time in clj

bc86894

Add support for serialization of xsd:date, xsd:time types

550372b

deserialize js date objects, rename helper fn

ba9a452

add missing cljs require

789e437

dpetran reviewed Sep 29, 2023

dpetran approved these changes Sep 29, 2023

mpoffald merged commit 3b95d19 into main Sep 29, 2023
6 checks passed

mpoffald deleted the fix/index-read-error branch September 29, 2023 16:09

mpoffald mentioned this pull request Oct 2, 2023

Serialize time objects correctly for commit files #585

Merged
