It would be possible, but at a fairly heavy cost. You would need to read all the keys out first, sort the key slice, then iterate and write. The code you are looking for is in |
-
We would like to checksum Avro files (potentially very large) to tell whether they have changed. We would still produce them, but on the consuming side we want to make it easy for someone to say with reasonable certainty (SHA-256 or stronger) that two files are identical, without having to compare every record.
The first obstacle was the sync block being random. With the recent PRs merged, that issue is solved.
I am now noticing that the Avro header itself changes between runs, and no longer because of the sync block. The header metadata is a map, and it is serialized with its keys in random order.
I got lost trying to understand how maps are serialized, so I figured I should ask: would it be possible to serialize maps deterministically (e.g. with the keys sorted)?