When the lazily evaluated `keyUnfold(client.listObjectsV2(req))` was evaluated a second time (which happens whenever the `IO` object produced by `list` is evaluated again), it referred to a request that already had a continuation token set. In other words, it was a buggy mix of lazy evaluation and mutable data structures.
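To make the mechanism concrete, here is a minimal, self-contained sketch of the same class of bug. Everything in it is a hypothetical stand-in: `IO`, `ListRequest`, `FakeClient` and `drain` are toy types written for this illustration, not the cats-effect, AWS SDK or loader code, and the page size is 2 rather than 1000. The `fixedList` variant shows one possible remedy (building the request inside the action); it is not necessarily how the loader was actually fixed.

```scala
// Hypothetical stand-ins only: none of these are the real AWS SDK or cats-effect types.
object LazyListingDemo {
  // A minimal lazy action: the thunk is re-executed on every call to run().
  final class IO[A](thunk: () => A) {
    def run(): A = thunk()
  }

  // Mutable request, mimicking a setter-style continuation token.
  final class ListRequest {
    var continuationToken: Option[Int] = None
  }

  final case class Page(keys: List[String], nextToken: Option[Int])

  // Fake client: serves allKeys in pages of pageSize, starting at the token (or at 0).
  final class FakeClient(allKeys: List[String], pageSize: Int) {
    def listObjects(req: ListRequest): Page = {
      val start = req.continuationToken.getOrElse(0)
      val keys = allKeys.slice(start, start + pageSize)
      val end = start + keys.length
      Page(keys, if (end < allKeys.length) Some(end) else None)
    }
  }

  // Pages through the listing, mutating req to carry the token forward.
  def drain(client: FakeClient, req: ListRequest): List[String] = {
    val page = client.listObjects(req)
    page.nextToken match {
      case Some(token) =>
        req.continuationToken = Some(token)
        page.keys ++ drain(client, req)
      case None => page.keys
    }
  }

  val keys: List[String] = List(
    "atomic-events/part-0",
    "atomic-events/part-1",
    "shredded-types/a",
    "shredded-types/b",
    "shredded-types/c"
  )
  val client = new FakeClient(keys, pageSize = 2)

  // Buggy: the request is created once, outside the lazy action, and shared by every run.
  val sharedReq = new ListRequest
  val buggyList: IO[List[String]] = new IO(() => drain(client, sharedReq))

  // Safer: a fresh request is created inside the action on each run.
  val fixedList: IO[List[String]] = new IO(() => drain(client, new ListRequest))
}
```

In this toy setup the first `buggyList.run()` returns all five keys, while the second starts from the token left behind by the first run and returns only the last page. `fixedList.run()` returns the full listing every time.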
How it manifested itself
During discovery, the loader always found a valid list of S3 keys. During the consistency check, however, the `IO` was evaluated twice and the second result was trimmed: it contained only the tail of a listing paginated by 1000 elements. The consistency check would therefore never pass, failing with `Folder with atomic-events was not found in [s3://com-acme/shredded/good/run=2020-12-14-06-03-48/]`
If `consistency_check` was skipped, the bug did not manifest itself.
Why I think it didn't have a bigger impact
A folder without atomic-events is considered illegal and the loader doesn’t proceed without it. Luckily, atomic-events files are always at the beginning of the listing, which means that:
- If all atomic-events files are truncated, it fails with "Folder with atomic-events was not found"
- If atomic-events files are only partly truncated, e.g. only 1 out of 100 atomic files is discovered, the loader just proceeds as usual and all shredded types are discovered properly
- When the consistency check was disabled, the first listing always succeeded
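
The first two cases can be sketched as follows. This is only an illustration under assumptions: the key names, the `hasAtomicEvents` check and the `resumed` helper are hypothetical, and the real loader's validation is not shown in this issue. The sketch relies solely on the observation above that atomic-events keys come first, so a listing resumed from a stale token loses keys from the front.

```scala
object TruncationDemo {
  // Hypothetical check: a run folder is accepted only if it contains atomic-events keys.
  def hasAtomicEvents(keys: List[String]): Boolean =
    keys.exists(_.startsWith("atomic-events/"))

  // Full listing in the order described above: atomic-events keys first.
  val full: List[String] =
    List.tabulate(3)(i => s"atomic-events/part-$i") ++
      List("shredded-types/a/part-0", "shredded-types/b/part-0")

  // A listing resumed from a stale token is missing its first `lost` keys.
  def resumed(lost: Int): List[String] = full.drop(lost)
}
```

With all three atomic-events keys lost (`resumed(3)`), the check fails, which corresponds to the "was not found" error. With only two lost (`resumed(2)`), the check passes and every shredded-types key is still present.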
This bug could potentially have turned out to be a very big one, with data loss (partial loads, to be precise), but I don’t think it actually manifested itself at all.
Details
In a48f671#diff-72636e888e8a66cefcdb832647319064071b06b0a0b0356d18e0618f50e6be41R88-R108 we replaced the eagerly evaluated S3 listing with a lazily evaluated action.