RDB Loader: fix consistency check truncation #314

Closed
chuwy opened this issue Feb 15, 2021 · 1 comment

chuwy commented Feb 15, 2021

This could potentially have been a very serious bug with data loss (partial loads, to be precise), but I don’t think it ever actually manifested itself as data loss.

Details

In a48f671#diff-72636e888e8a66cefcdb832647319064071b06b0a0b0356d18e0618f50e6be41R88-R108 we replaced the eagerly evaluated S3 listing with a lazily evaluated action.

As a result, when keyUnfold(client.listObjectsV2(req)) was evaluated for a second time (which happens when the IO object produced by list is evaluated for a second time), it referred to a request with the continuation token already set. In other words, a buggy mix of lazy evaluation and mutable data structures.
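
The interaction described above can be reproduced in a small model. This is a sketch in Python, not the loader's Scala code: `ListRequest`, `FakeS3`, `list_all`, `make_shared_action` and `make_fresh_action` are all hypothetical stand-ins, and the integer token only imitates a real continuation token. The `make_fresh_action` variant shows one general way to avoid this class of bug; it is not necessarily the change made in the actual fix.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

PAGE_SIZE = 1000  # page size mentioned in the issue


@dataclass
class ListRequest:
    """Hypothetical mutable request; the token is the index of the next page."""
    continuation_token: Optional[int] = None


class FakeS3:
    """Hypothetical in-memory client serving a fixed key list page by page."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = keys

    def list_page(self, token: Optional[int]) -> Tuple[List[str], Optional[int]]:
        start = token or 0
        end = start + PAGE_SIZE
        next_token = end if end < len(self.keys) else None
        return self.keys[start:end], next_token


def list_all(client: FakeS3, req: ListRequest) -> List[str]:
    """Follow continuation tokens, mutating `req` in place along the way."""
    keys: List[str] = []
    while True:
        page, next_token = client.list_page(req.continuation_token)
        keys.extend(page)
        if next_token is None:
            return keys  # the last token stays in `req` after we return
        req.continuation_token = next_token


def make_shared_action(client: FakeS3) -> Callable[[], List[str]]:
    """Lazy action that reuses one request across evaluations (the buggy shape)."""
    req = ListRequest()  # created once, captured by the closure
    return lambda: list_all(client, req)


def make_fresh_action(client: FakeS3) -> Callable[[], List[str]]:
    """Lazy action that builds a new request on every evaluation."""
    return lambda: list_all(client, ListRequest())


client = FakeS3([f"key-{i:04d}" for i in range(2500)])
shared = make_shared_action(client)
fresh = make_fresh_action(client)

first, second = shared(), shared()
print(len(first), len(second))     # 2500 500: the second run sees only the tail
print(len(fresh()), len(fresh()))  # 2500 2500
```

In this model the second evaluation of the shared action starts from the token left behind by the first one, so it returns only the last page.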

How it manifested itself

During discovery, it always found a valid list of S3 keys. But during the consistency check the IO was evaluated twice and the second result was trimmed: it contained only the tail of a list paginated by 1000 elements. The consistency check would therefore never pass, failing with Folder with atomic-events was not found in [s3://com-acme/shredded/good/run=2020-12-14-06-03-48/]

If consistency_check was skipped, the bug did not manifest itself.

Why I think it didn't have a bigger impact

A folder without atomic-events is considered illegal and the loader doesn’t proceed without it. Luckily, atomic-events files are always at the beginning of the listing, which means that:

  1. If all atomic-events files are truncated, it fails with "Folder with atomic-events was not found"
  2. If the atomic-events files are only partly truncated, e.g. only 1 out of 100 atomic files has been discovered, it just proceeds as usual and all shredded types are discovered properly
  3. When the consistency check was disabled, the first listing always succeeded
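
The first two cases can be illustrated with a toy model of discovery over a truncated listing. This is a sketch under stated assumptions, not the loader's logic: `discover_types` is a hypothetical helper, and the key layout (`atomic-events/...`, `shredded-types/<name>/...`) is simplified for illustration. It relies only on the point made above that atomic-events keys sort before everything else, so cutting off the head of the listing removes atomic files first.

```python
from typing import List, Set

FOLDER = "s3://com-acme/shredded/good/run=2020-12-14-06-03-48/"


def discover_types(folder: str, keys: List[str]) -> Set[str]:
    """Hypothetical discovery: require atomic-events, then collect type names."""
    if not any(k.startswith("atomic-events/") for k in keys):
        raise ValueError(f"Folder with atomic-events was not found in [{folder}]")
    return {k.split("/")[1] for k in keys if k.startswith("shredded-types/")}


# Hypothetical folder content: 100 atomic files and two shredded types.
atomic = [f"atomic-events/part-{i:05d}" for i in range(100)]
shredded = [
    f"shredded-types/{name}/part-{i:05d}"
    for name in ("page_view", "link_click")
    for i in range(3)
]
listing = sorted(atomic + shredded)  # atomic-events keys come first

# Case 2: only 1 of 100 atomic files survives; discovery proceeds as usual.
print(discover_types(FOLDER, listing[99:]) == discover_types(FOLDER, listing))  # True

# Case 1: every atomic file is cut off; discovery refuses to proceed.
try:
    discover_types(FOLDER, listing[100:])
except ValueError as err:
    print(err)
```
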
chuwy added the bug label Feb 15, 2021
chuwy added this to the Version 0.18.2 milestone Feb 15, 2021

chuwy commented Feb 15, 2021

Implemented in #310

chuwy closed this as completed Feb 15, 2021