Common: use run manifest #81

Closed
chuwy opened this issue Jan 8, 2018 · 4 comments

@chuwy
Contributor

chuwy commented Jan 8, 2018

This looks like the most bullet-proof solution to inconsistent S3. Instead of relying on S3 listing, we can collect the data in the shredder (as we already do for Snowflake) and write it to an external manifest (DynamoDB).

Unlike the consistency check, this does not add idle time and should be very reliable.

It should be optional, to reduce the maintenance burden for pipelines that don't suffer from inconsistency.
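A minimal sketch of the idea, for illustration only. It assumes an in-memory dictionary standing in for the DynamoDB manifest, and the names `RunManifest`, `record`, `keys_for` and `keys_to_load` are hypothetical, not part of any Snowplow component. The shredder records the exact keys it wrote for a run; the loader trusts that record instead of listing S3, and falls back to listing only when the optional manifest is disabled or has no entry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RunManifest:
    """Hypothetical in-memory stand-in for an external manifest table (e.g. DynamoDB).

    Maps a run ID such as "run=2018-01-01-18-30-00" to the exact keys
    the shredder wrote for that run.
    """
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def record(self, run_id: str, keys: List[str]) -> None:
        # Written by the shredder once all output for the run is known.
        self.entries[run_id] = sorted(keys)

    def keys_for(self, run_id: str) -> Optional[List[str]]:
        # None means "no manifest entry for this run".
        return self.entries.get(run_id)


def keys_to_load(run_id: str,
                 manifest: Optional[RunManifest],
                 list_shredded: Callable[[], List[str]]) -> List[str]:
    """Decide which keys the loader should load for a run.

    If the (optional) manifest is enabled and has an entry for the run,
    use it and never list S3. Otherwise fall back to listing the shredded
    output, which on an eventually consistent store may still return
    ghost folders left by earlier runs.
    """
    if manifest is not None:
        keys = manifest.keys_for(run_id)
        if keys is not None:
            return keys
    return sorted(list_shredded())
```

With the manifest, a stale listing that still shows `run=2018-01-01-12-00-00/...` is simply never consulted; passing `manifest=None` models a pipeline that opts out and keeps the listing-based behaviour.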

@chuwy chuwy added this to the Release 29 milestone Jan 8, 2018
@chuwy chuwy self-assigned this Jan 8, 2018
@alexanderdean
Member

Let's just go for this one. It will solve the problem and be much more robust than our consistency check...

@chuwy
Contributor Author

chuwy commented Jan 8, 2018

@chuwy
Contributor Author

chuwy commented Jan 12, 2018

Just found another quick-and-dirty solution to the consistency problem (not going to implement it since it's dangerous, just sharing).

For example, the last ETL run had the ID run=2018-01-01-12-00-00 and left a few ghost folders behind. The current run is run=2018-01-01-18-30-00 and fails because of files from the previous run. We can simply discard the first (older) folder, because the assumption that the older folder is always a ghost is relatively safe. Interestingly, StorageLoader always tried to load only the first directory. If it had always loaded the last one, we probably wouldn't have even noticed this problem.
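To make the heuristic concrete, here is a small sketch of "keep the newest run folder, treat older ones as ghosts". It is illustration only (the comment above explicitly calls this approach dangerous and not planned), and `split_latest_run` is a hypothetical helper. It assumes run folders are named exactly `run=YYYY-MM-DD-HH-MM-SS` and parses the timestamps instead of trusting string order, so an unexpected name raises an error rather than being silently discarded.

```python
from datetime import datetime
from typing import List, Tuple

# Assumed folder naming scheme, matching IDs like "run=2018-01-01-18-30-00".
RUN_FORMAT = "run=%Y-%m-%d-%H-%M-%S"


def split_latest_run(folders: List[str]) -> Tuple[str, List[str]]:
    """Return (newest run folder, older folders assumed to be ghosts).

    Raises ValueError if the list is empty or a name doesn't match RUN_FORMAT.
    """
    if not folders:
        raise ValueError("no run folders found")
    ordered = sorted(folders, key=lambda f: datetime.strptime(f, RUN_FORMAT))
    return ordered[-1], ordered[:-1]
```

The risk is visible in the return value: anything older than the newest folder is discarded unconditionally, including a genuine run that was never loaded.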

@alexanderdean
Member

Makes sense @chuwy, thanks for sharing.
