-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Source Amazon S3: solve possible case of files being missed during incremental syncs #12568
🐛 Source Amazon S3: solve possible case of files being missed during incremental syncs #12568
Conversation
/test connector=connectors/source-s3
|
/test connector=connectors/source-s3
|
/test connector=connectors/source-s3
|
/test connector=connectors/source-s3
|
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see a test that shows the new state.history
is what you would expect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with evan that we should have very thorough testing on this functionality -- it's pretty important that we:
- don't miss any files
- drop old files from the STATE object to ensure it doesn't bloat unnecessarily
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Outdated
Show resolved
Hide resolved
/test connector=connectors/source-s3
|
/test connector=connectors/source-s3
Build PassedTest summary info:
|
Codecov Report
@@ Coverage Diff @@
## master #12568 +/- ##
=========================================
Coverage ? 90.36%
=========================================
Files ? 18
Lines ? 747
Branches ? 0
=========================================
Hits ? 675
Misses ? 72
Partials ? 0 Continue to review full report at Codecov.
|
/publish connector=connectors/source-s3
|
…incremental syncs (airbytehq#12568) * Added history to state * Deleted unused import * Rollback abnormal state file * Rollback abnormal state file * Fixed type error issue * Fix state issue * Updated after review * Bumped version
What
#5365 - 🐛 Source Amazon S3: solve possible case of files being missed during incremental syncs
How
Retain information in the state about every file's
_ab_source_file_url
ever synced in the last N days (N = 3 days).History of records saved to set and not in cuckoo filter cause case in history saving on str, so expected size of state and complexity of set class can have us more profit than cuckoo filter with
error_rate
more than 0.Pre-merge Checklist
Updating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described here