-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
missing the first doc record. #60
base: master
Are you sure you want to change the base?
Conversation
missing the first doc record bug
missing the first doc record
@phutchins The first doc record is not handled, pls review this pr and merge to master to release a new gem :) |
@inetufo can you give a little more detail around what you're changing here? Seems like there's more going on than catching the missing first doc. It looks like you're adding a range instead of a starting point. The current behavior is to check if we have a placeholder. If we don't have one, start at the beginning of the collection and process all entries waiting for more. If we do have a placeholder, we start there and continue processing all entries waiting for more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @inetufo thanks for taking the time to work on this. I've added a few comments and requested a few changes. If you can update to make these changes we can get this merged.
collection = mongodb.collection(mongo_collection_name) | ||
# Need to make this sort by date in object id then get the first of the series | ||
# db.events_20150320.find().limit(1).sort({ts:1}) | ||
if first_entry_id_object == last_id_object | ||
@logger.info("get_cursor_for_collection first entry:#{first_entry_id_object}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might should be debug level logging.
@@ -144,10 +144,15 @@ def get_collection_names(mongodb, collection) | |||
end | |||
|
|||
public | |||
def get_cursor_for_collection(mongodb, mongo_collection_name, last_id_object, batch_size) | |||
def get_cursor_for_collection(mongodb, mongo_collection_name, first_entry_id_object, last_id_object, batch_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably rename these variables as they're a bit confusing.
Maybe...
first_entry_id_object => processing_start_entry_id_object
last_id_object => last_unprocessed_entry_id_object (or maybe first_unprocessed_entry_id_object)
... or something like that.
@@ -112,7 +112,7 @@ def get_placeholder(sqlitedb, since_table, mongodb, mongo_collection_name) | |||
if x[:place].nil? || x[:place] == 0 | |||
first_entry_id = init_placeholder(sqlitedb, since_table, mongodb, mongo_collection_name) | |||
@logger.debug("FIRST ENTRY ID for #{mongo_collection_name} is #{first_entry_id}") | |||
return first_entry_id | |||
return [first_entry_id, first_entry_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should return the placeholder or nil and we can wrap it with a get_processing_start_id() or something that returns us the palceholder or the beginning of the db.
@@ -248,7 +265,8 @@ def run(queue) | |||
last_id_object = Time.at(last_id) | |||
end | |||
end | |||
cursor = get_cursor_for_collection(@mongodb, collection_name, last_id_object, batch_size) | |||
|
|||
cursor = get_cursor_for_collection(@mongodb, collection_name, first_entry_id_object, last_id_object, batch_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather only pass get_cursor_for_collection the starting point and batch_size and use a separate method as mentioned above to work out what that should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, when are you going to merge this branch? I really need it, thank you.
Hi, this is a real bug need to be solved. Please refer to #71 |
I added a placeholder and also it removes the first document of data ! |
missing the first doc record.