Fix db_poller fetching so it does not use the saved last_sent_id on i… #156
…nitial poll and saves the poll time as last_sent
Pull Request Template
Description
The `db_poller` performs polls at regular intervals by referencing the `last_sent_id` and `last_sent` columns stored in the `deimos_poll_info` table. During a poll the `db_poller` may process multiple batches of items, with a default batch size of 1000.

After an initial `poll_query`, the `db_poller` saves the `updated_at` and `id` of the last record it processed as the `last_sent` and `last_sent_id` respectively, i.e.

Updates at time of initial poll
`deimos_poll_info` table after initial poll

On the next poll the `last_sent` is used to determine time bounds (`time_from:` will be `2` and `time_to:` will be `~time of the second poll`) and the `last_sent_id` is used as a `min_id` for the next `poll_query(time_from:, time_to:, column_name:, min_id:)`. If these updates have occurred since the initial poll:

Updates at time of second poll
The `poll_query` takes in `300` as the `min_id` and wouldn't know it should query for updates across all id values, leading to missing updates for ids `100` and `200`.

If we can set the `min_id` to `0` when we're starting a new poll interval (i.e. when `batch_count == 0`) then the `poll_query` can trust the `min_id` to be valid and we can find updates across all id values.

Now if we do this, since we've stored
`2` as the `last_sent`, the `poll_query` will receive a `time_from` of `2` and a `time_to` of `~time of second poll`. The `poll_query` could potentially pull in records from the previous poll with `updated_at=2`, now that we query across all id values. To avoid this we can store the `time_to` as the value of `last_sent` after we've completed all processing during a poll interval. This should ensure that we don't include already processed records in subsequent polls.

So in this PR I've made changes to set the
`min_id` for the `poll_query` to `0` when `batch_count == 0`, and I've also stored the `time_to` value as the `last_sent` once all processing during a poll interval has completed.

Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
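The scenario from the description can be reproduced outside the application with a small standalone sketch. It is not Deimos code and not the project's test suite: the `Record` struct, the in-memory `records` array, and this simplified `poll_query` are hypothetical stand-ins. The filter (`id > min_id` combined with an inclusive time window) and the concrete timestamps (initial poll at time 3, updates at time 4, second poll at time 6) are assumptions chosen to match the ids `100`, `200`, `300` and the `last_sent` of `2` used above.

```ruby
# Hypothetical in-memory stand-in for the polled table; not Deimos code.
Record = Struct.new(:id, :updated_at)

# Simplified model of the query: rows with id above min_id whose updated_at
# falls inside [time_from, time_to]. The inclusive bounds are an assumption.
def poll_query(records, time_from:, time_to:, min_id:)
  records
    .select { |r| r.id > min_id && r.updated_at >= time_from && r.updated_at <= time_to }
    .sort_by { |r| [r.updated_at, r.id] }
    .map(&:id)
end

# State at the second poll: ids 100 and 200 were updated (time 4) after the
# initial poll; id 300 is unchanged since it was processed (updated_at 2).
records = [Record.new(100, 4), Record.new(200, 4), Record.new(300, 2)]

# Before the change: saved last_sent (2) and last_sent_id (300) are both used.
old_ids = poll_query(records, time_from: 2, time_to: 6, min_id: 300)

# Resetting only min_id: the updates are found, but id 300 is read again.
partial_ids = poll_query(records, time_from: 2, time_to: 6, min_id: 0)

# Resetting min_id and storing the initial poll's time_to (3) as last_sent.
fixed_ids = poll_query(records, time_from: 3, time_to: 6, min_id: 0)

puts "old:     #{old_ids.inspect}"     # prints "old:     []"
puts "partial: #{partial_ids.inspect}" # prints "partial: [300, 100, 200]"
puts "fixed:   #{fixed_ids.inspect}"   # prints "fixed:   [100, 200]"
```

Under these assumptions, the first call shows the missed updates, the second shows why resetting `min_id` alone re-reads an already processed record, and the third shows both changes together returning only the new updates.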
Checklist: