perf: Parallelize read calls by table and batch #4619

Merged
merged 4 commits into from
Oct 12, 2024

robhowley
Contributor

@robhowley robhowley commented Oct 11, 2024

What this PR does / why we need it:

Improve the use of async batch calls to DynamoDB in get_online_features_async

  • tested on a feature service with 3 feature views and 11 features
  • compared to the sync method, the async method was
    • 50% faster with 5 entities retrieved
    • 75% faster with 80 entities retrieved

Which issue(s) this PR fixes:

Misc

@robhowley robhowley marked this pull request as draft October 11, 2024 22:12
@robhowley robhowley changed the title from "Chore: put all dynamo calls in an asyncio.gather" to "perf: Put all dynamo calls in an asyncio.gather" Oct 11, 2024
@robhowley robhowley changed the title from "perf: Put all dynamo calls in an asyncio.gather" to "perf: Parallelize read calls by table and batch" Oct 12, 2024
Signed-off-by: Rob Howley <[email protected]>
Signed-off-by: Rob Howley <[email protected]>
Comment on lines +312 to +322
batches = []
entity_id_batches = []
while True:
    batch = list(itertools.islice(entity_ids_iter, batch_size))
    if not batch:
        break
    entity_id_batch = self._to_client_batch_get_payload(
        online_config, table_name, batch
    )
    batches.append(batch)
    entity_id_batches.append(entity_id_batch)
Contributor Author

construct the batches of ids/entity_ids that we'll be looking up
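The batching step above can be sketched in isolation. This is a minimal illustration, not the code from the PR: `chunked` and `to_payload` are hypothetical helpers, and the payload shape (a single `entity_id` string key) is an assumption standing in for whatever `_to_client_batch_get_payload` actually builds.

```python
import itertools
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while True:
        # islice pulls the next batch_size items from the shared iterator.
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            break
        yield batch


def to_payload(table_name: str, batch: List[str]) -> Dict[str, dict]:
    # Hypothetical stand-in for _to_client_batch_get_payload; the key
    # attribute name "entity_id" is assumed for illustration only.
    return {table_name: {"Keys": [{"entity_id": {"S": eid}} for eid in batch]}}


if __name__ == "__main__":
    for batch in chunked(["a", "b", "c", "d", "e"], 2):
        print(to_payload("my_table", batch))
```

Keeping the raw `batch` alongside its payload, as the diff does, lets the response-processing step later match each response back to the ids that were requested.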

Comment on lines +325 to +332
response_batches = await asyncio.gather(
    *[
        client.batch_get_item(
            RequestItems=entity_id_batch,
        )
        for entity_id_batch in entity_id_batches
    ]
)
Contributor Author

make those batch requests in parallel.

note: gather maintains order
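The ordering note can be checked with a small self-contained sketch. `fake_batch_get` is a hypothetical stand-in for `client.batch_get_item`, and the delays are arbitrary: earlier batches are made slower on purpose, yet `asyncio.gather` still returns results in the order the awaitables were passed in.

```python
import asyncio
from typing import List, Tuple


async def fake_batch_get(batch_no: int, delay: float, finished: List[int]) -> int:
    # Hypothetical stand-in for a network call; records when it completes.
    await asyncio.sleep(delay)
    finished.append(batch_no)
    return batch_no


async def fetch_all() -> Tuple[List[int], List[int]]:
    finished: List[int] = []
    # Batch 0 is the slowest, so it completes last.
    delays = [0.15, 0.10, 0.05]
    results = await asyncio.gather(
        *(fake_batch_get(i, d, finished) for i, d in enumerate(delays))
    )
    return list(results), finished


if __name__ == "__main__":
    results, finished = asyncio.run(fetch_all())
    print("gather order:    ", results)
    print("completion order:", finished)
```

Because the result order follows the argument order rather than completion order, zipping `batches` with `response_batches` afterwards pairs each response with the batch that produced it.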

Comment on lines +335 to +343
for batch, response in zip(batches, response_batches):
    result_batch = self._process_batch_get_response(
        table_name,
        response,
        entity_ids,
        batch,
        to_tbl_response=to_tbl_resp,
    )
    result_batches.append(result_batch)
Contributor Author

format the responses into the final shape. we now iterate through the list three times instead of once, but make up for it by running the batch requests concurrently
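The trade-off described here can be illustrated with simulated latency. This is a sketch under assumptions, not a benchmark of the PR: `fake_batch_get` is hypothetical, its 50 ms sleep is an arbitrary placeholder for a network round trip, and the timings it produces say nothing about the figures reported in the PR description.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple

Batches = List[List[str]]


async def fake_batch_get(batch: List[str], latency: float = 0.05) -> List[str]:
    # Hypothetical stand-in for one batch read against the online store.
    await asyncio.sleep(latency)
    return [f"row-for-{eid}" for eid in batch]


async def read_sequential(batches: Batches) -> List[List[str]]:
    # One pass: each request waits for the previous one to finish.
    return [await fake_batch_get(batch) for batch in batches]


async def read_concurrent(batches: Batches) -> List[List[str]]:
    # Requests are all in flight at once; total wait is roughly one latency.
    return list(await asyncio.gather(*(fake_batch_get(b) for b in batches)))


async def timed(
    reader: Callable[[Batches], Awaitable[List[List[str]]]], batches: Batches
) -> Tuple[List[List[str]], float]:
    start = time.perf_counter()
    result = await reader(batches)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    demo = [["a", "b"], ["c", "d"], ["e", "f"], ["g"]]
    _, t_seq = asyncio.run(timed(read_sequential, demo))
    _, t_con = asyncio.run(timed(read_concurrent, demo))
    print(f"sequential: {t_seq:.3f}s, concurrent: {t_con:.3f}s")
```

The extra passes over the batch list are in-memory loops, so their cost is small next to the I/O wait that concurrency removes.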

Comment on lines +263 to +272
all_responses = await asyncio.gather(
    *[
        query_table(table, requested_features)
        for table, requested_features in grouped_refs
    ]
)

for (idxs, read_rows), (table, requested_features) in zip(
    all_responses, grouped_refs
):
Contributor Author

when requesting features across multiple tables, we can parallelize the calls to each.
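A rough sketch of the per-table fan-out, assuming a much simpler data model than the real one: here a table is just a name, and `query_table` is a hypothetical coroutine that returns placeholder rows. Only the `gather` plus `zip` pattern mirrors the diff.

```python
import asyncio
from typing import Dict, List, Tuple

# (table name, requested feature names); a simplification for illustration.
GroupedRef = Tuple[str, List[str]]


async def query_table(table: str, requested_features: List[str]) -> List[str]:
    # Hypothetical stand-in for reading one table from the online store.
    await asyncio.sleep(0.01)
    return [f"{table}:{name}" for name in requested_features]


async def read_all_tables(grouped_refs: List[GroupedRef]) -> Dict[str, List[str]]:
    # Start one read per table and wait for all of them together.
    all_responses = await asyncio.gather(
        *(query_table(table, features) for table, features in grouped_refs)
    )
    merged: Dict[str, List[str]] = {}
    # gather preserves input order, so responses line up with grouped_refs.
    for read_rows, (table, _requested_features) in zip(all_responses, grouped_refs):
        merged[table] = read_rows
    return merged


if __name__ == "__main__":
    refs = [("driver_stats", ["conv_rate", "acc_rate"]), ("customer_profile", ["age"])]
    print(asyncio.run(read_all_tables(refs)))
```

Wrapping the loop body in a coroutine, as the diff does with `query_table`, is what makes each table's read an independent awaitable that can be gathered.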

@robhowley robhowley marked this pull request as ready for review October 12, 2024 03:29
@robhowley robhowley requested a review from a team as a code owner October 12, 2024 03:29
@robhowley robhowley requested review from shuchu, franciscojavierarceo and tokoko and removed request for a team October 12, 2024 03:29
@@ -240,7 +240,7 @@ async def get_online_features_async(
             native_entity_values=True,
         )

-        for table, requested_features in grouped_refs:
+        async def query_table(table, requested_features):

can you add type hints?

@franciscojavierarceo
Member

lgtm

@franciscojavierarceo franciscojavierarceo merged commit 043eff1 into feast-dev:master Oct 12, 2024
24 checks passed