perf: Parallelize read calls by table and batch #4619
Conversation
Signed-off-by: Rob Howley <[email protected]>
```python
batches = []
entity_id_batches = []
while True:
    batch = list(itertools.islice(entity_ids_iter, batch_size))
    if not batch:
        break
    entity_id_batch = self._to_client_batch_get_payload(
        online_config, table_name, batch
    )
    batches.append(batch)
    entity_id_batches.append(entity_id_batch)
```
construct the batches of ids/entity_ids that we'll be looking up
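The chunking pattern above can be sketched in isolation. This is a minimal, runnable version of the `itertools.islice` loop; the real code feeds `entity_ids_iter` and a DynamoDB payload builder, which are replaced here by a plain range of ids:

```python
import itertools

def chunk(iterable, batch_size):
    """Yield successive lists of up to batch_size items from iterable."""
    it = iter(iterable)
    while True:
        # islice consumes the next batch_size items; an empty slice
        # means the iterator is exhausted.
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            break
        yield batch

# 7 ids in batches of 3 -> [[1, 2, 3], [4, 5, 6], [7]]
print(list(chunk(range(1, 8), 3)))
```

Each yielded batch maps one-to-one onto a `batch_get_item` payload, which is why `batches` and `entity_id_batches` stay aligned by index.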
```python
response_batches = await asyncio.gather(
    *[
        client.batch_get_item(
            RequestItems=entity_id_batch,
        )
        for entity_id_batch in entity_id_batches
    ]
)
```
make those batch requests in parallel. note: `asyncio.gather` maintains order
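The ordering guarantee is what lets the results be zipped back against the input batches. A small sketch (with `asyncio.sleep` standing in for the network calls) shows that `gather` returns results in submission order, even when the awaitables complete out of order:

```python
import asyncio

async def fetch(i, delay):
    # Simulate a network call that completes after `delay` seconds.
    await asyncio.sleep(delay)
    return i

async def main():
    # The first task finishes last, yet gather returns results
    # in the order the awaitables were passed in.
    return await asyncio.gather(fetch(0, 0.03), fetch(1, 0.02), fetch(2, 0.01))

print(asyncio.run(main()))  # [0, 1, 2]
```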
```python
for batch, response in zip(batches, response_batches):
    result_batch = self._process_batch_get_response(
        table_name,
        response,
        entity_ids,
        batch,
        to_tbl_response=to_tbl_resp,
    )
    result_batches.append(result_batch)
```
format the responses into the final shape. we iterate through the list three times instead of once, but make up for it by issuing the batch requests asynchronously
```python
all_responses = await asyncio.gather(
    *[
        query_table(table, requested_features)
        for table, requested_features in grouped_refs
    ]
)

for (idxs, read_rows), (table, requested_features) in zip(
    all_responses, grouped_refs
):
```
when requesting features across multiple tables, we can parallelize the calls to each.
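The per-table fan-out can be sketched end to end. `query_table` here is a hypothetical stand-in for the PR's per-table read (in the PR it wraps the online store's batch read for one table); the table names and features below are made up for illustration:

```python
import asyncio

async def query_table(table, requested_features):
    # Stand-in for a per-table online-store read.
    await asyncio.sleep(0)  # simulated I/O
    return [f"{table}:{f}" for f in requested_features]

async def read_all(grouped_refs):
    # One coroutine per table; gather runs them concurrently and
    # returns responses aligned index-for-index with grouped_refs.
    responses = await asyncio.gather(
        *[query_table(t, feats) for t, feats in grouped_refs]
    )
    return list(zip(grouped_refs, responses))

grouped_refs = [("driver_stats", ["conv_rate"]), ("customer", ["age"])]
print(asyncio.run(read_all(grouped_refs)))
```

Because the responses come back in input order, the follow-up `zip(all_responses, grouped_refs)` loop can safely re-associate each response with its table.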
```diff
@@ -240,7 +240,7 @@ async def get_online_features_async(
         native_entity_values=True,
     )

-    for table, requested_features in grouped_refs:
+    async def query_table(table, requested_features):
```
can you add type hints?
lgtm
What this PR does / why we need it:
Improve the use of async batch calls to DynamoDB for `get_online_features_async`.
Which issue(s) this PR fixes:
Misc