feat: Cassandra online store, concurrent fetching for multiple entities #3356

Merged

hemidactylus
Collaborator

This PR changes how features are retrieved from the Cassandra online store, using the
Cassandra driver's native concurrency capabilities.

When several entities are requested, the reads are no longer executed sequentially, one entity after another.
Instead they run concurrently: the driver keeps the results in the same order as the requests, and the call
returns once all results are available.
As measured in realistic environments, this yields a 2-3x speedup when retrieving 20 to 100 entities at once.
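
As a rough illustration of the pattern described above (not the code in this PR), the sketch below contrasts a sequential loop with an order-preserving concurrent fetch. It deliberately uses Python's standard-library thread pool as a stand-in for the Cassandra driver's concurrency helper, so it runs without a cluster. `fetch_entity`, `fetch_sequential`, `fetch_concurrent`, and the simulated latency are all hypothetical names and values chosen for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_entity(entity_key):
    """Hypothetical stand-in for one single-entity read from the online store."""
    time.sleep(0.01)  # simulated network round trip (assumed value)
    return {"entity_key": entity_key, "value": len(entity_key)}


def fetch_sequential(entity_keys):
    """Read entities one after another; total time grows with the number of keys."""
    return [fetch_entity(key) for key in entity_keys]


def fetch_concurrent(entity_keys, read_concurrency=8):
    """Read entities concurrently, returning results in the order of entity_keys.

    Executor.map yields results in input order regardless of which read
    finishes first, and list() only returns once every read has completed.
    """
    with ThreadPoolExecutor(max_workers=read_concurrency) as pool:
        return list(pool.map(fetch_entity, entity_keys))


if __name__ == "__main__":
    keys = [f"driver_{i}" for i in range(20)]
    results = fetch_concurrent(keys)
    # Results line up with the requested keys, position by position.
    print([row["entity_key"] for row in results] == keys)
```

The actual speedup depends on the driver, the cluster, and the network; the figures quoted above come from the author's measurements, not from this sketch.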

Using the Cassandra driver's execute_concurrent_with_args function requires a new parameter that sets the maximum concurrency to use (in practice bounded by the number of vCPUs available). For transparency, it is exposed in the feature store configuration yaml as read_concurrency, which is documented and handled by the guided procedure of feast init -t cassandra.
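
For orientation, a feature store configuration using the new setting might look roughly like the fragment below. Only the read_concurrency key comes from this PR; the host address, keyspace name, and the value 100 are placeholder assumptions, and the remaining keys follow the general shape of a Cassandra online store block rather than anything stated here.

```yaml
online_store:
    type: cassandra
    hosts:
        - 127.0.0.1            # placeholder contact point (assumption)
    keyspace: feast_keyspace   # placeholder keyspace name (assumption)
    read_concurrency: 100      # illustrative value: maximum concurrent reads
```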

minimal handling of exceptions in concurrent query execution
read_concurrency parameter in Cassandra online store config yaml

Signed-off-by: Stefano Lottini <[email protected]>
@hemidactylus
Collaborator Author

/lgtm

@feast-ci-bot
Collaborator

@hemidactylus: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Collaborator

@adchia adchia left a comment

/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, hemidactylus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [adchia,hemidactylus]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit 00fa21f into feast-dev:master Nov 29, 2022
@hemidactylus hemidactylus deleted the sl-cassandra-optimize-bulk-reads branch November 29, 2022 15:17
kevjumba pushed a commit that referenced this pull request Dec 5, 2022
# [0.27.0](v0.26.0...v0.27.0) (2022-12-05)

### Bug Fixes

* Changing Snowflake template code to avoid query not implemented … ([#3319](#3319)) ([1590d6b](1590d6b))
* Dask zero division error if parquet dataset has only one partition ([#3236](#3236)) ([69e4a7d](69e4a7d))
* Enable Spark materialization on Yarn ([#3370](#3370)) ([0c20a4e](0c20a4e))
* Ensure that Snowflake accounts for number columns that overspecify precision ([#3306](#3306)) ([0ad0ace](0ad0ace))
* Fix memory leak from usage.py not properly cleaning up call stack ([#3371](#3371)) ([a0c6fde](a0c6fde))
* Fix workflow to contain env vars ([#3379](#3379)) ([548bed9](548bed9))
* Update bytewax materialization ([#3368](#3368)) ([4ebe00f](4ebe00f))
* Update the version counts ([#3378](#3378)) ([8112db5](8112db5))
* Updated AWS Athena template ([#3322](#3322)) ([5956981](5956981))
* Wrong UI data source type display ([#3276](#3276)) ([8f28062](8f28062))

### Features

* Cassandra online store, concurrency in bulk write operations ([#3367](#3367)) ([eaf354c](eaf354c))
* Cassandra online store, concurrent fetching for multiple entities ([#3356](#3356)) ([00fa21f](00fa21f))
* Get Snowflake Query Output As Pyspark Dataframe ([#2504](#2504)) ([#3358](#3358)) ([2f18957](2f18957))