-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forward-port Support for Cassandra [WIP] #475
Forward-port Support for Cassandra [WIP] #475
Conversation
…#385) Co-authored-by: Willem Pienaar <[email protected]>
Rather than the Maven protobuf plugin running on the same symlinked definitions in several Java modules, localize this process into one module that the others depend on. This provides a single module that can be depended on by third-party extensions with the bare minimum of dependencies. Also removes proto files that are no longer used.
* Add PR template * Remove line about not using fixes * Add notes about CLA and release notes
* Initial commit of changelog up to 0.4.3 * Remove unreleased changes on master * Add missing changelog manually Co-authored-by: Khor Shu Heng <[email protected]>
…on (feast-dev#421) Bumps [hibernate-validator](https://github.com/hibernate/hibernate-validator) from 6.0.13.Final to 6.1.0.Final. - [Release notes](https://github.com/hibernate/hibernate-validator/releases) - [Changelog](https://github.com/hibernate/hibernate-validator/blob/master/changelog.txt) - [Commits](hibernate/hibernate-validator@6.0.13.Final...6.1.0.Final) Signed-off-by: dependabot[bot] <[email protected]>
This forward-ports a straggling commit from feast-dev#407: it was missed when initially creating the datatypes module because Sonatype publishing setup was added concurrently.
* Add documentation for bigquery batch retrieval * Fix formatting for multiline comments
Allow log level to be set via environmental variable. Add ability to set appender type in serving. Remove logback-classic from ingestion as it is a library so should not bring its own impl. Upgrade log4j to 2.12.1 to support objectMessageAsJsonObject. Fix logger config targeting feast package in serving an add same concept for core.
* Add presence, shape and domain fields to EntitySpec and FeatureSpec These are copied from definition in Schema.proto in tensorflow data validation * Add presence,shape and domain fields to Field class
* Add retry options to BigQuery Add two fields to job properties and use them to set retry options in all BigQuery job waitFor calls, as per Google example. * Change timeout defaults and add config to e2e test to fix failure.
* Change job service connection to use a connection pool * Close connection * Remove isConnected condition * Add simple test to test that pool recovers broken connections * Catch JedisExceptions separately * Catch only JedisConnectionException
* Allow users not to set max age for batch retrieval * Fix typo in test assertion
* Update Changelog * Remove closed issues
* Deduplicate example notebooks * Merge docker-compose.yml for both batch and online serving.
* Update README.md and remove versions from Helm Charts * Fix type in README.md
* Encode feature row before storing in Redis * Include encoding as part of RedisMutationDoFn Co-authored-by: Khor Shu Heng <[email protected]>
Please see this issue for more details feast-dev#527
…east-dev#534) This seems to make the test pass more deterministically If this value is higher than the one used for sending output (50ms) some messages may be lost leading to failed test
) * Update base Docker image for building Feast Serving image - Add clean phase before packaging for more deterministic build (in case host directory is dirty) - Move the downloading of grpc-health-probe in Feast Serving to build stage so the production stage does not need extra tools like wget, for slimmer production image. * Update base Docker image for Feast Serving in Dockerfile.dev
* create cassandra store for registration and ingestion * Downgraded Guava to 25 * Beam 2.16 uses Cassandra 3.4.0 (So we cannot use Cassandra 4.x which shades Guava) * Cassandra 3.4.0 uses Guava version 16.0 but has a compatibility check to use a different class when we use version > 19.0. * Guava version 26 (version previously used) has breaking change to method used in compatibility check in Cassandra's dependency, hence version 25 * Using Cassandra's internal field 'writetime' to handle out of order arrivals. When older records where the primary key already exist in Cassandra are ingested, they are set as tombstones in Cassandra and ignored on retrieval. * Aware that this way of handling out of order arrival is specific to Cassandra, but until we have a general way to handle out of order arrivals we need to do it this way * Cassandra's object mapper requires stating table's name along with @table annotation * table_name is still part of CassandraConfig for use in serving module * if user registers CassandraConfig with a different table name other than "feature_store", this will throw an exception * add cassandra serving service * Abstracted OnlineServingService for common implementation of online serving stores * Complete tests remain in RedisServingServiceTest while Cassandra tests only contain basic tests for writes, and some other implementation specific to Cassandra * update documentation to reflect current API and add cassandra store to docs * add default expiration to cassandra config for when featureset does not have max age * docs update, spotless check, and bug fix on cassandra schema
* create cassandra store for registration and ingestion * Downgraded Guava to 25 * Beam 2.16 uses Cassandra 3.4.0 (So we cannot use Cassandra 4.x which shades Guava) * Cassandra 3.4.0 uses Guava version 16.0 but has a compatibility check to use a different class when we use version > 19.0. * Guava version 26 (version previously used) has breaking change to method used in compatibility check in Cassandra's dependency, hence version 25 * Using Cassandra's internal field 'writetime' to handle out of order arrivals. When older records where the primary key already exist in Cassandra are ingested, they are set as tombstones in Cassandra and ignored on retrieval. * Aware that this way of handling out of order arrival is specific to Cassandra, but until we have a general way to handle out of order arrivals we need to do it this way * Cassandra's object mapper requires stating table's name along with @table annotation * table_name is still part of CassandraConfig for use in serving module * if user registers CassandraConfig with a different table name other than "feature_store", this will throw an exception * add cassandra serving service * Abstracted OnlineServingService for common implementation of online serving stores * Complete tests remain in RedisServingServiceTest while Cassandra tests only contain basic tests for writes, and some other implementation specific to Cassandra * update documentation to reflect current API and add cassandra store to docs * add default expiration to cassandra config for when featureset does not have max age * docs update, spotless check, and bug fix on cassandra schema
Pr Comments and using ttl for cassandra job service 1. Still unresolved is why the CassandraServingServiceITTest is failing, I'm thinking its because I'm not mocking the feature request correctly
1ed0956
to
f002093
Compare
A belated thanks for getting the ball rolling on this! I've been reluctant to push forward on it myself due to a few reasons I'll give below. Couple of general remarks from my PoV when we get back 'round to this:
Coming back to what has stalled me:
|
Just want to chime in her as well. The effort that has been put into this is very impressive @Wirick. Forward porting Cassandra support would be pretty big. I'll not comment specifically on the PR itself at the moment, but I'll echo the reasons for delaying this forward port previously. #386 is a big change, but dare I say relatively predictable in how it will change our APIs. #567 will hopefully add structure to this PR and others, and should not bring any big surprises. I'd recommend waiting for these to both be merged in before proceeding with this PR. The biggest uncertainty for me is still which path we will take with regards to feature references and our data model, documented in #479. Getting the API right is critical. We have one more window of opportunity (with the removal of versions) to potentially make other breaking changes. If possible I'd want to come to some kind of consensus in that issue as well before merging this PR, but it probably does not have to be a blocker. |
@ches @woop I generally agree with everything that we've discussed. In our case because of what I was outlining in #479 we've mostly abstracted the api into a centralized module called features, which can be used to create, publish and serve features using minimal feast API interaction. This means it's relatively to stay up to date as long as we change those integration points. I'm going to be keeping this current with master, because some recent changes, such as the new feature value metrics provide a lot of value, and I'm the boldly going forward type. I'm fine waiting and rebasing as development proceeds. In terms of the new drivers I've been really enjoying the new api and creation of cassandra sessions especially, and don't think that having a different dependency in ingestion outweighs the significant improvement in intuitiveness. But at the end of the day I'm flexible |
* Ordering keys in ingestion and serving * Notracing
* Versionless serving option * Adding in tracing * Make nulls not update feature values
3563323
to
b43f466
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Wirick The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@Wirick: The following tests failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I'm still having trouble getting the serving unit tests to pass, I think because of something that I'm not mocking, but I'm not sure. Wanted to share the work if people are interested, like @ches. I will be cleaning this up further, but I have a bit of a queue, so please be patient. This logic however seems to work well in our staging cluster right now. But want to discuss best practices and optimization
This pr ports the cassandra logic forward to be compatible with feast>=.4
Fixes #335