forked from apache/spark
[SPARK-48742][SS] Virtual Column Family for RocksDB
### What changes were proposed in this pull request?

Introduce virtual column families to RocksDB. A 2-byte ID prefix is attached as the column family identifier to each key row that is put into RocksDB. The encoding and decoding of the virtual column family prefix happens at the `RocksDBKeyEncoder` layer, because there we can pre-allocate the extra 2 bytes and avoid an additional memcpy.

- Remove the physical column family code, because it becomes potentially dead code until some caller starts using it.
- Remove `useColumnFamilies` from the `StateStoreChangelogV2` API.

### Why are the changes needed?

Currently, within the scope of the arbitrary stateful API v2 (transformWithState) project, each state variable is stored inside one [physical column family](https://github.com/facebook/rocksdb/wiki/Column-Families) within the RocksDB state store instance. Column families are also used to implement secondary indexes for various features. Each physical column family has its own memtables, creates its own SST files, and handles compaction independently on those SST files.

When the number of operations to RocksDB is relatively small and the number of column families is relatively large, the overhead of handling small SST files becomes high, especially since all of these files have to be uploaded to the snapshot dir and referenced in the metadata file for the uploaded RocksDB snapshot. Using a prefix to manage the different key spaces (virtual column families) could reduce this overhead.

### Does this PR introduce _any_ user-facing change?

No. If `useColumnFamilies` is set to true in `StateStore.init()`, virtual column families will be used.

### How was this patch tested?

Unit tests in `RocksDBStateStoreSuite`, and integration tests in `TransformWithStateSuite`. Test suites were moved from `RocksDBSuite` into `RocksDBStateStoreSuite` because some of the previous verification functions now live in `RocksDBStateProvider`.

### Was this patch authored or co-authored using generative AI tooling?

No.
Closes apache#47107 from jingz-db/virtual-col-family.

Lead-authored-by: jingz-db <[email protected]>
Co-authored-by: Jing Zhan <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
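
To illustrate the key-prefix idea described in this commit message, here is a minimal Python sketch. It is not the Spark implementation (which lives in Scala at the `RocksDBKeyEncoder` layer). The function names `encode_key`, `decode_key`, and `scan_column_family`, the big-endian byte order, and the use of a plain dict in place of RocksDB are all assumptions made for illustration. Only the 2-byte prefix width comes from the description above.

```python
import struct
from typing import Dict, Iterator, Tuple

# The commit message states a 2-byte ID prefix; the byte order is an assumption.
PREFIX_LEN = 2


def encode_key(cf_id: int, key: bytes) -> bytes:
    """Prepend a 2-byte virtual column family ID to a key (hypothetical helper)."""
    if not 0 <= cf_id <= 0xFFFF:
        raise ValueError("virtual column family ID must fit in 2 bytes")
    # Allocate the final buffer once, with room for the prefix, so the key
    # is copied a single time rather than concatenated afterwards.
    buf = bytearray(PREFIX_LEN + len(key))
    struct.pack_into(">H", buf, 0, cf_id)
    buf[PREFIX_LEN:] = key
    return bytes(buf)


def decode_key(encoded: bytes) -> Tuple[int, bytes]:
    """Split an encoded key back into (virtual column family ID, original key)."""
    (cf_id,) = struct.unpack_from(">H", encoded, 0)
    return cf_id, encoded[PREFIX_LEN:]


def scan_column_family(store: Dict[bytes, bytes], cf_id: int) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) pairs of one virtual column family, in key order.

    A dict stands in for the single shared key space; a real store would
    use a prefix seek instead of filtering every key.
    """
    prefix = struct.pack(">H", cf_id)
    for encoded in sorted(store):
        if encoded.startswith(prefix):
            yield decode_key(encoded)[1], store[encoded]
```

Because every key carries its family ID up front, two state variables can use the same logical key without colliding, and all keys of one family sit next to each other in sorted order within a single physical key space.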
Showing 8 changed files with 724 additions and 738 deletions.