-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
include/rocksdb: Configuration validation while opening a DB #517
base: main
Are you sure you want to change the base?
include/rocksdb: Configuration validation while opening a DB #517
Conversation
The change set includes: 1) A header file that contains the Major, Minor, and Patch versions of speedb as a macro, as well as a set of functions returning info about how/when/where this version of speedb was created. also includes all changes done on build_version.cc.in in the following commits: 1. version: remove superfluous build property The `speedb_build_spdb_key` property is unused and was accidentally imported as part of #1. 2.general: replace RocksDB references in strings with Speedb (#64) This includes references in statuses as well as tools output.
as changes to it are either version updates or fixing compiler errors.
combination of seek random and overwrite random. all threads are doing seek and write interchangeably.
They were made because the original version failed to compile with Clang 12 and 13, complaining about `offsetof()` being applied to a non-standard-layout type. The error is no longer emitted with Clang 14, so this is likely a compiler bug that was fixed, and there's no need for this dirty hack.
…s status code This breaks when comparing against e.g. `Status::NoSpace()` because it has a status code of `Code::kIOError` and only a subcode of `SubCode::kNoSpace`.
…status code This breaks in cases such as `Status::NoSpace()`, which has a status code of `Code::kIOError`, and a subcode of `SubCode::kNoSpace`, but the comparison ends up checking only if a general IO error happened.
The example acquires snapshots but doesn't release them, resulting in a memory leak error when run under ASan.
test doesnt delete db properly.
fix both these tests to operate not only on the default cf
Clang 14 fails to compile with the following error: ``` options/db_options.h:120:21: error: definition of implicit copy constructor for 'MutableDBOptions' is deprecated because it has a user-declared copy assignment operator [-Werror,-Wdeprecated-copy] MutableDBOptions& operator=(const MutableDBOptions&) = default; ^ db/compaction/compaction_job.cc:439:7: note: in implicit copy constructor for 'rocksdb::MutableDBOptions' first required here mutable_db_options_copy_(mutable_db_options), ^ ``` This is caused by the introduction of an assignment operator in #7 due to a diagnostic emitted by Clang 13 which isn't emitted by Clang 14, so this was probably a compiler bug, and we just need to remove the assignment operator.
This is relevant for not including the delay in latency calculation, and is already done for the write rate limiter.
Made use of the CreateFromString to pass an arbitrary URI to create a memtable factory to the tools that use it.
…on (#80) The new Customizable plugin infra in CMake relies on regex parsing of Makefiles, which is very brittle and clunky, as well as being unintuitive for CMake users. The only thing missing from the old infra is plugin `*_FUNC` registration, so just add support for that to the old infra, and remove the Makefile parsing logic.
This memtable supports concurrent readers and writers and provides much better results for insertions and point selections than the skip list memtable, while maintaining more tolerable iteration performance than the built-in RocksDB hash-based memtable.
This allows us to make sure that our additions are included in the build even if the user forgot to set `ROCKSDB_PLUGINS` explicitly.
Currently only the skip list memtable is being tested. Increase coverage by adding the new Speedb memtable as an option as well. For now we'll simply choose randomly between the two, without giving any one of them preference, but we may want to change that in the future.
A last minute change before merging #30 breaks builds that configure `-Werror=unused-parameter` (this is the default for some GCC versions with the `-Wextra` configuration). Fix it by not declaring the `logger` argument to the `CreateMemTableRep()` function.
As part of #90, I accidentally added a comma at the end of the line when choosing the memtable reperesntation to use. This is a tuple construction syntax and was formatted as such by Black, but unfortunately missed by me and during review. Fix it by removing the comma so that the argument isn't passed to db_stress as a Python tuple.
The test seems to fail for now, so in order to avoid headaches for devs working on `main`, let's not test the new memtable until the issues are fixed. This is not intended as a fix for #98, but rather as a temporary workaround to allow testing to continue, since no one is supposed to be using the new memtable yet.
25945d7
to
296a03c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my "Let's consider ..." comment
const Options& opts, | ||
std::set<std::string>& valid_opts, | ||
std::set<std::string>& invalid_opts); | ||
virtual Status Populate(const ConfigOptions& cfg_opts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to be for generating random configurations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Derived classes should override Populate. In the case of crashtest, yes, it will be generating random values similar to what is happening in db_crashtest.py
. For other use cases, for example workload specific use cases, this method can assign exact values to options, values which we found to be working best for that specific workload/use case. This will allow users, for example when they call DBOpen()
, to have a custom configuration dedicated for their needs.
|
||
namespace ROCKSDB_NAMESPACE { | ||
|
||
class UseCase : public Customizable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see a case for this being in the public API, e.g. to validate configuration options without opening a DB, though it's not clear to me whether users should extend this class. There is a standard disclaimer we put on classes that are only intended to be extended internally.
|
||
namespace ROCKSDB_NAMESPACE { | ||
|
||
class DBCrashtestUseCase : public UseCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this needs to be in the public API, at least not all the details. Maybe a factory function.
|
||
protected: | ||
std::vector<uint32_t> memtable_protection_bytes_per_key; | ||
std::vector<uint64_t> block_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree a good goal would be to replicate what the crash test already does, but most of these configuration options will probably be migrated to range options rather than chosen from an enumerated set.
protected: | ||
std::vector<uint32_t> memtable_protection_bytes_per_key; | ||
std::vector<uint64_t> block_size; | ||
std::tuple<int, int> bloom_bits_per_key; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Google style calls for class member fields to end in _
std::set<std::string>& invalid_opts) override; | ||
|
||
protected: | ||
std::vector<uint32_t> memtable_protection_bytes_per_key; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's consider in the future that someone adds a new option to something like ColumnFamilyOptions. Unless the UseCase is manually updated, presumably it would only include/exercise the default value for that new option. But will it detect when a non-default option (outside the UseCase) is used?
I believe that this approach with a static enumeration of the options and what values are allowed/exercised will, by default, overlook new options and make it easy for things to be overlooked or drift out of date. If the validation were based on the dynamic configuration structure, I suspect it would be easier to ensure that new options are detected and force the user to specifically handle them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdillinger Thanks for the comment. Indeed, I agree that the current approach is error prone, and I plan on coming up with something more similar to what you are describing. However, can you be a bit more specific on what you mean by "dynamic configuration structure"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strings and maps, as opposed to class/struct fields.
@pdillinger Apart from the comments above, generally speaking - does this seem in the right track for what you had in mind? |
Yes, generally in the right direction. Obviously more work to do to have this code generating random "valid" configurations for the crash test. |
Use key and value only after checking iterator is valid
Add a new option to CompactRangeOptions, a callback object. By default, the callback is empty so CompactRange() will be blocking (as it is now). If the callback is set, the callback object's callback method will be called upon completion with the completion status.
A UseCase class is a Configurable class which represents a specific use case in rocksdb/speedb which we can use to populate options according to the specification of that use case, and/or validate the user's options against the expected possible configurations which fit that use case. As such, we add here 2 methods in Configurable which essentially take a UseCase and a set of options, and validates that the values of these options matches the values that the use case requires.
296a03c
to
41363d6
Compare
TODO:
|
This pull request has picked up a bunch of other changes, making it hard to see what is only the configuration validation part. I guess I can do |
GitHub Runner Bot seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
RocksDB/Speedb has several knobs that can be controlled via the different Options classes and Options files.
Different programs and use cases (such as db_bench, benchmark tool, stress/crash/unit tests) use a different set of default options that are coded into their programs. Because these configurations are built into the tools, it is difficult for customers to reproduce those configurations. Additionally, it is impossible for the software to validate that a customer configuration is part of the configurations that have been tested.
This PR is an effort to create an interface for populating and validating RocksDB/Speedb options, per use case.
It also includes a partial implementation of that interface for the use case of validating a user's configuration against the set of configurations checked in db_crashtest.