Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove mutex for write #197

Open
wants to merge 17 commits into
base: 6.4.tikv
Choose a base branch
from
Open

Conversation

Little-Wallace
Copy link

Signed-off-by: Little-Wallace [email protected]

Summary

Cherry-pick facebook#7516
This PR is to avoid write operation blocking on mutex when compaction or other method hold the mutex too long.

This reverts commit b84bd77.

Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
@tabokie
Copy link
Member

tabokie commented Jan 27, 2022

Could you pass facebook's CI, or at least figure out why they failed? I see you have several builds timed out. That usually means certain cases are hanging (they don't output PASSED log nor error log).

@Little-Wallace
Copy link
Author

/test

@Little-Wallace
Copy link
Author

Could you pass facebook's CI, or at least figure out why they failed? I see you have several builds timed out. That usually means certain cases are hanging (they don't output PASSED log nor error log).

It seems to be a unstable test. These tests run a long time and may fail because of environment.

@tabokie
Copy link
Member

tabokie commented Feb 7, 2022

@Little-Wallace I don't think so. There are some unstable ones, but they won't cause nearly half of the tests to fail (usually at most 2).

@Little-Wallace
Copy link
Author

@Little-Wallace I don't think so. There are some unstable ones, but they won't cause nearly half of the tests to fail (usually at most 2).

The tests did not fail. But the process was killed because out of time

@tabokie
Copy link
Member

tabokie commented Feb 11, 2022

@Little-Wallace Some tests are hanging, mostly likely on some sync point conditions. It's better to figure out exactly what are those conditions, even though we might not need to fix them.

Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
Copy link
Member

@tabokie tabokie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the bug you just fixed in upstream? So I know where the review process is flawed.

@@ -67,6 +61,7 @@ class ErrorHandler {
// A flag indicating whether automatic recovery from errors is enabled
bool auto_recovery_;
bool recovery_in_prog_;
std::atomic<bool> stop_state_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::atomic<bool> stop_state_;
std::atomic<bool> db_stopped_;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return !bg_error_.ok() &&
bg_error_.severity() >= Status::Severity::kHardError;
}
bool IsDBStopped() { return stop_state_.load(std::memory_order_acquire); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment "Do not require DB mutex held."

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

new_options.max_background_compactions,
new_options.max_background_jobs,
new_options.base_background_compactions,
/* parallelize_compactions */ true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* parallelize_compactions */ true);
true /* parallelize_compactions */);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that all code of rocksdb place the comment before param

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're half right. It's a per-file styling and this file happens to use postposition which is rare within the whole codebase...


// "number <= current_log_number - 1" is equivalent to
// "number < current_log_number".
MarkLogsSynced(current_log_number - 1, true, s);
if (!s.ok()) {
error_handler_.SetBGError(s, BackgroundErrorReason::kFlush);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you set this error now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be saved at "db/db_impl/db_impl_compaction_flush.cc" L549 and L210

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this function, we will not hold mutex so that we can not save BG error here.

@Little-Wallace
Copy link
Author

Could you explain the bug you just fixed in upstream? So I know where the review process is flawed.

The function SyncClosedLogs must not hold mutex_ because this function may wait for other thread until they finish wal fsync. So we must unlock mutex_ before every where calling SyncClosedLogs

@Little-Wallace
Copy link
Author

Little-Wallace commented Feb 15, 2022

I fix the problem of facebook#7516 in this commit: facebook@5fe832a

There are some other bugs which are bout code only in main branch of facebook/rocksdb, so I did not port them here.

@@ -2140,6 +2148,7 @@ Status DBImpl::CreateColumnFamilyImpl(const ColumnFamilyOptions& cf_options,
ROCKS_LOG_INFO(immutable_db_options_.info_log,
"Created column family [%s] (ID %u)",
column_family_name.c_str(), (unsigned)cfd->GetID());
single_column_family_mode_.store(false, std::memory_order_release);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why move this line from 2135?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget why I place it here.... Maybe I can try to place it back to origin

@@ -175,7 +174,9 @@ Status DBImpl::FlushMemTableToOutputFile(
// flushed SST may contain data from write batches whose updates to
// other column families are missing.
// SyncClosedLogs() may unlock and re-lock the db_mutex.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"db_mutex" -> log_write_mutex_

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ret = s;
{
InstrumentedMutexLock lock(&log_write_mutex_);
for (auto l : logs_to_free_) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit to make logs_to_free_ be protected by both log_write_mutex_ and mutex_

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes no sense. But i think unlock mutex_ during this function may cause other bad case.

Copy link
Member

@tabokie tabokie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. But I'd prefer seeing a full test report before merging.

@@ -1131,7 +1131,13 @@ class DBImpl : public DB {
}
}
};

struct LogContext {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add newlines around this struct definition.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1934,6 +1939,7 @@ class DBImpl : public DB {
InstrumentedCondVar atomic_flush_install_cv_;

bool wal_in_db_path_;
std::atomic<uint64_t> max_total_wal_size_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put it together with max_total_in_memory_state_.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

if (max_total_wal_size > 0) {
return max_total_wal_size;
}
return 4 * max_total_in_memory_state_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_total_in_memory_state_ is an atomic now, need to specify memory order as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Little-Wallace
Copy link
Author

I have finished the benchmark test. See details in https://docs.google.com/document/d/1f72BGmN1kREorkaCzhbsB0oocTP13IKJw4jy5mfm-TE/edit#heading=h.mtjjjfyw1ei
I will run a long time stable test to make sure this PR would not cause other bug or error.

Signed-off-by: Little-Wallace <[email protected]>
@Connor1996 Connor1996 self-requested a review February 22, 2022 08:37
mutex_.Lock();
bool need_log_sync = !write_options.disableWAL && write_options.sync;
bool need_log_dir_sync = need_log_sync && !log_dir_synced_;
LogContext log_context;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not init need_log_dir_sync here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need_log_dir_sync was protected by mutex_ before. This pr will protected this member variable by log_write_mutex_ so it could only be accessed in the method PreprocessWrite.

Copy link
Member

@Connor1996 Connor1996 Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean log_dir_synced_ is protected by mutex_? Oh, reasonable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Member

@Connor1996 Connor1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// It's safe to access logs_ with unlocked mutex_ here because:

change mutex_ to log_write_mutex_

Signed-off-by: Little-Wallace <[email protected]>
Copy link
Member

@Connor1996 Connor1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should update the comment
image

@@ -220,10 +216,7 @@ void DBImpl::FindObsoleteFiles(JobContext* job_context, bool force,
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the log_sync_cv_ may wait here while holding the mutex

Copy link
Member

@tabokie tabokie Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think condvar::Wait will implicitly release mutex (a mutex is paired with it when constructed). Oh, you mean mutex_. That's a good catch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it will release the pairing mutex log_write_mutex_, but not mutex_

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch... I'm thinking about how to solve this problem....

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether it is still correctly to release mutex_ here because if some other thread calling FindObsoleteFiles but this thread is only finish half of its task.

Copy link
Member

@tabokie tabokie Feb 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mutex_ is released before, I don't see a problem here. But you can't simply re-acquire the mutex after Wait(). That will break the lock order constraint.

Maybe you can skip those unsync-ed log files, and clean them in the next round?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I will unlock mutex_ before FindObsoleteLogFiles method. PTAL

Copy link
Member

@Connor1996 Connor1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM. Can we add a test to cover the case of the mutex?

@Little-Wallace
Copy link
Author

rest LGTM. Can we add a test to cover the case of the mutex?

I thought there are enough tests in RocksDB to cover this condition? I have passed the CI of facebook to make sure that it is no problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants