Change TopicBasedRemoteLogMetadataManager.ensureInitializedAndNotClose so that it blocks if not initialised #12
Comments
@mdedetrich could we add some stack traces to validate how related the failures are?
My impression is that I have tried adding the following wait logic to
abstract class TieredStorageTestHarness extends IntegrationTestHarness {
  //...
  @BeforeEach
  override def setUp(testInfo: TestInfo): Unit = {
    super.setUp(testInfo)
    contextOpt = Some(new TieredStorageTestContext(zkClient, servers, producerConfig, consumerConfig, securityProtocol))
    waitForRLMMInitialization(servers)
  }
  //...
  def waitForRLMMInitialization(brokers: mutable.Buffer[KafkaServer]): Unit = {
    while (true) {
      println("Waiting for RLMM to initialize")
      val ready = brokers
        .map(_.remoteLogManager.remoteLogMetadataManager.asInstanceOf[TopicBasedRemoteLogMetadataManager])
        .forall(_.isInitialized)
      if (ready) return
      Thread.sleep(5000)
    }
  }
  //...
}
So there are multiple problems at hand here. One of them is the asynchronous initialization (which is what this GitHub issue is about), and another is the one you are referring to, i.e. some logic bug in fetch/consumer consumption. @jeqo Do you want to create another GitHub issue specifically for the consumer/fetch problem, which you can focus on?
Sure! #15
Upstream PR created at apache#13689
After discussion with @gharris1727 and investigation of the current implementation of tiered storage and the existing problems with the TieredStorageTestHarness, we came to the conclusion that the current design of TopicBasedRemoteLogMetadataManager has a flaw when it comes to initialisation.

For various legitimate reasons, the TopicBasedRemoteLogMetadataManager.configure method is asynchronous. While this is acceptable, what is problematic is that other methods that need to be implemented in this interface call TopicBasedRemoteLogMetadataManager.ensureInitializedAndNotClose as a check, which throws an exception if the manager is not yet initialized.

Rather than TopicBasedRemoteLogMetadataManager.ensureInitializedAndNotClose throwing an IllegalStateException, we should instead block the method until initialisation has occurred (i.e. via the use of a CountDownLatch). Note that throwing an IllegalStateException if TopicBasedRemoteLogMetadataManager is currently being closed is legitimate; it is initialisation specifically that is the issue.

This issue is inadvertently causing TS tests (via the TieredStorageTestHarness) to fail in the https://github.com/aiven/kafka/tree/3.3-2022-10-06-tiered-storage branch because the test harness runs before the initialisation has occurred.

Note that upstream also has this issue (see https://github.com/apache/kafka/blob/34d56dc8d00bd27955eb9bb6ac01d5ae7f134dbd/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManager.java#L492-L501)
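
To make the proposal concrete, here is a minimal sketch of the blocking check, not the actual Kafka code. The class BlockingInitGuard and its methods markInitialized, close and awaitInitializedAndNotClosed are hypothetical names invented for illustration. The timeout parameter is also an assumption on my side, added so that callers cannot hang forever if initialisation never completes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the manager's initialisation state; not the real Kafka class.
class BlockingInitGuard {
    // Counted down exactly once, when asynchronous initialisation finishes.
    private final CountDownLatch initialized = new CountDownLatch(1);
    private volatile boolean closing = false;

    // Called by the asynchronous initialisation task once it has completed.
    void markInitialized() {
        initialized.countDown();
    }

    // Marks the instance as closing and releases any callers still waiting,
    // so that they fail fast instead of blocking until the timeout.
    void close() {
        closing = true;
        initialized.countDown();
    }

    // Blocks until initialisation has occurred instead of throwing.
    // Closing is still reported with an IllegalStateException.
    void awaitInitializedAndNotClosed(long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        if (closing) {
            throw new IllegalStateException("This instance is in closing state");
        }
        // Assumption: a bounded wait; the issue text only asks for blocking.
        if (!initialized.await(timeout, unit)) {
            throw new TimeoutException("Timed out waiting for initialisation");
        }
        // Re-check, because close() also releases the latch.
        if (closing) {
            throw new IllegalStateException("This instance is in closing state");
        }
    }
}
```

With a guard like this, a caller that arrives before initialisation completes simply waits, which is the behaviour the test harness currently has to emulate by polling isInitialized.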