Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code that uses withTimeout often fails and cannot be safely test covered using the kotlinx.coroutines.test #3588

Closed
amal opened this issue Jan 18, 2023 · 7 comments
Labels

Comments

@amal
Copy link

amal commented Jan 18, 2023

Right now any code that uses withTimeout in the TestScope or background TestScope can be failed with TimeoutCancellationException on any suspension. And in such case it will fail with misleading, I would even say completely false message, e.g.:

Timed out waiting for 5000 ms

image

This is an exaggerated example where it is obvious (long timeout with much shorter test execution time). In reality, it is often difficult to understand that the matter is actually in TestScope usage.

I believe that it's a very unexpected behavior for developers and a source of lost hours and maybe even days to understand the source of the problem. I even knew about such behavior earlier, but in a few months forgot about it and again fell into this trap, wasting hours when tried to write more tests for the MVI library.

As already mentioned in #1374, the problem is further complicated by the fact that TimeoutCancellationException is CancellationException, thus silently canceling parent coroutines.

Just to mention, here are examples where other people think about withTimeout as reliable function in the context of coroutines tests, which it is not right now!

Proposal

  1. I am sure that code wrapped in the withTimeout should work as usual in any tests without sudden fails.

  2. If somehow it's not possible or as a migration measure what can be done better:
    If withTimeout fails because of the TestScope, change the error message so that the cause would be obvious.

Workaround

For those suffering in the same situation 🤗
Right now I use this replacement instead of the unsafe withContext. At least it works as expected right now.
I hope it will not be broken with something like #982

public suspend inline fun <T> withTimeoutSafe(timeMillis: Long, crossinline block: suspend CoroutineScope.() -> T): T {
    // just for convenience, can be safely removed
    contract {
        callsInPlace(block, InvocationKind.EXACTLY_ONCE)
    }
    val context = currentCoroutineContext()
    // Maybe Dispatchers.IO could be better, but it's not available in the common code for multiplatform project
    return withContext(Dispatchers.Default) {
        withTimeout(timeMillis) {
            withContext(context) {
                block()
            }
        }
    }
}
@amal amal added the bug label Jan 18, 2023
@amal amal changed the title Code that uses withTimeout always fails and cannot be test covered using the kotlinx.coroutines.test Code that uses withTimeout often fails and cannot be safely test covered using the kotlinx.coroutines.test Jan 18, 2023
@dkhalanskyjb dkhalanskyjb added test and removed bug labels Jan 19, 2023
@dkhalanskyjb
Copy link
Collaborator

Without a reproducer, it's difficult to understand your issue, but looks like it's a duplicate of #3179.

@amal
Copy link
Author

amal commented Jan 19, 2023

@dkhalanskyjb Here is a reproducer, if you need it.

All these tests will fail and it will be unexpected for the developer. These problems are hard to debug and google for, unless you are already aware and didn't forget that it's unsafe to use withTimeout and delay at least in the test code.
Just unwrap withTimeout parts from the code and everything will work as expected.

import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.currentCoroutineContext
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch
import kotlinx.coroutines.plus
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.test.runTest
import kotlinx.coroutines.withTimeout
import kotlin.test.Test
import kotlin.test.assertFalse
import kotlin.test.assertTrue
import kotlin.test.fail

class CoroutinesTest {

    private companion object {
        private const val USE_WITH_TIMEOUT = true
        private const val TIMEOUT_MS = 5_000L
        private const val TEST_REPETITIONS = 1_000
    }

    @Test
    fun with_timeout_is_unsafe_in_tests() = runTest(dispatchTimeoutMs = 10_000) {
        repeat(times = 100) {
            val mutex = Mutex(locked = true)
            val job = CoroutineScope(Dispatchers.Default).launch {
                mutex.unlock()
            }
            withTimeout(TIMEOUT_MS) {
                // Anything that we will actually wait for
                mutex.withLock {}
            }
            job.cancelAndJoin()
        }
    }


    @Test
    fun with_timeout_mutex() = testImplementation(
        createMutex = { Mutex(locked = true) },
        doUnlock = {
            assertTrue(isLocked, "doUnlock called for non-locked Mutex (Mutex)")
            unlock()
        },
        waitForMutexUnlock = {
            if (isLocked) {
                withLock {
                    assertTrue(isLocked, "Mutex.isLocked = false when locked (Mutex)")
                }
            }
            assertFalse(isLocked, "Mutex.isLocked = false (Mutex)")
        },
    )

    @Test
    fun with_timeout_semaphore() = testImplementation(
        createMutex = { Semaphore(permits = 1, acquiredPermits = 1) },
        doUnlock = {
            try {
                release()
            } catch (_: IllegalStateException) {
                fail("doUnlock called for non-locked Mutex (Semaphore)")
            }
        },
        waitForMutexUnlock = {
            // wait for unlock
            acquire()
            try {
                release()
            } catch (_: IllegalStateException) {
                fail("Mutex.isLocked = false (Semaphore)")
            }
        },
    )

    // MutableStateFlow as a locked "Mutex"
    @Test
    fun with_timeout_state_flow() = testImplementation(
        // MutableStateFlow as a locked "Mutex"
        createMutex = { MutableStateFlow(true) },
        doUnlock = {
            assertTrue(compareAndSet(expect = true, update = false), "doUnlock called for non-locked Mutex (StateFlow)")
            assertFalse(value, "couldn't unlock Mutex (Mutex)")
        },
        waitForMutexUnlock = {
            if (value) {
                // wait for unlock
                assertFalse(first { !it }, "Mutex waiting failed (StateFlow)")
            }
            assertFalse(value, "Mutex.isLocked = false (StateFlow)")
        },
    )

    private inline fun <M> testImplementation(
        crossinline createMutex: () -> M,
        crossinline doUnlock: M.() -> Unit,
        crossinline waitForMutexUnlock: suspend M.() -> Unit,
    ) = runTest(dispatchTimeoutMs = 10_000) {
        // Scopes that will do the actual work in the background
        val scopes = mapOf(
            "TestScope + Dispatchers.Default" to (this + Dispatchers.Default),
            "BackgroundTestScope + Dispatchers.Default" to (backgroundScope + Dispatchers.Default),
            "Job + Dispatchers.Default" to CoroutineScope(Job() + Dispatchers.Default),
        )
        repeat(TEST_REPETITIONS) { iteration ->
            for ((scopeName, scope) in scopes) {
                val info = "#$iteration, $scopeName"
                try {
                    val mutex = createMutex()
                    val asyncJob = scope.launch {
                        mutex.doUnlock()
                        // busy wait
                        @Suppress("ControlFlowWithEmptyBody", "EmptyWhileBlock")
                        while (currentCoroutineContext().isActive) {
                        }
                    }

                    when {
                        USE_WITH_TIMEOUT -> withTimeout(TIMEOUT_MS) {
                            mutex.waitForMutexUnlock()
                        }

                        else -> mutex.waitForMutexUnlock()
                    }

                    asyncJob.cancelAndJoin()
                } catch (e: Throwable) {
                    throw IllegalStateException("$e // $info", e)
                }
            }
        }
    }
}

@amal
Copy link
Author

amal commented Jan 19, 2023

@dkhalanskyjb, it's not a duplicate of #3179
Even if it would be implemented, the main problem here is misleading the developers and making debugging a lot harder.

But if just the message from failed withTimeout would signal that it was virtual timeout time passed, not the real one - it would be a saver for hundreds of hours (counting different developers around the world).

And if IDE could warn about withTimeout and delay behavior in test code (e.g. if TestScope is anywhere near) - it would prevent many such problems from the beginning.

@dkhalanskyjb
Copy link
Collaborator

Ok, I see what's going on.

Let's focus on facts and not rhetorics like "saving hundreds of hours," or we'll be here all day writing about how great the impact of this is instead of getting things done. We're a widely-used library, and any change we make will affect many people.

I am sure that code wrapped in the withTimeout should work as usual in any tests without sudden fails.

People do often test that the withTimeout invocations that occur in their code really do time out when the delay is long. If we make any delays instant but make withTimeout still observe non-virtual time, these tests will be impossible to write.

If withTimeout fails because of the TestScope, change the error message so that the cause would be obvious.

This is a really good suggestion, thank you! Opened a PR: #3591

@amal
Copy link
Author

amal commented Jan 19, 2023

@dkhalanskyjb thank you! I'm glad that at least something from the proposal got attention.

Still, I think that no tests from the reproducer should fail here.
Do you think that withTimeout behavior here is fine and working as it should?
Your PR wouldn't provide any help in cases when withTimeout would be used in the tested code, but not the test itself, resulting in misleading failure just because withTimeout usage.

@qwwdfsad
Copy link
Member

Do you think that withTimeout behavior here is fine and working as it should?

We don't think so, but with a slightly different angle.
After #1374 the issue will be much more clear -- TimeoutException won't be subclassing CancellationException, making an error clear and visible, especially with new error messages

@dkhalanskyjb
Copy link
Collaborator

Do you think that withTimeout behavior here is fine and working as it should?

I don't see a way to change that behavior without also affecting the use case of checking that the timeout does happen. When researching the existing codebases, I've seen an order of magnitude more tests of that sort than I see discontent about withTimeout using virtual time. So, breaking these tests for the use case you're showing, especially without a clear way of fixing them, is not the way forward.

Also, from your synthetic reproducer, it's unclear what you're actually trying to achieve. If the code that unlocks the mutex also uses virtual time, everything works fine. Why doesn't it? There are some use cases listed in #3179, like connecting to a database in tests (which, in most codebases, would be an anti-pattern), but maybe in your case, the problem can be mitigated by mocking the other dispatchers so that they use virtual time as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants