Time the runtime of Drain and DrainOnShutdown #578

rhuffy · 2024-11-08T21:59:09Z

These runtimes will help determine a reasonable setting for the K8s termination grace period.

LogQL can treat these log lines as metrics, so the values can be aggregated across many clusters.
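As a rough local illustration of what "log lines as metrics" amounts to, the sketch below extracts the duration from each completion line and takes the maximum. It is not LogQL and not part of this PR: the class name `DrainDurations` and the rendered line format `DrainOnShutdown completed in <n> ms` are assumptions, and the real structured log output may look different.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: aggregates drain durations found in plain-text log lines.
final class DrainDurations {
    // Assumed rendered format of the completion log line (not confirmed by the PR).
    private static final Pattern COMPLETED =
            Pattern.compile("DrainOnShutdown completed in (\\d+) ms");

    /** Returns the largest duration in milliseconds, or empty if no line matches. */
    static OptionalLong maxMillis(List<String> logLines) {
        return logLines.stream()
                .map(COMPLETED::matcher)
                .filter(Matcher::find)                       // keep only completion lines
                .mapToLong(m -> Long.parseLong(m.group(1)))  // captured millisecond value
                .max();
    }
}
```

The largest observed value across clusters is the kind of number that could inform the grace period setting.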

inespot · 2024-11-09T04:00:45Z

src/java/org/apache/cassandra/service/StorageService.java

@@ -754,6 +757,8 @@ public void runMayThrow() throws InterruptedException
                ScheduledExecutors.nonPeriodicTasks.shutdown();
                if (!ScheduledExecutors.nonPeriodicTasks.awaitTermination(1, MINUTES))
                    logger.warn("Miscellaneous task executor still busy after one minute; proceeding with shutdown");
+
+                logger.info("DrainOnShutdown completed in {} ms", SafeArg.of("ms", watch.elapsed(TimeUnit.MILLISECONDS)));


Would it not be simpler to just add metrics? Or are there any advantages to using the logs?

Since we poll the metrics endpoints, it's likely that a metric reported just before shutdown would be missed.

LogQL supports querying logs as if they were metrics, so there is no benefit to using metrics here over a log.

Metrics are also a bit more complicated to set up, and are better suited for values that should be continuously sampled, instead of just an infrequently fired event.
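The one-shot event described here can be sketched as follows. This is a minimal stand-in, not the code in this PR: `TimedShutdownStep`, `runTimed`, and the `Consumer<String>` log sink are hypothetical names, and `System.nanoTime()` replaces the `watch` object used in the diff.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: times one shutdown step and emits a single log event.
final class TimedShutdownStep {
    /**
     * Runs the step, then writes one completion line with the elapsed
     * milliseconds to the sink. If the step throws, no line is written.
     */
    static long runTimed(String name, Runnable step, Consumer<String> logSink) {
        long startNanos = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
        step.run();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        logSink.accept(name + " completed in " + elapsedMs + " ms");
        return elapsedMs;
    }

    public static void main(String[] args) {
        // The empty lambda stands in for the real drain work.
        runTimed("DrainOnShutdown", () -> { }, System.out::println);
    }
}
```

Because the event fires once per shutdown, a single log line carries the whole measurement and nothing needs to be sampled.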

Can you add logs at the start of each function (so we can tell if a SIGKILL was sent before the thread completed) and flush each of the logs?

Added a log to drainOnShutdown.

The setMode calls, e.g. setMode(Mode.DRAINING, "starting drain process", true);, will emit a log:

void setMode(Mode m, @Safe String msg, boolean log)
{
    operationMode = m;
    if (log)
        logger.info(m.toString(), SafeArg.of("msg", msg));
    else
        logger.debug(m.toString(), SafeArg.of("msg", msg));
}
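To show why a start log matters, here is a small sketch of the check it enables: a start marker with no later completion marker suggests the process was killed mid-drain. The class `DrainLogCheck` and both marker strings are hypothetical; the actual messages emitted by Cassandra may differ.

```java
import java.util.List;

// Hypothetical helper: detects a drain that started but never logged completion.
final class DrainLogCheck {
    // Assumed marker strings; the real log messages are not confirmed by the PR.
    static final String START = "DrainOnShutdown starting";
    static final String DONE = "DrainOnShutdown completed";

    /** True when the most recent start marker has no completion marker after it. */
    static boolean interruptedBeforeCompletion(List<String> logLines) {
        boolean inProgress = false;
        for (String line : logLines) {
            if (line.contains(START)) {
                inProgress = true;   // a drain began
            } else if (line.contains(DONE)) {
                inProgress = false;  // that drain finished
            }
        }
        return inProgress;
    }
}
```

This only works if the start line actually reaches the log sink before the process dies, which is why flushing matters.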

inespot

From an offline conversation: as a follow-up, we need to properly shut down the logger in sls so that these logs are not dropped when Cassandra shuts down gracefully.

time the runtime of Drain and DrainOnShutdown

f5f0cdb

inespot reviewed Nov 9, 2024

add info log for starting DrainOnShutdown

38e3f15

inespot approved these changes Nov 12, 2024

rhuffy merged commit eea1112 into palantir-cassandra-2.2.18 Nov 12, 2024
6 checks passed

rhuffy deleted the rh/time-drain branch November 13, 2024 04:46
