Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time the runtime of Drain and DrainOnShutdown #578

Merged
merged 2 commits into from
Nov 12, 2024

Conversation

rhuffy
Copy link
Contributor

@rhuffy rhuffy commented Nov 8, 2024

These runtimes will be useful to determine reasonable settings for K8s termination grace period.

LogQL can be used to treat these logs lines as metrics for aggregating these values across many clusters.

@@ -754,6 +757,8 @@ public void runMayThrow() throws InterruptedException
ScheduledExecutors.nonPeriodicTasks.shutdown();
if (!ScheduledExecutors.nonPeriodicTasks.awaitTermination(1, MINUTES))
logger.warn("Miscellaneous task executor still busy after one minute; proceeding with shutdown");

logger.info("DrainOnShutdown completed in {} ms", SafeArg.of("ms", watch.elapsed(TimeUnit.MILLISECONDS)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not be simpler to just add metrics? Or is there any advantages to using the logs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we poll the metrics endpoints, it's likely that a metric reported just before shutdown would be missed.

LogQL supports querying logs as if they were metrics, so there is no benefit to using metrics here over a log.

Metrics are also a bit more complicated to set up, and are better suited for values that should be continuously sampled, instead of just an infrequently fired event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add logs at the start of each function (so we understand if a SIGKILL was sent before the thread completed) flush each of the logs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a log to drainOnShutdown

the setMode calls i.e:setMode(Mode.DRAINING, "starting drain process", true); will emit a log

    void setMode(Mode m, @Safe String msg, boolean log)
    {
        operationMode = m;
        if (log)
            logger.info(m.toString(), SafeArg.of("msg", msg));
        else
            logger.debug(m.toString(), SafeArg.of("msg", msg));
    }

Copy link
Contributor

@inespot inespot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From offline conv: as a FLUP we need to properly shutdown the logger in sls in order to make sure those logs do not get dropped when Cassandra shuts down gracefully

@rhuffy rhuffy merged commit eea1112 into palantir-cassandra-2.2.18 Nov 12, 2024
6 checks passed
@rhuffy rhuffy deleted the rh/time-drain branch November 13, 2024 04:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants