
POC for CSU Utilization Metrics (WIP) #1

Closed
wants to merge 189 commits into from

Conversation

lct45 (Owner)

@lct45 lct45 commented Jun 7, 2021

Description

What behavior do you want to change, why, how does your patch achieve the changes?

Testing done

Describe the testing strategy. Unit and integration tests are expected for any behavior changes.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

final long windowStart = (long) Math.max(0, windowEnd - windowSize);
for (KafkaStreams stream : kafkaStreams) {
    for (ThreadMetadata thread : stream.localThreadsMetadata()) {
        blockedTime = Math.min(getProcessingRatio(thread.threadName(), stream, windowStart, windowSize), windowSize);

we want to take the minimum blocked time across all the threads - do you mean min(getProcessingRatio(...), blockedTime) here?

lct45 (Owner, Author):

Ahh yup, got mixed up when I renamed variables
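For reference, a minimal sketch of the fixed accumulation, with the per-thread lookup stubbed out (the real code calls `getProcessingRatio(thread.threadName(), stream, windowStart, windowSize)`; the list-of-doubles stand-in here is hypothetical):

```java
import java.util.List;

public class MinBlockedTime {

    // Sketch of the corrected loop: take the minimum blocked time across all
    // threads by comparing against the running blockedTime, not windowSize.
    static double minBlockedTime(List<Double> perThreadBlockedTimes, long windowSize) {
        // Start from the window size: no thread can be blocked longer than the window.
        double blockedTime = windowSize;
        for (double threadBlockedTime : perThreadBlockedTimes) {
            blockedTime = Math.min(threadBlockedTime, blockedTime);
        }
        return blockedTime;
    }
}
```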


@Override
public void run() {
logger.info("Reporting CSU thread level metrics");

You'll probably get a more accurate measurement if you actually store the observed sample times rather than assuming the windowSize each time. So something like:

public void run() {
    while (true) {
        logger.info("the current processing ratio is " + processingRatio() + "%");
        Thread.sleep(windowSize);
    }
}

public double processingRatio() {
    long sampleTime = time.milliseconds();
    double blockedTime = sampleTime - lastSampleTime;
    long windowStart = lastSampleTime;
    long windowSize = sampleTime - lastSampleTime;
    ...
    lastSampleTime = sampleTime;
}

This way if the interval between runs is a little longer than you requested (which is more likely under heavy load) your computation won't be off.

lct45 (Owner, Author):

So we compute the processing ratio at the start of the window, and then sleep for the full window size, at which point the thread should call run() again, right? Won't those overlap in the ideal case (when everything is running on-time)?

I get storing sample time instead of assuming window size but I don't get the Thread.sleep(windowSize);

@rodesai rodesai Jun 10, 2021:

sorry - I wrote this comment before I saw you were using a ScheduledExecutorService. I assumed you were just starting a thread. You should just be able to do:

public void run() {
    logger.info("the current processing ratio is " + processingRatio() + "%");
}

My main point was about using the observed difference between sample times rather than the requested difference (the interval we pass to the executor service).
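A small sketch of that point, assuming a `ScheduledExecutorService` drives the sampling (the blocked-time argument is stubbed; the real code derives it from stream thread metrics):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RatioSampler {
    private final long windowSizeMs;   // requested sampling interval
    private long lastSampleTime;       // last observed sample timestamp

    RatioSampler(long windowSizeMs, long startTimeMs) {
        this.windowSizeMs = windowSizeMs;
        this.lastSampleTime = startTimeMs;
    }

    // Compute the ratio over the OBSERVED elapsed time since the last sample,
    // not the requested interval, so late scheduler runs don't skew the result.
    double sample(long nowMs, double blockedTimeMs) {
        final long observedWindow = nowMs - lastSampleTime;
        lastSampleTime = nowMs;
        // Percentage of the observed window spent processing (not blocked).
        return 100.0 * (observedWindow - blockedTimeMs) / observedWindow;
    }

    void start() {
        final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(
            // blocked time of 0.0 is a placeholder for the real metric lookup
            () -> System.out.println("processing ratio: "
                + sample(System.currentTimeMillis(), 0.0) + "%"),
            windowSizeMs, windowSizeMs, TimeUnit.MILLISECONDS);
    }
}
```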

@@ -284,8 +285,11 @@ private void registerQuery(
}
}
allLiveQueries.add(query);
// For the CSU metrics we need to have initialized first. It seems like initialize could throw errors though
// so we probably want to notify the other listeners first

If you rebase on top of confluentinc#7627 (once it's merged) you won't need this.


@Override
public void onDeregister(final QueryMetadata query) {
// Question - if we terminate a query and then restart it, will the underlying

we would not reuse the name on a terminate (it gets a new ID), so we should be good here. We do reuse the name on an upgrade. I think we should always just clear up all our samples for the query here. It's not a big deal if we miss a window because a user upgraded their query.
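A sketch of that "always clear the samples on deregister" approach, with a hypothetical per-query sample store (the map and method names are illustrative, not the PR's actual fields):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class QuerySampleStore {
    // Samples keyed by query name (hypothetical structure for illustration).
    private final Map<String, List<Double>> samplesByQuery = new ConcurrentHashMap<>();

    void record(String queryName, double sample) {
        samplesByQuery.computeIfAbsent(queryName, k -> new CopyOnWriteArrayList<>())
            .add(sample);
    }

    // On deregister, unconditionally drop the query's samples. If the same
    // name is reused by an upgraded query, it starts from a clean window;
    // missing one window on upgrade is acceptable.
    void onDeregister(String queryName) {
        samplesByQuery.remove(queryName);
    }

    int sampleCount(String queryName) {
        return samplesByQuery.getOrDefault(queryName, List.of()).size();
    }
}
```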

long sampleTime = time.milliseconds();
double blockedTime = sampleTime - lastSampleTime;

final long windowSize = Math.max((sampleTime - lastSampleTime), this.windowSize);

why are you taking a max with the windowSize here? If it's to catch the case where sampleTime < lastSampleTime it would be more accurate to just return 0 or throw - something has definitely gone wrong and we shouldn't just compute a value that uses the requested windowSize.
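The guard the reviewer suggests might look like this sketch (returning 0 on a backwards clock rather than padding with the requested `windowSize`; throwing would be the stricter alternative):

```java
public class WindowGuard {

    // If sampleTime < lastSampleTime, the clock went backwards or state is
    // corrupt: report 0 instead of fabricating a value from windowSize.
    static double blockedTimeOrZero(long sampleTime, long lastSampleTime) {
        if (sampleTime < lastSampleTime) {
            return 0.0;
        }
        return sampleTime - lastSampleTime;
    }
}
```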

lct45 (Owner, Author):

That was something I mostly did for testing: to make sure we're either getting a window size that's bigger than the set window size because the thread is slow, or we're getting the set window size, but nothing smaller. I removed this when I did the test cleanup, though.

@Override
public void onDeregister(final QueryMetadata query) {
kafkaStreams.remove(query.getKafkaStreams());
previousPollTime.remove("poll-time-total");

the keys being removed here should be thread ids right?

lct45 (Owner, Author):

Ah yeah, my bad. Did that without fully thinking. Just fixed
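The fix the thread agrees on, sketched with hypothetical names (the real map is `previousPollTime`; the thread-name list here stands in for whatever the query exposes):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ThreadPollTimes {
    // Per-thread poll-time bookkeeping, keyed by thread id.
    final Map<String, Double> previousPollTime = new HashMap<>();

    // On deregister, remove each of the query's thread ids, not the shared
    // metric name ("poll-time-total"), which would collide across queries.
    void onDeregister(List<String> threadNames) {
        for (String threadName : threadNames) {
            previousPollTime.remove(threadName);
        }
    }
}
```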

@lct45 lct45 force-pushed the csu_metrics branch 2 times, most recently from 23b5825 to 70d873a Compare June 26, 2021 21:13
Sullivan-Patrick and others added 18 commits June 28, 2021 09:09
BREAKING CHANGE: Existing queries that relied on vague implicit casting will not be started after an upgrade, and new queries that rely on vague implicit casting will be rejected. For example, foo(INT, INT) will not be able to resolve against two underlying function signatures of foo(BIGINT, BIGINT) and foo(DOUBLE, DOUBLE). Calling a function whose only parameter is variadic with an explicit null will also result in the call being rejected as vague.
* feat: implement comparisons for TIME/DATE

* rename some stuff

* add compareutil test, reject time/timestamp comparisons

* checkstyle
* test: add DATE/TIME to connect integration test

* rename test

* update rest api mapper

* checkstyle

* checkstyle

* disable classdataabstractioncoupling