Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add throughput distribution and coarse datapoints for StressWorkerBench #18149

Merged
merged 12 commits into from
Sep 18, 2023

Conversation

twalluxio
Copy link
Contributor

@twalluxio twalluxio commented Sep 14, 2023

For random reads, bytes read per file is not a constant any more. In spite of existing duration distribution, need a throughput distribution for better understanding of reading performance.

Also, when duration too long, grpc will receive huge size of output data. Should aggregate data points to transfer more datapoints with limited output size.

Group datapoints by threads and time slices:
Example:

nodeResults: {
  worker-0: {
    dataPoints: [
      data: [
        { // worker 0, thread 0, slice 0
          count: 1,
          iobytes: 33554432,
        }, { // worker 0, thread 0, other slices
          …
        }
      ], [ // worker 0, other workers
        …
      ]
    ]
    throughputPercentiles: […]
  },
  worker-1: { // other workers
    …
  }
}

Slice time with --slice-size flag, e.g. --slice-size 1s.

Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the work! I left some comments :)

Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a few more comments. I think it's good to merge after our comments are resolved. Thanks for the work!

WorkerBenchCoarseDataPoint dp = new WorkerBenchCoarseDataPoint(
Long.parseLong(workerID),
Thread.currentThread().getId(),
new ArrayList<>(), new ArrayList<>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
new ArrayList<>(), new ArrayList<>());
new ArrayList<>(), new ArrayList<>());

are you going to use these lists and add to them? It's better if you just init empty lists internally.

public WorkerBenchCoarseDataPoint(long, long) {
  xxx = new ArrayList<>();
  xxx = new ArrayList<>();
}

Kinda weird if you create a list, and give it to someone, you and that someone both hold the list and can add to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree. There are two places creating a WorkerBenchCoarseDataPoint. One in the bench (without data and throughput arrays), and another in the deserializer (with data and throughput arrays). I will write two different constructors for them.

if (startMs > 0) {
if (bytesRead > 0) {
mResult.setIOBytes(mResult.getIOBytes() + bytesRead);
throughputList.add(bytesRead / duration);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok let's change it to ns-based in another PR. The problem with ms is the error on the clock. If an operation is longer than 1ms then you may still observe the same ms value in some extreme cases. I don't really care what the smallest unit is, I just don't want DividedByZeroException.

Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@jiacheliu3 jiacheliu3 added the type-feature This issue is a feature request label Sep 18, 2023
@jiacheliu3
Copy link
Contributor

alluxio-bot, merge this please

@alluxio-bot alluxio-bot merged commit ddba020 into Alluxio:main Sep 18, 2023
12 checks passed
@twalluxio twalluxio deleted the swb-coarse-datapoints branch November 8, 2023 02:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature This issue is a feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants