Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-11367. Improve ozone balancing status command output #7139

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

juncevich
Copy link
Contributor

What changes were proposed in this pull request?

Minor improvements to command output:

  1. job task time consumption
  2. iteration time consumption
  3. unfinished iteration shouldn't be in iteration history section
  4. show data units, even it less than gb show it in mb

What is the link to the Apache JIRA

HDDS-11367

How was this patch tested?

Tested by robot test

Copy link
Contributor

@siddhantsangwan siddhantsangwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juncevich can you please elaborate on what these points mean?

job task time consumption
iteration time consumption

It will help reviewers understand what you're proposing.

@juncevich
Copy link
Contributor Author

@juncevich can you please elaborate on what these points mean?

job task time consumption
iteration time consumption

It will help reviewers understand what you're proposing.

Hi, @siddhantsangwan. Sure, i can explain. Balancing job constists from some iterations. Job task time - the whole balancing time, iteration time - particular iteration time. Job time = iteration time + iteration time ...

@juncevich
Copy link
Contributor Author

Hi siddhantsangwan. Can you review my PR?

Copy link
Contributor

@siddhantsangwan siddhantsangwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juncevich thanks for working on this. Other than my comments below, we definitely need more testing here. I'd like to at least see:

  1. Unit tests for balancer that verify the iteration duration and other properties are correct.
  2. Tests for the balancer status command that verify the code is correctly picking up the mocked values (mocked response from SCM).

We can also add verifications to the robot tests.

"Scheduled to move containers", containerMovesScheduled,
"Already moved containers", containerMovesCompleted,
"Failed to move containers", containerMovesFailed,
"Failed to move containers by timeout", containerMovesTimeout,
"Entered data to nodes", enteringDataNodeList,
"Exited data from nodes", leavingDataNodeList);
}

private String getPrettyIterationStatusInfo(long duration) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are probably existing utility methods to do this as well. See if you can find any? It's better to use them since they will be well tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find utility methods. I will extract this method to separate util class and write test for it.

"%-50s %n%s" +
"%-50s %n%s",
"Key", "Value",
"Iteration number", iterationNumber,
"Iteration number", iterationNumber == 0 ? "-" : iterationNumber,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what reason are we inserting string constants using String.format? It doesn't feel right quite to be honest. Can we refactor this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why it's bad and how I should refactor it.

@adoroszlai
Copy link
Contributor

@juncevich can you please check test timeouts / failures?

org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerStatusInfo
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerTask

https://github.com/juncevich/ozone/actions/runs/11380961580/artifacts/2067990610

@juncevich
Copy link
Contributor Author

@juncevich can you please check test timeouts / failures?

org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerStatusInfo
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerTask

https://github.com/juncevich/ozone/actions/runs/11380961580/artifacts/2067990610

Thanks for the comment. Tests fixed.

@adoroszlai
Copy link
Contributor

@siddhantsangwan can you please take another look?

@siddhantsangwan
Copy link
Contributor

@juncevich Is this ready for another round of reviews?

@juncevich
Copy link
Contributor Author

@juncevich Is this ready for another round of reviews?

Yes, I try to fix the review notices ASAP because they are blocking another fix. I didn’t bring it up because it’s a different task

@@ -118,7 +121,7 @@ public class ContainerBalancerTask implements Runnable {
private int nextIterationIndex;
private boolean delayStart;
private List<ContainerBalancerTaskIterationStatusInfo> iterationsStatistic;

private OffsetDateTime currentIterationStarted;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whilst it maybe updated in one thread read access can be done from multiple other threads. This variable should be Atomic.

@juncevich
Copy link
Contributor Author

@ivanzlenko No need to use Atomic for OffsetDateTime, because it thread-safe by design.
image

private final Map<UUID, Long> sizeEnteringNodesGB;
private final Map<UUID, Long> sizeLeavingNodesGB;
private final Map<UUID, Long> sizeEnteringNodes;
private final Map<UUID, Long> sizeLeavingNodes;

@SuppressWarnings("checkstyle:ParameterNumber")
public ContainerBalancerTaskIterationStatusInfo(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is NOT okay to follow bad practices. Please fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants