
HDDS-9626. [Recon] Disk Usage page with high number of key/bucket/volume #6535

Merged: 8 commits merged into apache:master on Jun 10, 2024

Conversation

@smitajoshi12 (Contributor) commented Apr 16, 2024

What changes were proposed in this pull request?

When the number of keys/volumes/buckets is huge, the current disk usage UI doesn't make much sense.
This pull request introduces enhancements to the Recon disk usage endpoint to significantly improve usability and performance when dealing with large datasets:
• Top Entities Focus: the endpoint has been updated to efficiently sort and display only the top entities by size. This targeted approach helps users easily identify the most significant space consumers, addressing the impracticality of visualizing thousands of records in a single view (see the frontend sketch further below).
• Efficient Sorting with Parallel Streams: to manage and sort vast numbers of records effectively, parallel stream processing has been implemented.

Key advantages of using parallel streams include:
• Better Utilization of Multi-core Processors: enables concurrent sorting operations across multiple cores, drastically cutting down processing times for large datasets.
• Optimized for Large Datasets: the parallelism overhead is more efficiently distributed over a large number of elements, making it particularly suited for our use case.

Backend PR for reference: #6318
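
For illustration, a minimal TypeScript sketch of how the UI side can consume the sorted response and keep only the top entries for display. The field names and response shape below are assumptions for the sketch, not the exact Recon API model:

// Sketch only: response shape and field names are assumptions.
interface DUSubPath {
  path: string;
  size: number;
}

interface DUResponse {
  subPaths: DUSubPath[];
}

// Fetch the namespace du endpoint and keep only the `limit` largest entries.
// The backend change (#6318) returns subPaths already sorted by size, descending.
async function fetchTopEntities(path: string, limit: number): Promise<DUSubPath[]> {
  const response = await fetch(`/api/v1/namespace/du?path=${path}&files=true`);
  const data: DUResponse = await response.json();
  return data.subPaths.slice(0, limit);
}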

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-9626

How was this patch tested?

Manually.

Before this PR:
(screenshot)

After this PR (tested with cluster data):
(screenshots)

@smitajoshi12 changed the title from "[Recon] Disk Usage page with high number of key/bucket/volume" to "HDDS-9626. [Recon] Disk Usage page with high number of key/bucket/volume" on Apr 16, 2024
@swamirishi (Contributor) commented:

@dombizita @devmadhuu Can you please take a look at this patch?

@ArafatKhan2198 (Contributor) left a comment

Thanks for working on this @smitajoshi12.
While testing this patch locally I noticed a few discrepancies while setting the Display Limit:

• I currently have 56 keys in my cluster, all of which are present inside buckettest.

1. When I set the display limit to 5, I notice that the 5 objects of the highest size are displayed and the remaining objects are clubbed inside Other Objects.

(screenshot)

2. For a limit of 20 I get the correct result as well:

(screenshot)

3. But when I set the limit to 30, I do not see the Other Objects slot anywhere, even though there are a total of 56 keys, so the remaining 26 keys need to be clubbed into Other Objects.

(screenshot)

@smitajoshi12 (Contributor, Author) commented Apr 29, 2024


@ArafatKhan2198
Corrected in the next commit.

(screenshot)
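
A minimal sketch of the intended grouping behavior (hypothetical names, not the merged patch): entries beyond the display limit are summed into a single Other Objects slot whenever the limit is smaller than the total count, so the slot also appears in the limit-30, 56-keys case above:

// Sketch only: hypothetical types and names, not the merged code.
type Entry = { path: string; size: number };

// Keep the `limit` largest entries and club everything else into "Other Objects",
// so the extra slice shows up whenever limit < total entries (e.g. 30 < 56).
function withOtherObjects(sorted: Entry[], limit: number): Entry[] {
  if (sorted.length <= limit) {
    return sorted;
  }
  const top = sorted.slice(0, limit);
  const otherSize = sorted.slice(limit).reduce((sum, entry) => sum + entry.size, 0);
  return [...top, { path: 'Other Objects', size: otherSize }];
}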

@dombizita (Contributor) left a comment

Thanks for working on this @smitajoshi12. To use the improvements in the namespace endpoint that @ArafatKhan2198 introduced in #6318, you need to change the endpoint that you call here:

const duEndpoint = `/api/v1/namespace/du?path=${path}&files=true`;

The sortSubPaths parameter needs to be set to true:
@DefaultValue("true") @QueryParam("sortSubPaths") boolean sortSubpaths)

@smitajoshi12 (Contributor, Author) commented:


@dombizita @ArafatKhan2198
Addressed the above comments in the latest commit and updated the screenshots with cluster data.

@ArafatKhan2198 (Contributor) left a comment

Thanks for updating the patch, @smitajoshi12. We are now using the correct API parameters for sorting the subpaths, but there is still an issue from the UI perspective. Let's say we have three files:

file1 -> Size -> 1 KB
file2 -> Size -> 10 KB
file3 -> Size -> 1 GB

The API endpoint would return a response in descending order of size. However, the problem is that the UI representation becomes skewed, as shown in the image below:
Here, we have three directories with sizes 1 KB, 10 KB, and 1 GB. I believe the size of each part of the pie chart is relative to the file size, but this creates a poor user experience. We need to address this issue to improve the user interface.

(screenshot)

Could you please take care of this!
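
For a sense of scale, the raw slice fractions for 1 KB, 10 KB and 1 GB work out roughly as follows (a quick back-of-the-envelope sketch in TypeScript, not project code):

// 1 KB, 10 KB and 1 GB expressed in bytes.
const sizes = [1 * 1024, 10 * 1024, 1024 * 1024 * 1024];
const total = sizes.reduce((sum, s) => sum + s, 0);

// Raw fractions of the pie: roughly 0.0001%, 0.001% and 99.999%,
// so the two small entries are effectively invisible in the chart.
const fractions = sizes.map((s) => ((s / total) * 100).toFixed(4) + '%');
console.log(fractions);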

@smitajoshi12 (Contributor, Author) commented May 15, 2024


@ArafatKhan2198
Can we raise a separate JIRA for it, as it is a known issue that needs a lot of changes? We used normalization in the Heatmap as well. Raised a separate JIRA: https://issues.apache.org/jira/browse/HDDS-10864
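
One possible normalization along the lines used for the Heatmap (purely illustrative, with hypothetical names; the actual approach is left to HDDS-10864) is to log-scale the values that drive the slice areas while keeping the true sizes for labels and tooltips:

// Sketch only: hypothetical names; the real fix is deferred to HDDS-10864.
type PieEntry = { path: string; size: number };

// Log-scale the value that drives each slice's area, while keeping the true
// size around for labels and tooltips, so tiny entries stay visible.
function normalizeForPie(entries: PieEntry[]) {
  return entries.map((entry) => ({
    path: entry.path,
    value: Math.log10(entry.size + 1),
    actualSize: entry.size,
  }));
}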

@ArafatKhan2198 (Contributor) left a comment

Thanks, @smitajoshi12, for working on this.
LGTM

@devmadhuu (Contributor) left a comment

Thanks @smitajoshi12 for working on this. LGTM +1

@dombizita (Contributor) left a comment

Thanks for updating your patch @smitajoshi12! Please take a look at my comments!

@devmadhuu (Contributor) commented:

Thanks @smitajoshi12 for working on this patch. Thanks @dombizita , @ArafatKhan2198 for reviewing the patch.

@devmadhuu merged commit 925cc08 into apache:master on Jun 10, 2024
33 checks passed
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request on Jun 17, 2024
@smitajoshi12 deleted the HDDS-9626 branch on July 25, 2024 at 15:55