
HDDS-9626. [Recon] Disk Usage page with high number of key/bucket/volume #6535

Merged: 8 commits merged into apache:master on Jun 10, 2024

Conversation

@smitajoshi12 (Contributor) commented Apr 16, 2024

What changes were proposed in this pull request?

When the number of keys/volumes/buckets is huge, the current disk usage UI doesn't make much sense.
This pull request introduces enhancements to the Recon disk usage endpoint to significantly improve usability and performance when dealing with large datasets:
• Top Entities Focus: the endpoint has been updated to efficiently sort and display only the top entities by size. This targeted approach helps users easily identify the most significant space consumers, addressing the impracticality of visualizing thousands of records in a single view (see the frontend sketch further below).
• Efficient Sorting with Parallel Streams: to manage and sort vast numbers of records effectively, parallel stream processing has been implemented.

Key advantages of using parallel streams include:
• Better Utilization of Multi-core Processors: enables concurrent sorting operations across multiple cores, drastically cutting down processing times for large datasets.
• Optimized for Large Datasets: the parallelism overhead is more efficiently distributed over a large number of elements, making it particularly suited for our use case.

Backend PR for reference: #6318
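
For illustration, a minimal TypeScript sketch of how the UI side can consume the sorted response and keep only the top entries for display. The field names and response shape below are assumptions for the sketch, not the exact Recon API model:

// Sketch only: response shape and field names are assumptions.
interface DUSubPath {
  path: string;
  size: number;
}

interface DUResponse {
  subPaths: DUSubPath[];
}

// Fetch the namespace du endpoint and keep only the `limit` largest entries.
// The backend change (#6318) returns subPaths already sorted by size, descending.
async function fetchTopEntities(path: string, limit: number): Promise<DUSubPath[]> {
  const response = await fetch(`/api/v1/namespace/du?path=${path}&files=true`);
  const data: DUResponse = await response.json();
  return data.subPaths.slice(0, limit);
}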

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-9626

How was this patch tested?

Manually.

Before this PR:
(screenshot)

After this PR (tested with cluster data):
(screenshots)

@smitajoshi12 changed the title from "[Recon] Disk Usage page with high number of key/bucket/volume" to "HDDS-9626. [Recon] Disk Usage page with high number of key/bucket/volume" on Apr 16, 2024
@swamirishi (Contributor) commented:

@dombizita @devmadhuu Can you please take a look at this patch?

@ArafatKhan2198 (Contributor) left a comment

Thanks for working on this @smitajoshi12.
While testing this patch locally I noticed a few discrepancies while setting the Display Limit:

• I currently have 56 keys in my cluster, all of which are present inside buckettest.

1. When I set the display limit to 5, I notice that the 5 objects of the highest size are displayed and the remaining objects are clubbed inside Other Objects.

(screenshot)

2. For a limit of 20 I get the correct result as well:

(screenshot)

3. But when I set the limit to 30, I do not see the Other Objects slot anywhere, even though there are a total of 56 keys, so the remaining 26 keys need to be clubbed into Other Objects.

(screenshot)

@smitajoshi12 (Contributor, Author) commented Apr 29, 2024


@ArafatKhan2198
Corrected in the next commit.

(screenshot)
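
A minimal sketch of the intended grouping behavior (hypothetical names, not the merged patch): entries beyond the display limit are summed into a single Other Objects slot whenever the limit is smaller than the total count, so the slot also appears in the limit-30, 56-keys case above:

// Sketch only: hypothetical types and names, not the merged code.
type Entry = { path: string; size: number };

// Keep the `limit` largest entries and club everything else into "Other Objects",
// so the extra slice shows up whenever limit < total entries (e.g. 30 < 56).
function withOtherObjects(sorted: Entry[], limit: number): Entry[] {
  if (sorted.length <= limit) {
    return sorted;
  }
  const top = sorted.slice(0, limit);
  const otherSize = sorted.slice(limit).reduce((sum, entry) => sum + entry.size, 0);
  return [...top, { path: 'Other Objects', size: otherSize }];
}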

@dombizita (Contributor) left a comment

Thanks for working on this @smitajoshi12. To use the improvements in the namespace endpoint that @ArafatKhan2198 introduced in #6318, you need to change the endpoint that you call here:

const duEndpoint = `/api/v1/namespace/du?path=${path}&files=true`;

The sortSubPaths parameter needs to be set to true:
@DefaultValue("true") @QueryParam("sortSubPaths") boolean sortSubpaths)

@smitajoshi12 (Contributor, Author) commented:


@dombizita @ArafatKhan2198
Addressed the above comments in the latest commit and updated the screenshots with cluster data.

@ArafatKhan2198 (Contributor) left a comment

Thanks for updating the patch, @smitajoshi12. We are now using the correct API parameters for sorting the subpaths, but there is still an issue from the UI perspective. Let's say we have three files:

file1 -> Size -> 1 KB
file2 -> Size -> 10 KB
file3 -> Size -> 1 GB

The API endpoint would return a response in descending order of size. However, the problem is that the UI representation becomes skewed, as shown in the image below:
Here, we have three directories with sizes 1 KB, 10 KB, and 1 GB. I believe the size of each part of the pie chart is relative to the file size, but this creates a poor user experience. We need to address this issue to improve the user interface.

(screenshot)

Could you please take care of this!
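
For a sense of scale, the raw slice fractions for 1 KB, 10 KB and 1 GB work out roughly as follows (a quick back-of-the-envelope sketch in TypeScript, not project code):

// 1 KB, 10 KB and 1 GB expressed in bytes.
const sizes = [1 * 1024, 10 * 1024, 1024 * 1024 * 1024];
const total = sizes.reduce((sum, s) => sum + s, 0);

// Raw fractions of the pie: roughly 0.0001%, 0.001% and 99.999%,
// so the two small entries are effectively invisible in the chart.
const fractions = sizes.map((s) => ((s / total) * 100).toFixed(4) + '%');
console.log(fractions);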

@smitajoshi12 (Contributor, Author) commented May 15, 2024


@ArafatKhan2198
Can we raise a separate JIRA for it, as it is a known issue that needs a lot of changes? We used normalization in the Heatmap as well. Raised a separate JIRA: https://issues.apache.org/jira/browse/HDDS-10864
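
One possible normalization along the lines used for the Heatmap (purely illustrative, with hypothetical names; the actual approach is left to HDDS-10864) is to log-scale the values that drive the slice areas while keeping the true sizes for labels and tooltips:

// Sketch only: hypothetical names; the real fix is deferred to HDDS-10864.
type PieEntry = { path: string; size: number };

// Log-scale the value that drives each slice's area, while keeping the true
// size around for labels and tooltips, so tiny entries stay visible.
function normalizeForPie(entries: PieEntry[]) {
  return entries.map((entry) => ({
    path: entry.path,
    value: Math.log10(entry.size + 1),
    actualSize: entry.size,
  }));
}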

@ArafatKhan2198 (Contributor) left a comment

Thanks, @smitajoshi12, for working on this.
LGTM

@devmadhuu (Contributor) left a comment

Thanks @smitajoshi12 for working on this. LGTM +1

@dombizita (Contributor) left a comment

Thanks for updating your patch @smitajoshi12! Please take a look at my comments!

@devmadhuu (Contributor) commented:

Thanks @smitajoshi12 for working on this patch. Thanks @dombizita , @ArafatKhan2198 for reviewing the patch.

@devmadhuu merged commit 925cc08 into apache:master on Jun 10, 2024
33 checks passed
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request on Jun 17, 2024
@smitajoshi12 deleted the HDDS-9626 branch on July 25, 2024 at 15:55