HDDS-10634. Recon - listKeys API for listing of OBS , FSO and Legacy bucket keys with filters. #6503

Closed
wants to merge 12 commits into from

@devmadhuu devmadhuu commented Apr 9, 2024

What changes were proposed in this pull request?

This PR adds a new Recon API that lists keys with optional filters: directly for OBS and Legacy buckets, and recursively, as a flat list, for FSO buckets.

New API:

api/v1/namespace/listKeys?startPrefix=/volume1/obs-bucket/&count=105

Default values of API parameters if not provided:

1. replicationType - empty string, so no filter is applied and all keys are listed irrespective of replication type.
2. creationTime - empty string, so no filter is applied and keys are listed irrespective of age; otherwise only keys created on or after the provided creationTime are listed.
3. keySize - 0 bytes, so all keys larger than zero bytes are listed, which is effectively all keys.
4. startPrefix - /
5. count - 1000
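
The defaults above amount to "an unset filter matches everything". A minimal sketch of that composition follows. `FilterableKey` and `ListKeysFilterSketch` are hypothetical names, not classes from this PR; the sketch also assumes that an absent creationTime has already been parsed to `0` and that the size comparison is inclusive, neither of which the description confirms.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical, simplified stand-in for a key entry; not a Recon class. */
final class FilterableKey {
  final String path;
  final String replicationType;
  final long size;          // bytes
  final long creationTime;  // epoch milliseconds

  FilterableKey(String path, String replicationType, long size, long creationTime) {
    this.path = path;
    this.replicationType = replicationType;
    this.size = size;
    this.creationTime = creationTime;
  }
}

final class ListKeysFilterSketch {
  private ListKeysFilterSketch() { }

  /** Combines the three filters; an empty or zero value leaves that filter off. */
  static Predicate<FilterableKey> build(String replicationType,
                                        long creationTimeMillis,
                                        long keySize) {
    Predicate<FilterableKey> filter = key -> true;
    if (replicationType != null && !replicationType.isEmpty()) {
      filter = filter.and(key -> replicationType.equals(key.replicationType));
    }
    // Assumption: 0 means "creationTime not provided".
    if (creationTimeMillis > 0) {
      filter = filter.and(key -> key.creationTime >= creationTimeMillis);
    }
    // Assumption: inclusive comparison, so the default of 0 keeps every key.
    filter = filter.and(key -> key.size >= keySize);
    return filter;
  }

  /** Returns the paths of the keys that pass the filter, in input order. */
  static List<String> apply(List<FilterableKey> keys, Predicate<FilterableKey> filter) {
    return keys.stream()
        .filter(filter)
        .map(key -> key.path)
        .collect(Collectors.toList());
  }
}
```

With all three arguments left at their defaults (`""`, `0`, `0`), `build` returns a predicate that accepts every key.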

Behavior of API:
For an OBS bucket, the API lists up to count keys under the provided path.
Pagination is supported through the count parameter.

Get List of All Keys:
GET /api/v1/namespace/listKeys

 API params:
  1. replicationType - filter for RATIS or EC replicated keys.
  2. creationDate - string in "MM-dd-yyyy HH:mm:ss" format.
  3. startPrefix
  4. count
  5. keySize
  6. recursive - list keys recursively for FSO buckets.
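
A client has to URL-encode these parameters, in particular the space and colons in the date format. The sketch below builds a request URL with the JDK only. `ListKeysRequestUrl` is a hypothetical helper, and the base URL `http://localhost:9888` in the usage is an assumption about the Recon HTTP address, not something stated in this PR.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side helper; not part of Recon. */
final class ListKeysRequestUrl {
  // Date format taken from the parameter list above.
  private static final DateTimeFormatter CREATION_DATE_FORMAT =
      DateTimeFormatter.ofPattern("MM-dd-yyyy HH:mm:ss");

  private ListKeysRequestUrl() { }

  /** Appends the listKeys path and the URL-encoded query parameters to baseUrl. */
  static String build(String baseUrl, Map<String, String> params) {
    StringBuilder url = new StringBuilder(baseUrl).append("/api/v1/namespace/listKeys");
    char separator = '?';
    for (Map.Entry<String, String> param : params.entrySet()) {
      url.append(separator)
          .append(encode(param.getKey()))
          .append('=')
          .append(encode(param.getValue()));
      separator = '&';
    }
    return url.toString();
  }

  /** Formats a timestamp the way the creationDate parameter expects. */
  static String formatCreationDate(LocalDateTime time) {
    return CREATION_DATE_FORMAT.format(time);
  }

  private static String encode(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on a JVM.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("startPrefix", "/volume1/obs-bucket");
    params.put("count", "105");
    params.put("creationDate", formatCreationDate(LocalDateTime.of(2024, 4, 9, 10, 0, 0)));
    // Assumed base URL for a local Recon instance.
    System.out.println(build("http://localhost:9888", params));
  }
}
```

The encoder escapes the slashes in startPrefix as `%2F`; inside a query string this is equivalent to the unescaped form used in the examples in this description.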

 **Input Request for OBS bucket:**

       `api/v1/namespace/listKeys?startPrefix=/volume1/obs-bucket&count=105`

 **Output Response:**

  ```
    {
        "status": "OK",
        "path": "/volume1/obs-bucket",
        "size": 73400320,
        "sizeWithReplica": 81788928,
        "subPathCount": 7,
        "totalKeyCount": 7,
        "lastKey": "/volume1/obs-bucket/key7",
        "subPaths": [
            {
                "key": true,
                "path": "key1",
                "size": 10485760,
                "sizeWithReplica": 10485760,
                "isKey": true,
                "replicationType": "RATIS",
                "creationTime": 1712680854675,
                "modificationTime": 1712680855695
            },
            {
                "key": true,
                "path": "key1/key2",
                "size": 10485760,
                "sizeWithReplica": 10485760,
                "isKey": true,
                "replicationType": "RATIS",
                "creationTime": 1712680857753,
                "modificationTime": 1712680858666
            },
            {
                "key": true,
                "path": "key1/key2/key3",
                "size": 10485760,
                "sizeWithReplica": 10485760,
                "isKey": true,
                "replicationType": "RATIS",
                "creationTime": 1712680860801,
                "modificationTime": 1712680861870
            },
            {
                "key": true,
                "path": "key4",
                "size": 10485760,
                "sizeWithReplica": 10485760,
                "isKey": true,
                "replicationType": "RATIS",
                "creationTime": 1712680863937,
                "modificationTime": 1712680864899
            },
            {
                "key": true,
                "path": "key5",
                "size": 10485760,
                "sizeWithReplica": 10485760,
                "isKey": true,
                "replicationType": "RATIS",
                "creationTime": 1712680866996,
                "modificationTime": 1712680868187
            },
            {
                "key": true,
                "path": "key6",
                "size": 10485760,
                "sizeWithReplica": 10485760,
                "isKey": true,
                "replicationType": "RATIS",
                "creationTime": 1712680870182,
                "modificationTime": 1712680871044
            },
            {
                 "key": true,
                 "path": "key7",
                 "size": 10485760,
                 "sizeWithReplica": 18874368,
                 "isKey": true,
                 "replicationType": "EC",
                 "creationTime": 1713262187049,
                 "modificationTime": 1713262188135
           }
        ],
        "sizeDirectKey": 73400320
    }
  ```

**Input Request for FSO bucket:**

           `api/v1/namespace/listKeys?startPrefix=/volume1/fso-bucket&recursive=true`

**Output Response:**

      ```
      {
                "status": "OK",
                "path": "/volume1/fso-bucket",
                "size": 62914560,
                "sizeWithReplica": 188743680,
                "subPathCount": 6,
                "totalKeyCount": 6,
                "lastKey": "/-9223372036854775552/-9223372036854775040/-9223372036854774525/testfile",
                "subPaths": [
                    {
                        "key": true,
                        "path": "volume1/fso-bucket/dir1/file1",
                        "size": 10485760,
                        "sizeWithReplica": 31457280,
                        "isKey": true,
                        "replicationType": "RATIS",
                        "creationTime": 1712680835581,
                        "modificationTime": 1712680836508
                    },
                    {
                        "key": true,
                        "path": "volume1/fso-bucket/dir1/testfile",
                        "size": 10485760,
                        "sizeWithReplica": 31457280,
                        "isKey": true,
                        "replicationType": "RATIS",
                        "creationTime": 1712680832118,
                        "modificationTime": 1712680833528
                    },
                    {
                        "key": true,
                        "path": "volume1/fso-bucket/dir1/dir2/file1",
                        "size": 10485760,
                        "sizeWithReplica": 31457280,
                        "isKey": true,
                        "replicationType": "RATIS",
                        "creationTime": 1712680841158,
                        "modificationTime": 1712680842040
                    },
                    {
                        "key": true,
                        "path": "volume1/fso-bucket/dir1/dir2/testfile",
                        "size": 10485760,
                        "sizeWithReplica": 31457280,
                        "isKey": true,
                        "replicationType": "RATIS",
                        "creationTime": 1712680838434,
                        "modificationTime": 1712680839254
                    },
                    {
                        "key": true,
                        "path": "volume1/fso-bucket/dir1/dir2/dir3/file1",
                        "size": 10485760,
                        "sizeWithReplica": 31457280,
                        "isKey": true,
                        "replicationType": "RATIS",
                        "creationTime": 1712680847287,
                        "modificationTime": 1712680850660
                    },
                    {
                        "key": true,
                        "path": "volume1/fso-bucket/dir1/dir2/dir3/testfile",
                        "size": 10485760,
                        "sizeWithReplica": 31457280,
                        "isKey": true,
                        "replicationType": "RATIS",
                        "creationTime": 1712680843959,
                        "modificationTime": 1712680844890
                    }
                ],
                "sizeDirectKey": 0
            }
      ```

**Input Request for LEGACY bucket:**

       `api/v1/namespace/listKeys?startPrefix=/volume1/legacy-bucket`

**Output Response:**

        {
            "status": "OK",
            "path": "/volume1/legacy-bucket",
            "size": 157286400,
            "sizeWithReplica": 157286400,
            "subPathCount": 6,
            "totalKeyCount": 6,
            "lastKey": "/volume1/legacy-bucket/key6",
            "subPaths": [
                {
                    "key": true,
                    "path": "key1",
                    "size": 10485760,
                    "sizeWithReplica": 10485760,
                    "isKey": true,
                    "replicationType": "RATIS",
                    "creationTime": 1712680878239,
                    "modificationTime": 1712680879179
                },
                {
                    "key": true,
                    "path": "key1/key2",
                    "size": 41943040,
                    "sizeWithReplica": 41943040,
                    "isKey": true,
                    "replicationType": "RATIS",
                    "creationTime": 1712680881331,
                    "modificationTime": 1712680882611
                },
                {
                    "key": true,
                    "path": "key1/key2/key3",
                    "size": 10485760,
                    "sizeWithReplica": 10485760,
                    "isKey": true,
                    "replicationType": "RATIS",
                    "creationTime": 1712680884664,
                    "modificationTime": 1712680885522
                },
                {
                    "key": true,
                    "path": "key4",
                    "size": 41943040,
                    "sizeWithReplica": 41943040,
                    "isKey": true,
                    "replicationType": "RATIS",
                    "creationTime": 1712680887558,
                    "modificationTime": 1712680888590
                },
                {
                    "key": true,
                    "path": "key5",
                    "size": 10485760,
                    "sizeWithReplica": 10485760,
                    "isKey": true,
                    "replicationType": "RATIS",
                    "creationTime": 1712680890644,
                    "modificationTime": 1712680891447
                },
                {
                    "key": true,
                    "path": "key6",
                    "size": 41943040,
                    "sizeWithReplica": 41943040,
                    "isKey": true,
                    "replicationType": "RATIS",
                    "creationTime": 1712680907002,
                    "modificationTime": 1712680908210
                }
            ],
            "sizeDirectKey": 157286400
        }

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10634

How was this patch tested?

Added JUnit test cases and tested various assertions.

errose28 commented Apr 9, 2024

> 1. replicationType - RATIS
> 2. creationTime - empty string and filter will not be applied, so list out keys irrespective of age, else list out keys which got created on or after provided creationTime
> 3. keySize - 0 bytes, which means all keys greater than zero bytes will be listed, effectively all.
> 4. startPrefix - /

I think the default should be all replication types. This is consistent with the key size and creation time filters, which have no effect if no value is provided.

> 5. count - 1000
> ...
> This API will implement pagination support using count params.

  • How can we make sure the user knows the result is truncated and that there are more than 1000 keys in the prefix?
  • How is pagination implemented to tell the server where the next 1000 keys should start?

@devmadhuu
Contributor Author

> 1. replicationType - RATIS
> 2. creationTime - empty string and filter will not be applied, so list out keys irrespective of age, else list out keys which got created on or after provided creationTime
> 3. keySize - 0 bytes, which means all keys greater than zero bytes will be listed, effectively all.
> 4. startPrefix - /
>
> I think default replication types should be all of them. This is consistent with the key size and create time filters which have no effect if no value is provided.
>
> 5. count - 1000
> ...
> This API will implement pagination support using count params.
>
> • How can we make sure the user knows the value is truncated and that there is not only 1000 keys in the prefix?
> • How is pagination implemented to tell the server where the next 1000 keys should start?

  1. Agreed, the replicationType default can be empty, which effectively lists all keys.
  2. I am thinking of returning a total count in the response, based on the filters provided, which tells how many keys are in the prefix.
  3. The solution is not decided yet, but I was thinking of adding one more param, offset. For example, if the client provides offset 0 and count 10, the server returns the first 10 records. If the client provides offset 100 and count 50, the server skips the first 100 records and returns the next 50.
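
The offset and count idea in point 3 can be sketched over an in-memory list. `OffsetPagingSketch` is a hypothetical name, and the sorted list stands in for whatever ordered source the server reads from; this is an illustration of the proposal, not code from the PR.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of offset/count paging; not Recon code. */
final class OffsetPagingSketch {
  private OffsetPagingSketch() { }

  /** Skips the first offset keys and returns at most count of the keys that follow. */
  static List<String> page(List<String> sortedKeys, int offset, int count) {
    if (offset < 0 || count < 0) {
      throw new IllegalArgumentException("offset and count must be non-negative");
    }
    return sortedKeys.stream()
        .skip(offset)   // every skipped record is still visited
        .limit(count)
        .collect(Collectors.toList());
  }
}
```

Note that `skip` still walks over the skipped records, so in this sketch the cost of a page grows with its offset.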

@errose28
Contributor

I think just providing the last key in the list as a start key might be easier to resume pagination from than adding a count based offset.
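
The start-key approach suggested here can be sketched as follows. `StartKeyPagingSketch` is a hypothetical name, and a `TreeSet` stands in for an ordered key table; the sketch assumes an empty start key means "from the beginning" and that a page shorter than count marks the end of the listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical illustration of start-key paging; not Recon code. */
final class StartKeyPagingSketch {
  private StartKeyPagingSketch() { }

  /** Returns up to count keys strictly after startKey; an empty startKey starts at the beginning. */
  static List<String> page(NavigableSet<String> keys, String startKey, int count) {
    if (count <= 0) {
      throw new IllegalArgumentException("count must be positive");
    }
    NavigableSet<String> tail = startKey.isEmpty() ? keys : keys.tailSet(startKey, false);
    List<String> result = new ArrayList<>();
    for (String key : tail) {
      if (result.size() >= count) {
        break;
      }
      result.add(key);
    }
    return result;
  }

  /** Client-side loop: feeds the last key of each page back in as the next start key. */
  static List<String> listAll(NavigableSet<String> keys, int count) {
    List<String> all = new ArrayList<>();
    String startKey = "";
    while (true) {
      List<String> page = page(keys, startKey, count);
      all.addAll(page);
      if (page.size() < count) {
        return all;  // a short page means there is nothing left
      }
      startKey = page.get(page.size() - 1);
    }
  }

  public static void main(String[] args) {
    NavigableSet<String> keys = new TreeSet<>();
    for (int i = 1; i <= 7; i++) {
      keys.add("/volume1/obs-bucket/key" + i);
    }
    System.out.println(page(keys, "/volume1/obs-bucket/key3", 3));
  }
}
```

Each page begins with a seek to the start key rather than a scan over earlier records, and a client only needs to remember the last key it received.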

@devmadhuu devmadhuu marked this pull request as ready for review April 17, 2024 09:43
@devmadhuu
Contributor Author

@dombizita @ArafatKhan2198 @sodonnel @sumitagrawl Please review.

@ArafatKhan2198 ArafatKhan2198 left a comment

Thanks for the effort on this @devmadhuu.
Left a few comments.

@@ -134,6 +134,7 @@ public class TestNSSummaryEndpointWithLegacy {
private static final String BUCKET_TWO = "bucket2";
private static final String BUCKET_THREE = "bucket3";
private static final String BUCKET_FOUR = "bucket4";
private static final String BUCKET_FIVE = "bucket5";
Contributor

Are we asserting anything in the test class TestNSSummaryEndpointWithLegacy? We have created a bunch of new files and directories but have not asserted anything yet.

Contributor Author

> Are we asserting anything in the test class TestNSSummaryEndpointWithLegacy? We have created a bunch of new files and directories but have not asserted anything yet.

I added the test case for both legacy and obs buckets in TestNSSummaryEndpointWithOBSAndLegacy class. Will remove the tree structure from here.

Comment on lines 1051 to 1052
// There are no sub-paths under this LEGACY bucket.
assertEquals(2, duBucketResponse.getCount());
Contributor

The comment says there are no sub-paths under this LEGACY bucket, but we assert that the subPathCount returned by duBucketResponse.getCount() is 2.
Or is the comment referring to something else that I have misunderstood?

Contributor Author

Comment is not correct. Will update it.

@@ -145,6 +146,12 @@ public class TestNSSummaryEndpointWithLegacy {
private static final String KEY_NINE = "dir5/file9";
private static final String KEY_TEN = "dir5/file10";
private static final String KEY_ELEVEN = "file11";
private static final String KEY_TWELVE = "file12";
Contributor

Could you also update the tree diagram for TestNSSummaryEndpointWithLegacy with the new changes? I think we could use a diagram style similar to the one in TestNSSummaryEndpointWithOBSAndLegacy to showcase the file hierarchy.

Contributor Author

> Could you also update the tree diagram for TestNSSummaryEndpointWithLegacy with the new changes. I think we could implement a similar diagram style like present in TestNSSummaryEndpointWithOBSAndLegacy to show case the file hierarchy.

I added the test case for both legacy and obs buckets in TestNSSummaryEndpointWithOBSAndLegacy class. Will remove the tree structure from here.

* @return The constructed full path of the key as a String.
* @throws IOException
*/
public static String constructFullPath(OmKeyInfo omKeyInfo,
Contributor

While reviewing this PR I noticed that some of the code from PR 6492 is also included in your PR, such as the introduction of the parent ID and the construction of the full path. I believe we need these changes for your pull request. We could always rebase it once PR 6492 gets merged.

Contributor Author

Yes, this needs to be rebased.

@ArafatKhan2198
Contributor

Since the /listKeys endpoint is part of the NSSummary endpoints, can we introduce an integration test for it as well? We currently do not have one, and in our discussions we had decided to add one in the future. For now, we can just test the listKeys feature. We could test it for various bucket types and hierarchies, and test the pagination as well. The other NSSummaryEndpoint methods could be covered in subsequent Jiras.

@devmadhuu
Contributor Author

> Since this endpoint /listKeys is part of the various NSSummaryEndpoints, can we introduce an integration test for the endpoint as well? We currently do not have one, and in our discussions, we had decided to add one in the future. For now, we can just test out the ListKeys feature. We could test it out for various bucket types, with various hierarchies and also test out the pagination part as well. The other methods part of NSSummaryEndpoint could be included in subsequent jira's later on.

Ok, sure will add integration test.

@sumitagrawl sumitagrawl left a comment

@devmadhuu Thanks for working on this. IMO, we need to use a different approach for listing files.


Stats stats = new Stats(limit);

duResponse = handler.getListKeysResponse(stats, recursive);
Contributor

Instead of overloading the DUResponse class, we can define a new response class for the list-files purpose that returns only the required information.
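
A dedicated response type along these lines might look like the sketch below. `ListKeysResponseSketch` and its fields are hypothetical, chosen from the fields visible in the sample responses in the PR description; the PR does not define such a class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical list-only response type; the name and fields are illustrative. */
final class ListKeysResponseSketch {

  /** One listed key, carrying only what a listing needs. */
  static final class KeyInfo {
    final String path;
    final long size;
    final String replicationType;
    final long creationTime;

    KeyInfo(String path, long size, String replicationType, long creationTime) {
      this.path = path;
      this.size = size;
      this.replicationType = replicationType;
      this.creationTime = creationTime;
    }
  }

  private final String path;
  private final String lastKey;
  private final List<KeyInfo> keys;

  ListKeysResponseSketch(String path, String lastKey, List<KeyInfo> keys) {
    this.path = path;
    this.lastKey = lastKey;
    // Defensive copy so the response cannot change after construction.
    this.keys = Collections.unmodifiableList(new ArrayList<>(keys));
  }

  String getPath() {
    return path;
  }

  /** Last key of this page, which a client can send back to resume the listing. */
  String getLastKey() {
    return lastKey;
  }

  List<KeyInfo> getKeys() {
    return keys;
  }

  /** Sum of the sizes of the listed keys, in bytes. */
  long totalSize() {
    long total = 0;
    for (KeyInfo key : keys) {
      total += key.size;
    }
    return total;
  }
}
```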

if (stats.getCurrentCount() < stats.getLimit()) {
populateDiskUsage(keyInfo, diskUsageList);
stats.setCurrentCount(stats.getCurrentCount() + 1);
stats.setLastKey(kv.getKey());
Contributor

For listing files, lastKey can be any file, but the current logic does not seem to support continuing iteration from that point, e.g.:

  • Let's list files for a volume /a.
  • The limit is reached at the file /a/b/c/d/1.txt, and this is the last key.
  • How will listing continue with further keys and other buckets' files if this last key is given as input?

Shall we use a different listing operation, like the RocksDB iterator we use elsewhere, which has lexicographic ordering?
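
The scenario in the bullets above can be sketched with a sorted map standing in for a lexicographically ordered table. `PrefixSeekListingSketch` is a hypothetical name. The sketch assumes keys are stored by full path, as in the OBS and Legacy samples; the lastKey in the FSO sample suggests a different, ID-based key layout, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical illustration of seek-and-scan listing; not Recon code. */
final class PrefixSeekListingSketch {
  private PrefixSeekListingSketch() { }

  /**
   * Lists up to limit keys under prefix. On the first call (empty lastKey) the
   * scan starts at the prefix; when resuming it starts just past lastKey.
   * Assumption: lastKey, when given, was returned by an earlier call with the
   * same prefix.
   */
  static List<String> list(NavigableMap<String, Long> table, String prefix,
                           String lastKey, int limit) {
    NavigableMap<String, Long> tail = (lastKey == null || lastKey.isEmpty())
        ? table.tailMap(prefix, true)
        : table.tailMap(lastKey, false);
    List<String> result = new ArrayList<>();
    for (String key : tail.keySet()) {
      // Keys are sorted, so the first non-matching key ends the prefix range.
      if (!key.startsWith(prefix) || result.size() >= limit) {
        break;
      }
      result.add(key);
    }
    return result;
  }

  public static void main(String[] args) {
    NavigableMap<String, Long> table = new TreeMap<>();
    table.put("/a/b/c/d/1.txt", 1L);
    table.put("/a/b/c/d/2.txt", 1L);
    table.put("/a/b2/x.txt", 1L);
    table.put("/z/other/y.txt", 1L);
    // Resume after the last key of a truncated first page.
    System.out.println(list(table, "/a/", "/a/b/c/d/1.txt", 10));
  }
}
```

Because the scan order is the storage order, resuming after /a/b/c/d/1.txt continues into the remaining files of that directory and then into sibling buckets under /a, without any per-directory bookkeeping.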

Contributor Author

@sumitagrawl, thanks for reviewing the patch. The current listKeys API has this limit parameter, which limits the listing in order. If other buckets are also present in the volume and the limit is reached, it will not list them. This is similar to the CLI behavior.

@devmadhuu devmadhuu marked this pull request as draft May 5, 2024 03:33
@devmadhuu devmadhuu closed this May 10, 2024