Issues with ListObjectsV2 #55

Closed
mzur opened this issue Dec 5, 2023 · 7 comments

mzur commented Dec 5, 2023

I see different behavior when interacting with the ListObjectsV2 endpoint of AOS than with a regular S3 service. Usually, I call ListObjectsV2 with a certain prefix and / as the delimiter to show the "directories" and files, so that only the objects at the current "directory level" are shown. With a regular S3 endpoint, I get the current "directories" as CommonPrefixes and the current objects as Contents. With AOS there are two issues:

  • I get the current "directories" as Contents as well, even though they are not objects. This lists the directories twice in the output of my S3 SDK.

  • The "directories" in CommonPrefixes are not terminated by a slash. As described in the docs, the prefixes should end with a slash.

So to fix this, I would expect the ListObjectsV2 endpoint to only return objects as Contents and also provide CommonPrefixes terminated with a slash.
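The expected grouping can be illustrated without any S3 endpoint. The following is only a sketch of the delimiter semantics described above, not the AOS or AWS implementation; `groupKeys()` and the sample keys are made up for illustration.

```php
<?php
// Sketch of the grouping a ListObjectsV2 call with a delimiter is expected to perform.
// groupKeys() is a hypothetical helper, not part of any SDK.
function groupKeys(array $keys, string $prefix = '', string $delimiter = '/'): array
{
    $contents = [];
    $commonPrefixes = [];
    foreach ($keys as $key) {
        // Only keys that start with the requested prefix take part in the listing.
        if (!str_starts_with($key, $prefix)) {
            continue;
        }
        $rest = substr($key, strlen($prefix));
        $pos = $delimiter === '' ? false : strpos($rest, $delimiter);
        if ($pos === false) {
            // No delimiter after the prefix: an object at the current level.
            $contents[] = $key;
            continue;
        }
        // Roll the key up into a prefix that ends with the delimiter.
        $rolledUp = $prefix . substr($rest, 0, $pos + strlen($delimiter));
        if (!in_array($rolledUp, $commonPrefixes, true)) {
            $commonPrefixes[] = $rolledUp;
        }
    }
    return ['Contents' => $contents, 'CommonPrefixes' => $commonPrefixes];
}

// Made-up keys, listed at the bucket root and one level down.
$keys = ['a.jpg', 'dir/b.jpg', 'dir/sub/c.jpg', 'other/d.jpg'];
$root = groupKeys($keys);
$sub = groupKeys($keys, 'dir/');
```

Here `$root` holds `a.jpg` as the only entry in Contents and `dir/` and `other/` as CommonPrefixes, which is the shape a "directory" browser relies on.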

Here is an example query with the PHP S3 SDK:

$options = ['Bucket' => 'biigletest', 'Prefix' => '', 'Delimiter' => '/'];
$paginator = $client->getPaginator('ListObjectsV2', $options);
$result = $paginator->current();
$result->get('Contents');
// [
//   [
//     "Key" => "biigletest",
//     "LastModified" => Aws\Api\DateTimeResult @1701703435 {#8474
//       date: 2023-12-04 15:23:55.0 +00:00,
//     },
//     "ETag" => "01HGTPVH6VGAC4ADJQEWHRMP50",
//     "Size" => "0",
//   ],
// ]
$result->get('CommonPrefixes');
// [
//   [
//     "Prefix" => "biigletest",
//   ],
// ]
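
Against an endpoint that returns the output above, a client could normalize the result itself before displaying it. This is a rough sketch of such a workaround under stated assumptions, not something proposed in this thread: `normalizeListing()` is a hypothetical helper, and it would also hide a genuine object that shares its name with a "directory".

```php
<?php
// Hypothetical client-side workaround; normalizeListing() is not part of the AWS SDK.
function normalizeListing(array $contents, array $commonPrefixes, string $delimiter = '/'): array
{
    // Append the delimiter to a string unless it already ends with it.
    $terminate = fn (string $s): string => str_ends_with($s, $delimiter) ? $s : $s . $delimiter;

    $prefixes = [];
    foreach ($commonPrefixes as $entry) {
        $prefixes[] = $terminate($entry['Prefix']);
    }

    $objects = [];
    foreach ($contents as $object) {
        // Drop entries whose key is just a "directory" that is already listed as a prefix.
        // Caveat: this also drops a real object with the same name as a prefix.
        if (in_array($terminate($object['Key']), $prefixes, true)) {
            continue;
        }
        $objects[] = $object;
    }

    return [
        'Contents' => $objects,
        'CommonPrefixes' => array_map(fn (string $p): array => ['Prefix' => $p], $prefixes),
    ];
}

// "biigletest" is taken from the output above; "image.jpg" is a made-up object.
$normalized = normalizeListing(
    [['Key' => 'biigletest', 'Size' => '0'], ['Key' => 'image.jpg', 'Size' => '1024']],
    [['Prefix' => 'biigletest']]
);
```

With the real SDK result, the two arguments would be `$result->get('Contents') ?? []` and `$result->get('CommonPrefixes') ?? []`.
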
Member

St4NNi commented Dec 5, 2023

Thanks for reporting. Yes, this is a bug. The trailing slashes should be an easy fix. Regarding the "duplication" of hierarchy resources, I am actually not sure how we should handle this. Without this duplication it would be hard or almost impossible to query the ID of a hierarchy resource.

This ID can be used to download a .tar.gz bundle of the whole downwards resource tree via the objects special bucket.

Author

mzur commented Dec 6, 2023

> This ID can be used to download a .tar.gz bundle of the whole downwards resource tree via the objects special bucket.

So this can be done with S3? Maybe it's also fine to only offer this feature via the regular API to keep the S3 API compliant with the specs. Or only show this behavior with the objects bucket.

Member

St4NNi commented Dec 6, 2023

Yes, sub-trees can be bundled into .tar.gz archives. But thinking about it, I agree that it is not the best user experience to repeat these entries in ListObjectsV2. So I will put this on the todo list for future updates.

Unfortunately the spec doesn't really help here, because hierarchy objects have a different meaning for us, which goes way further than the spec's perspective of buckets and keys that are only strings with some arbitrary separators.

Author

mzur commented Dec 6, 2023

Ok, so from my perspective it would be fine if I had to use the regular API whenever I want to download an archive of a sub-tree. With S3, I can easily download a sub-tree in the usual way (maybe even faster because the load can be distributed). But that's only my opinion.

So right now if I want to support AOSv2 in my BIIGLE service, I still have to handle object listing in a special way via the regular API (just as with AOSv1 without ListObjectsV2). So I'd very much appreciate it if this would be changed/fixed 🙂

das-Abroxas added a commit that referenced this issue Dec 8, 2023
Contributor

das-Abroxas commented Dec 8, 2023

We have fixed the ListObjectsV2 implementation (2719f53), which should already be available in the dev instance.

This means that a correct distinction is now made between Contents and CommonPrefixes, depending on the delimiter. The CommonPrefixes also now end correctly with a slash (or the delimiter specified in the request). The only "duplicates" that should now still appear in a response are Objects that can be accessed via multiple hierarchies in a Project, i.e. that exist in multiple Collections and/or Datasets.
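
One way to check a listing for the two properties described here is a small validation helper. This is only a sketch: `listingViolations()` is a hypothetical function, it assumes a non-empty delimiter, and it expects the Contents and CommonPrefixes arrays in the shape shown earlier in this issue.

```php
<?php
// Hypothetical helper that reports where a parsed listing deviates from the
// expected Contents/CommonPrefixes split. Returns an empty array if all is fine.
function listingViolations(array $contents, array $commonPrefixes, string $prefix, string $delimiter): array
{
    if ($delimiter === '') {
        // Without a delimiter there is no grouping to verify.
        return [];
    }
    $violations = [];
    foreach ($commonPrefixes as $entry) {
        // Every common prefix should end with the requested delimiter.
        if (!str_ends_with($entry['Prefix'], $delimiter)) {
            $violations[] = "CommonPrefix '{$entry['Prefix']}' does not end with '{$delimiter}'";
        }
    }
    foreach ($contents as $object) {
        // Objects in Contents should not contain the delimiter after the prefix.
        $rest = substr($object['Key'], strlen($prefix));
        if (str_contains($rest, $delimiter)) {
            $violations[] = "Key '{$object['Key']}' belongs below the current level";
        }
    }
    return $violations;
}

// The response reported at the top of this issue, reduced to the relevant fields.
$before = listingViolations([['Key' => 'biigletest']], [['Prefix' => 'biigletest']], '', '/');
// The shape expected after the fix: no pseudo-object, prefix ends with the delimiter.
$after = listingViolations([], [['Prefix' => 'biigletest/']], '', '/');
```

The helper does not flag a key that equals a prefix without its delimiter, because a real object and a "directory" with the same name can legitimately coexist.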

It would be great if you could test the functionality again and give us feedback.

Have a nice weekend ✌️

Author

mzur commented Dec 8, 2023

Works perfectly now, thanks!

The next thing I would like to do is to set CORS rules for the bucket. s3cmd tells me that PutBucketCors is not implemented in the data proxy. It's also not mentioned here #19 (comment). Maybe you could put that on your roadmap, too (unless there is a way via the web UI or regular API)?

Member

St4NNi commented Dec 8, 2023

Nice, you're welcome!

We had a CORS implementation in V1 and need to port it to V2. I have updated the corresponding issue #29.
I will close this issue for now. If you have any problems regarding ListObjectsV2, feel free to re-open it at any time.

St4NNi closed this as completed Dec 8, 2023