Feature request
To display images with their associated metadata in the dataset viewer, a metadata.csv file is required. In the case of a dataset with multiple subsets, this would require the CSVs to be contained in the same folder as the images since they all need to be named metadata.csv. The request is that this be made more flexible for datasets with multiple subsets to avoid the need to put a metadata.csv into each image directory where they are not as easily accessed.
Motivation
When creating datasets with multiple subsets I can't get the images to display alongside their associated metadata (it's usually one or the other that will show up). Since this requires a file specifically named metadata.csv, I then have to place that file within the image directory, which makes it much more difficult to access. Additionally, it still doesn't necessarily display the images alongside their metadata correctly (see, for instance, this discussion).
On another dataset struggling with a similar issue, it was suggested that I bring this discussion to GitHub (discussion). That dataset is a mix of subsets: some just reference image URLs, while others have the images uploaded. The subsets with uploaded images are not displaying them, but renaming that file to just metadata.csv would diminish the clarity of the dataset's construction (and I'm not entirely convinced it would solve the issue).
Your contribution
I can make a suggestion for one approach to address the issue:
For instance, even if the filename could just end in _metadata.csv or -metadata.csv, that would be very helpful, allowing more flexibility in dataset structure without impacting clarity. I would think that the backend functionality that looks for metadata.csv could reasonably be adapted to look for such an ending on a filename (maybe also checking that the file has a file_name column?).
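To make the suggestion concrete, here is a rough sketch of the kind of check I have in mind. It is only an illustration: the helper names and the `METADATA_SUFFIXES` constant are hypothetical, and this is not how the library currently resolves metadata files.

```python
import csv
import io

# Hypothetical suffixes taken from the proposal above; these are NOT
# the patterns the library uses today.
METADATA_SUFFIXES = ("_metadata.csv", "-metadata.csv")


def is_metadata_filename(path: str) -> bool:
    """Return True if the base name is metadata.csv or ends in a proposed suffix."""
    # Normalize Windows separators, then keep only the base name.
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    return name == "metadata.csv" or name.endswith(METADATA_SUFFIXES)


def has_file_name_column(csv_text: str) -> bool:
    """Return True if the CSV header row contains a file_name column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return "file_name" in header
```

With something like this, a file such as `subset_a_metadata.csv` at the top level of the repository would be recognized, while an unrelated CSV without a `file_name` column would still be skipped.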
Presumably, requiring the configs in a setup like on this dataset could also help in figuring out how it should work?
Yes, that's part of the issue. Also, metadata.csv is a very ambiguous name and we generally try to avoid using the same name for different files within a dataset, as this can quickly lead to confusion.
Supporting **/*-metadata.csv or **/*_metadata.csv makes sense to me. If that sounds good to you, feel free to open a PR to update the patterns here:
I'd also be happy to look at whatever solution is decided upon and contribute to the ideation.
Thanks for your time and consideration! The dataset viewer really is fabulous when it works :)