Handling 'point_cloud' Field in JSON Export Without Images: Import Error in Datumaro #1626

Closed
ainayves opened this issue Oct 3, 2024 · 4 comments

ainayves commented Oct 3, 2024

Hello, dear developers of Datumaro,

First, thank you for Datumaro, which I recently started using and find very interesting.

I have a question that has been on my mind for a few days.

Here are the steps I followed:

  1. Exported a dataset from CVAT in Datumaro format without the images, to avoid re-downloading them.

  2. Ran the following commands:

    datum project create
    datum project import -f datumaro <path to the JSON file>

  3. I then encountered the following error:

datumaro.components.errors.MediaTypeError: Unexpected media type of a dataset '<class 'datumaro.components.media.Image'>'.
Expected media type is '<class 'datumaro.components.media.PointCloud'>.
  4. Upon investigation, I found that the JSON file includes a "point_cloud" field when exporting without images and when the image has no annotations.
  "items": [
    {
      "id": "1713265728.502414164",
      "annotations": [],
      "attr": {
        "frame": 0
      },
      "point_cloud": {
        "path": ""
      }
    }
]
....
  5. I manually removed all "point_cloud" fields to make the import work.
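
The manual removal in the last step could be scripted. The following is only a minimal sketch, not part of Datumaro: it assumes the export has a top-level "items" list shaped like the excerpt above, and it removes a "point_cloud" entry only when its "path" is empty or missing. The function names `strip_empty_point_clouds` and `clean_file` and the file names in the usage note are hypothetical.

```python
import json
from pathlib import Path


def strip_empty_point_clouds(data: dict) -> int:
    """Remove placeholder "point_cloud" entries from each item, in place.

    Assumes the layout shown in the issue: a top-level "items" list of dicts.
    Returns the number of entries removed.
    """
    removed = 0
    for item in data.get("items", []):
        point_cloud = item.get("point_cloud")
        # Only drop placeholders: a dict whose "path" is missing or empty.
        if isinstance(point_cloud, dict) and not point_cloud.get("path"):
            del item["point_cloud"]
            removed += 1
    return removed


def clean_file(src: str, dst: str) -> int:
    """Read a JSON export, strip placeholder point clouds, write the result."""
    data = json.loads(Path(src).read_text(encoding="utf-8"))
    removed = strip_empty_point_clouds(data)
    Path(dst).write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    return removed
```

For example, `clean_file("default.json", "default_clean.json")` would write a cleaned copy that can then be passed to `datum project import -f datumaro`. Entries with a non-empty path are left untouched, so a genuine point cloud dataset is not altered.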

My question is: is there a way to make Datumaro ignore the "point_cloud" field automatically? Or do I always have to remove it manually when exporting without images? Alternatively, could you suggest a different approach?

Note: datasets annotated in CVAT can include thousands of images, so re-downloading them would take a huge amount of time.

Thanks in advance for your help, and thank you again for this tool.

@sooahleex
Contributor

Hi @ainayves, sorry for the late reply. I tried to reproduce your problem using our test asset in CVAT format.

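import datumaro as dm

# Import Datumaro's CVAT-format test asset.
test_path = "tests/assets/cvat_dataset/for_images/export_project"
dm_dataset = dm.Dataset.import_from(test_path, format="cvat")
# Re-export it in Datumaro format to the "cvat2datum" directory.
dm_dataset.export("cvat2datum", format="datumaro")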

Then I ran the commands you mentioned:

datum project create
datum project import -f datumaro ~/workspace/datumaro/cvat2datum/annotations/Train.json

For me, the import works and produces the following output:

2024-10-12 14:10:30,168 INFO: Checking source... 
2024-10-12 14:10:30,217 INFO: Source 'source-1' with format 'datumaro' has been added to the project

If my reproduction differs from what you did, could you share the dataset you used? I will take another look.

@ainayves
Author

ainayves commented Oct 14, 2024

Thank you for your answer, @sooahleex.

In fact, I export the annotations in Datumaro format directly from CVAT.


Then I get this JSON file, which contains the "point_cloud" item, and the import command fails:

default.json

@sooahleex
Contributor

Hi @ainayves, I updated the importer so that it no longer reads the point cloud entry when neither the image nor the point cloud exists. This update will be included in the next release. Thank you for reporting this issue.

@ainayves
Author

Thank you very much
