datum stats output format #1367
Comments
In addition to that, I have been using the CLI to get the mean (where it generates a file). The programmatic way is not that straightforward; it would be great if you could point me to how I can get dataset information such as the mean programmatically. Thanks
Hi @ganindu7, Sorry for the late response. The image mean statistic collector reports the mean in RGB order. Please see the following code. It may also answer your other question about getting dataset information programmatically.
import datumaro as dm
import numpy as np
import cv2
img = np.zeros([10, 10, 3])
# Since cv2 uses BGR format, we save a 10x10 PNG image filled with a [0,1,2] RGB background
img[:,:,0] = 2 # B
img[:,:,1] = 1 # G
img[:,:,2] = 0 # R
cv2.imwrite("test.png", img)
dataset = dm.Dataset.from_iterable(
    [
        dm.DatasetItem(
            id=f"{subset}_{idx}",
            subset=subset,
            media=dm.Image.from_file("test.png"),
            annotations=[dm.Label(label=idx % 2)]
        )
        for idx in range(10)
        for subset in ["train"]
    ],
    categories=["cat", "dog"],
)
print(
    dm.components.operations.compute_image_statistics(dataset)
)
It will produce
{
'dataset': ...,
'subsets': {
'train': {
'images count': 10,
'image mean': [0.0, 0.9999999999999976, 1.9999999999999951],
'image std': [0.0, 8.530894156933093e-08, 1.7061788313866186e-07]
}
}
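}
You can see that the 'image mean' field is [0, 1, 2] (RGB), matching the image we created earlier. However, please note that Datumaro internally uses opencv-python, so it uses the BGR format by default unless given a special configuration. For example, print(dataset.get("train_0", "train").media.data) will return
[[[2 1 0]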
[2 1 0]
[2 1 0]
[2 1 0]
[2 1 0]
[2 1 0]
[2 1 0]
[2 1 0]
[2 1 0]
[2 1 0]]
...
[2 1 0]
[2 1 0]
[2 1 0]
[2 1 0]]]
This can be really confusing for users. It must be improved in the future. We will add it to our backlog. Thanks for your interest.
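To make the channel-order point in the reply above concrete, here is a minimal NumPy-only sketch. It is not Datumaro's implementation; `channel_stats_rgb` is a hypothetical helper that assumes the input arrays are in OpenCV's BGR layout and reports per-channel mean and std in RGB order.

```python
import numpy as np


def channel_stats_rgb(bgr_images):
    """Per-channel mean and std over all pixels, reported in RGB order.

    Hypothetical helper for illustration; assumes each array is HxWx3 in BGR.
    """
    # Flatten every image to (num_pixels, 3) and stack them, still in BGR order.
    pixels = np.concatenate([img.reshape(-1, 3) for img in bgr_images])
    pixels = pixels.astype(np.float64)
    # Reverse the channel axis: BGR -> RGB.
    pixels = pixels[:, ::-1]
    return pixels.mean(axis=0), pixels.std(axis=0)


# Same synthetic image as in the reply: B=2, G=1, R=0 in BGR layout.
bgr = np.zeros((10, 10, 3), dtype=np.uint8)
bgr[:, :, 0] = 2  # B
bgr[:, :, 1] = 1  # G
bgr[:, :, 2] = 0  # R

mean, std = channel_stats_rgb([bgr] * 10)
print(mean.tolist())  # [0.0, 1.0, 2.0]
print(std.tolist())   # [0.0, 0.0, 0.0]
```

The raw pixel data stays `[2 1 0]` (BGR), while the reported mean reads `[0, 1, 2]` (RGB), which is the same mismatch described in the reply.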
Thanks a lot!
### Summary

- Ticket no. 137105 and GH issue #1367
- Print the color channel format as well (RGB)
- Change the statistic accumulation logic to use ImageColorChannel context

### How to test

Revisit the unit test as well to cover this change

### Checklist

<!-- Put an 'x' in all the boxes that apply -->
- [x] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License

- [ ] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2024 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

---------

Signed-off-by: Kim, Vinnam <[email protected]>
Co-authored-by: Wonju Lee <[email protected]>
Hi,
The stats command outputs stats.
Are the dataset image means/stats in RGB or BGR format?
I checked the docs and could not find any hint.
Also, is there an API call to generate the stats for images?
I looked at the tests but could not find a single call to such a function (although I did see stats calculations per image).
Thanks a lot,
Ganindu.
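For readers with the same question: going by the output quoted in this thread, the statistics come back as a nested dict keyed by subset. The sketch below shows one way to pull the per-subset mean out of a dict of that shape. The `stats` literal is copied from the output in this thread (the elided 'dataset' entry is left out), and `subset_means` is a hypothetical helper, not part of Datumaro.

```python
# Stand-in for the dict printed in this thread; in real use it would come from
# dm.components.operations.compute_image_statistics(dataset).
stats = {
    "subsets": {
        "train": {
            "images count": 10,
            "image mean": [0.0, 0.9999999999999976, 1.9999999999999951],
            "image std": [0.0, 8.530894156933093e-08, 1.7061788313866186e-07],
        }
    }
}


def subset_means(stats, ndigits=6):
    """Map each subset name to its per-channel mean, rounded for readability.

    Hypothetical helper; assumes the 'subsets' -> name -> 'image mean' layout shown above.
    """
    return {
        name: [round(value, ndigits) for value in entry["image mean"]]
        for name, entry in stats["subsets"].items()
    }


means = subset_means(stats)
print(means)  # {'train': [0.0, 1.0, 2.0]}
```

According to the maintainer's reply, the values in 'image mean' are in RGB order.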