Definition of `cmin` property box #102
Comments
Another alternative would be to require |
In my view, this would not help much, because relying on a specific ordering and passing around the previously decoded boxes would make the implementation rather awkward. It would also partly defeat the intention of making the focal length resolution-independent. We can solve anything in software, and the current definition is no showstopper. We can make it work. But a different definition would make things cleaner and less error-prone.
Yes, I think that |
Good comments, I definitely think this should be clarified in the text.

Gut feelings on

If we start with how current smartphones work, taking a photo in LandscapeLeft orientation or LandscapeRight orientation makes no difference. The images are captured in sensor orientation but get the appropriate

If the smartphone has multiple sensors and could capture stereo video or stills, I would expect the same to hold. Images are captured in sensor orientation, with the appropriate

The counterargument would be that you don't want to use this for stereo capture at all and simply want to document your sensor setup. In that case it probably makes most sense to specify it before transformative properties.
Or: one could say that |
@farindk you're right, I didn't fully think through how a rotation would affect the matrix outside of the trivial case of a centered principal point. In general in HEIF the output of an image item is the input + any transformations. Surfacing image items with orientation not applied so that it can be applied at display time is really a perf optimization and not what the spec tells you should happen. This means that it can never be wrong for a parser to only surface the final output with all transforms applied, which then also strongly indicates that the metadata should describe that output, not some potentially never-surfaced input. Some more internal discussion with relevant teams also agrees with this. I'll try to write up a contribution for the next meeting, but I think a summary of the discussion would then be:
The problem with 'should' is that I find it non-trivial how to adapt the parameters. If it is expected by the reading application to modify the parameters, there should at least be a note describing how that should be done. I spent some time trying to figure out how to change the parameters based on an That's quite complex and error-prone. Thus, I would prefer 'shall' to remove the burden from the reader. If there is an easier way to modify the matrices that I overlooked, please let us know. |
This could not be changed in the same HEIF amendment that introduces the properties since that's already too far along for changes. The clarifications have instead been added to 3ed AMD1 (latest text here). Main changes:
One other thing to note here just so everyone is on the same page now that this is under discussion and getting implemented: The While world-to-camera tends to be used more with computer vision use-cases, we felt that camera-to-world made more intuitive sense when you're trying to describe how a number of cameras are positioned relative to each other. In other words, |
Can you specify this accurately in mathematical terms? A verbal description can be interpreted in many different ways.
No. I think the text covers the mathematics pretty clearly, I just wanted to call out that The latest public document I can find is this, but it should be very similar/identical to the final text. In particular these two snippets should cover the mathematics: |
Ok, so according to the equations, the "translation before rotation" means that we take the camera at its position in the world coordinate system, move it to the origin by the provided translation vector, and then rotate it at the origin, which makes sense. So, in fact, you are sending the inverse of the extrinsic matrix. |
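To make the "translate, then rotate" reading concrete, here is a minimal sketch of how a camera-to-world pose and its inverse relate. It assumes the pose is a rotation matrix plus a camera position, with `x_world = R * x_cam + position`; the function names and this sign convention are illustrative assumptions, not the signalling defined by the specification.

```python
import math


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix; for a rotation matrix this is its inverse."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def rotation_z(angle_rad):
    """Rotation about the z axis, used here only as example data."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def camera_to_world(rotation, position, p_cam):
    # Assumed convention: rotate first, then translate to the camera position.
    rotated = mat_vec(rotation, p_cam)
    return [rotated[i] + position[i] for i in range(3)]


def world_to_camera(rotation, position, p_world):
    # Inverse transform: translate first (camera moves to the origin),
    # then apply the inverse rotation.
    shifted = [p_world[i] - position[i] for i in range(3)]
    return mat_vec(transpose(rotation), shifted)


rotation = rotation_z(math.radians(90))
position = [1.0, 2.0, 3.0]
p_cam = [0.5, -1.0, 2.0]

p_world = camera_to_world(rotation, position, p_cam)
round_trip = world_to_camera(rotation, position, p_world)
```

Under this assumed convention, `round_trip` matches `p_cam` up to floating-point error, and the camera position itself maps to the origin of the camera frame. The conventional world-to-camera extrinsic matrix would then be `[R^T | -R^T * position]`.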
In w22940, the definition of the camera intrinsic matrix property box (`cmin`) includes the focal length, separated into $f_x$ and $f_y$ and normalized by the image size. Moreover, transmission of f_y is optional, and if it is not transmitted, f_y = f_x should be used (Note 1).
This is problematic as detailed below.
As a background, the focal length specifies the distance of the image plane (sensor) from the focal center.
A general form of the intrinsic matrix is

$$K = \begin{pmatrix} f & s & p_x \\ 0 & a \cdot f & p_y \\ 0 & 0 & 1 \end{pmatrix}$$

where $f$ denotes the focal length and $a$ the aspect ratio of pixels, which is 1 for square pixels. (Principal point $p_x$, $p_y$, and skew $s$ are not relevant here.)
There is no such thing as a horizontal and a vertical focal length, but one can sometimes still come across the f_x, f_y naming scheme when $a\cdot f$ is combined into a single $f_y$.
It makes sense to specify the focal length in pixels (unlike the typical 'mm' used in photography) and to normalize it to the image size to keep it independent from the image resolution.
However, the problem is that when the `cmin` box is read and `(flags & 1)==false`, one should set $f_y=f_x$. This is not possible without knowing the image size.
If a reader wants to output the focal length in unnormalized pixel units, it is clear that it needs to know the image size. But even if the reader wants to output a normalized focal length (originally meant to be independent of the image size), it still has to know the image size to compute `focal_length_y` according to `focal_length_y = focal_length_x * image_width / image_height`.
This means that a `cmin` box cannot be interpreted independently of an `ispe` box.
There might be two options to change this:
`cmin` box instead sends `(flags & 1)==false`.
Both of these options would make it possible to read the `cmin` box without referencing the `ispe` box, and they also keep the focal length independent of the image resolution.
I assume that the case `(flags & 1)==true`, where f_y and skew are transmitted, is currently not used much, because they are only relevant when there are non-square or skewed pixels. Thus, a non-breaking last-minute change to the definition might still be possible.
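The dependency on `ispe` described above can be illustrated with a minimal sketch. It assumes, as the formula in the text implies, that `focal_length_x` is normalized by the image width and `focal_length_y` by the image height. The function names are hypothetical and not part of any existing HEIF reader.

```python
def resolve_focal_lengths(focal_length_x, focal_length_y=None, *,
                          image_width, image_height):
    """Return (focal_length_x, focal_length_y) in normalized units.

    focal_length_y is None when (flags & 1) == 0, i.e. it was not
    transmitted. Assumption: x is normalized by width, y by height.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image size must be positive")
    if focal_length_y is None:
        # Square pixels: the focal length in pixels is the same in both
        # directions, so the normalized values differ by the aspect ratio.
        # This is the step that needs the image size from ispe.
        focal_length_y = focal_length_x * image_width / image_height
    return focal_length_x, focal_length_y


def to_pixels(focal_length_x, focal_length_y, image_width, image_height):
    """Convert normalized focal lengths to pixel units."""
    return focal_length_x * image_width, focal_length_y * image_height


fx, fy = resolve_focal_lengths(0.5, image_width=2000, image_height=1000)
fx_px, fy_px = to_pixels(fx, fy, 2000, 1000)
```

With a 2000x1000 image and a normalized `focal_length_x` of 0.5, the derived `focal_length_y` is 1.0, and both convert to 1000 pixels. Even the normalized output cannot be produced without `image_width` and `image_height`, which is the point made in the issue.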