Define the origin w.r.t. the pixel corner or center #89

constantinpape · 2022-02-01T10:35:04Z

The image specification (multiscales) must define the origin, either w.r.t. pixel center or corner.

@lassoan summarized this well in #85 (comment):

In 2D applications using one of the pixel corners as origin is probably more common, but in 3D imaging voxel coordinate system origin is typically the center. In VTK, ITK, and all applications based on these toolkits origin is in the pixel center. In all 3D image file formats that I know of (nrrd, nifti, metaio, dicom) origin is in the pixel center, too.
NGFF standard must specify this (if it hasn't been specified it already).
If any software that uses different coordinate system convention internally then it can convert to/from the standard coordinate system when reading/writing images.

So I think we should use the pixel center as origin here as well. Anyone wants to make a small PR to add this, @jbms @d-v-b @lassoan @thewtex? Otherwise I can give this a shot once #85 is merged.

The text was updated successfully, but these errors were encountered:

bogovicj · 2022-02-01T13:12:58Z

So I think we should use the pixel center as origin here as well.

Yes, agreed.

lassoan · 2022-02-01T17:59:40Z

For reference, here is how ITK defines image geometry (origin, spacing, directions): https://itk.org/ITKSoftwareGuide/html/Book1/ITKSoftwareGuide-Book1ch4.html#x45-540004.1.4

constantinpape · 2022-02-02T08:26:19Z

Thanks for providing the link @lassoan! I read this now and in light of yesterdays discussion I think that we should not add it to 0.4, but instead tackle this in 0.5, given that @bogovicj will also make the definition of spaces and data-spaces much more explicit than it currently is. With this the origin definition will make much more sense.

If anyone thinks we need a sentence on it right now, please go ahead and propose changes in a PR; I am not opposed to adding it for v0.4 (or to make a v0.4.1 with just that change), but I can't find a good place to put this in right now without the more explicit space definition.

xulman · 2022-02-02T09:40:49Z

So I think we should use the pixel center as origin here as well.

Yes, agreed.

Besides, there's one center while 4, 8, ? corners... and one would have to specify which corner

jbms · 2022-02-02T19:15:10Z

The corner would surely be the start (i.e. top left) along each dimension, so I don't think there is an issue there.

Though I suppose you could in theory allow a choice of center vs "corner" (i.e. boundary) independently for each dimension.

I have a vague intuition that if the data will be displayed using interpolation then center is a more natural choice, while if the data will be displayed without interpolation (i.e. "pixelated") then corner may be a more natural choice.

For example:

Suppose we have a 1-d array of size 10 at a resolution of 4nm.

If we say integer coordinates are at the corner, then our data corresponds to the continuous physical range [0nm, 40nm] assuming we don't apply any translation.

If we instead had an array of size 5 at a resolution of 8nm, then our data would still correspond to the continuous physical range [0nm, 40nm].

If we say integer coordinates are at the center, then our data corresponds to the continuous physical range [-2nm, 38nm] if we don't apply any translation. If we instead had an array of size 5 at a resolution of 8nm, then we would instead get by default a continuous physical range [-4nm, 36nm].

On the other hand, if we are always interpolating, and will exclude the outer half of the begin and end pixels, then our array of 10 4nm pixels would correspond to the physical range [0, 36nm] and our array of 5 8nm pixels would correspond to the physical range [0, 32nm] which is maybe a bit more intuitive.

When I made the choice in Neuroglancer to use corner rather than center it was more a natural result of how the transforms were implemented rather than a conscious decision, and then changing it later was not an option for backwards compatibility reasons, so I'm curious what others think about this.

thewtex · 2022-02-03T13:50:43Z

Top left is does correspond to how most visualization tools orient their rendering. However, bottom left is how most processing tools orient themselves. By using the center, we avoid these issues.

We want to support processing and rendering in these non-"pixelated" methods, e.g. @tpietzsch demo'ed rendering a splat-type rendering with BigDataViewer. Pixelated, interpolated-pixelated, non-pixelated, processing can all be supported when the transform applies to the pixel center. Considering all use cases, complexities and dependencies on the size of a pixel and which corner is taken to be the start are removed.

jbms · 2022-02-03T14:21:40Z

@thewtex I'm a bit confused when you say "bottom left". Certainly it is fairly common for the origin of a coordinate space to correspond to the "bottom left" corner of the entire image/screen, i.e. the x axis goes left to right, and the y axis goes bottom to top. That is how OpenGL window coordinates are defined, for example.

However, I think here we are talking about the origin within an individual pixel/voxel, not of the entire coordinate space.

Let say we have a zarr array of shape (4, 3). The point label O is contained within pixel (0, 0), while the point labeled Z is contained within pixel (3, 2).

A      B     C     D      E
 +-----+-----+-----+-----+
 |     |     |     |     |
 |  O  |  P  |  Q  |  R  |
 |     |     |     |     |
N+-----+-----+-----+-----+F
 |     |     |     |     |
 |  S  |  T  |  U  |  V  |
 |     |     |     |     |
M+-----+-----+-----+-----+G
 |     |     |     |     |
 |  W  |  X  |  Y  |  Z  |
 |     |     |     |     |
 +-----+-----+-----+-----+
L      K     J     I      H

In terms of the continuous coordinate space, I would say reasonable choices of the (0, 0) origin are A and O.

I think the choice of pixel origin is independent of the choice of which screen direction should by default correspond to each dimension of the coordinate space; that could perhaps be indicated by separate metadata. For example, if the diagram were flipped vertically, but we are still assuming that point O is contained in pixel (0, 0) of the zarr array:

L      K     J     I      H
 +-----+-----+-----+-----+
 |     |     |     |     |
 |  W  |  X  |  Y  |  Z  |
 |     |     |     |     |
M+-----+-----+-----+-----+G
 |     |     |     |     |
 |  S  |  T  |  U  |  V  |
 |     |     |     |     |
N+-----+-----+-----+-----+F
 |     |     |     |     |
 |  O  |  P  |  Q  |  R  |
 |     |     |     |     |
 +-----+-----+-----+-----+
A      B     C     D      E

Then I would still say the reasonable choices of the (0, 0) origin in the continuous coordinate space are A and O.

lassoan · 2022-02-03T14:38:51Z

In radiology (and in 3D medical imaging software libraries and applications I know), the debates around pixel corner/center took place in the early 2000s and the community standardized on pixel center. The decision was not contested later and in general everyone is happy with it.

Image is a continuous signal that can be reconstructed flawlessly from the discrete samples stored in the voxel values (as long as the Nyquist sampling criterion was respected). Therefore pixelated display is unnecessary, and it can be also simply considered incorrect, because we know that the original signal can be reconstructed using a low-pass filter yet we construct a signal using a zero-order hold. Some people switch to pixelated display because they want to see the voxel boundaries, but this goal can be achieved much better by overlaying a grid (the grid always shows voxel boundaries clearly, regardless of brightness/contrast settings and intensity difference between neighbor voxels).

then changing it later was not an option for backwards compatibility reasons, so I'm curious what others think about this.

I can tell about a similar issue that we have in 3D Slicer. At the time when the application was designed, image axis orientation in radiology software was still not standardized. Slicer chose RAS, while over the years the rest of the radiology world ended up using LPS. Slicer kept using RAS for backward compatibility reasons. As time went on, we encountered more and more issues due to this inconsistency, but as more and more features were added, more data was generated, the larger the community grew, switching to the standard convention just got so much more complex and bigger task that it has remained out of our reach. With several years of careful work we managed to switch to standard LPS in all the files, but internally we still use RAS. This is a source of lots of complications, potential errors, and user confusion. It would have been much better to switch many years ago, when everything was still smaller and simpler.

This is of course just one example, but it illustrates how you can get into a tough spot if your software diverges from common conventions in your field. The change to using pixel center when representing 3D images with a 3D array may look hard and/or unnecessary, but in the long term it will be just harder and it may turn out that the change is practically unavoidable (because you just need to spend too much time debugging and fixing errors related to half-voxel offset errors and explaining to users and developers why they need to add half voxel offsets here and there when they use or write plugins for your software).

thewtex · 2022-02-04T21:19:13Z

@jbms yes, that's right -- we want to specify O vs A. While we avoid any confusion with the orientation of the coordinate, we also support processing as @lassoan pointed out.

bogovicj · 2022-05-12T21:11:01Z

Here's a blurb I wrote re this definition that I hope will be in the next version:

Coordinate convention

The pixel/voxel center is the origin of the continuous coordinate system.

It is vital to consistently define relationship between the discrete / arrray and continuous / interpolated
coordinate systems. A pixel / voxel is the continuous region that corresponds to a single sample
in the discrete array, i.e., the area corresponding to nearest-neighbor (NN) interpolation of that sample.
The center of a 2d pixel corresponding to the origin (0,0) in the discrete array is the origin of the continuous space
(0.0, 0.0) (when the transformation is the identity). The continuous rectangle of the pixel is given by the
half-open interval [-0.5, 0.5) x [-0.5, 0.5). See chapter 4 and figure 4.1 of the ITK Software Guide [[itk]].

dstansby · 2024-08-16T11:33:30Z

What's the status of this discussion? Is it waiting on someone to open a pull request with a concrete proposal?

d-v-b · 2024-08-16T13:47:56Z

Regarding the status of the discussion, going by emoji and vibes my read is that most participants are in the "pixel center" camp, but that doesn't mean everyone agrees.

I think the "pixel center or pixel corner" question is basically about how to draw little squares on a screen; this is an important concern, but I think it's out of scope for a file format specification for data sampled in physical coordinates. Zarr arrays don't contain pixels, they just contain values. We use metadata to assign those values to positions in space, i.e. we assign each value to a point, (not a square / cube / hypercube). The little squares only enter the story when someone decides to display those values by resampling onto a display coordinate space after using nearest neighbor interpolation, which is just one of several possible interpolation schemes.

As for the future of the spec, I believe John will cover these points better than I can in his forthcoming transformations RFC.

lassoan · 2024-08-17T08:09:06Z

I think the "pixel center or pixel corner" question is basically about how to draw little squares on a screen; this is an important concern, but I think it's out of scope for a file format specification for data sampled in physical coordinates.

I agree that the file format does not need to talk about pixels, as the data array just stores signal samples at physical locations. It is up to the visualization software to specify display coordinate system and render pixels.

However, based on the discussion in this thread, it may worth adding a note to the file format specification that explains the difference between physical and display coordinate systems.

jbms · 2024-08-17T18:47:59Z

The pixel origin matters for mapping between the discrete coordinate space of a zarr array and any continuous coordinate space. In particular, when defining a multiscale, the relative transformations that need to be specified depend on the pixel origin.

jbms · 2024-08-17T18:50:36Z

Also, pixel origin is the center is anyway the defacto standard since that is what existing ome-zarr software and multiscale datasets use. That should be described in the spec.

dstansby · 2024-08-17T20:46:57Z

To get my head around this I did some ASCII art with a toy example.

If the multiscales axes unit is meters, and the coordinate transform scale for this level is 0.2, then an array with four items maps to a physical coordinate system like this:

0       1       2       3       4
|       |       |       |       |
---------------------------------
|       |       |       |       |
0.0m    0.2m    0.4m    0.6m    0.8m

The transform is

[
    {
        "type": "scale",
        "scale": [0.2]
    }
]

This is all the spec is for, the transform between array indices and a continous space. No pixels here!

I think there are two common issues with how an array like this is visualised, and how lower resolution multiscales data are calculated from it. These two issues cancel each other out so they don't look like issues, but in reality they are.

Visualisation

For the above array, it is often visualised (I believe this is how neuroglancer currently works?) as:

0       1       2       3       4
|0000000|1111111|2222222|3333333|...
---------------------------------
|       |       |       |       |
0.0m    0.2m    0.4m    0.6m    0.8m

But this is doesn't seem like a very sensible way to visualise - physical coordiante (e.g.) 0.15m is closer to array value 1, but is being rendered as array value 0. It is worth stressing at this point that the values in the array are not pixel values to be inerpreted as being defined over an imaginary pixel or higher-dimensional voxel, they are values defined at a single point in a physical space.

As such, it makes more sense to use nearest neighbour interpolation:

0       1       2       3       4
0000|1111111|2222222|3333333|4444...
---------------------------------
|       |       |       |       |
0.0m    0.2m    0.4m    0.6m    0.8m

Creating downsampled data

To create a lower resolution level multiscale pyramid, we can bin this data by a factor of two. That means take every two values, and average them. Where should these new values lie in the physical coordinate space? It seems sensible to put them halfway between the two values we averaged:

    0               1
|   |   |   |   |   |   |   |   |
---------------------------------
|       |       |       |       |
0.0m    0.2m    0.4m    0.6m    0.8m

The correct transform for this level is then:

[
    {
        "type": "scale",
        "scale": [0.4]
    }
    {
        "type": "translation",
        "translation": [0.1]
    },
]

But the translation is easy to miss (certainly, software I have written currently misses the translation and I only noticed when writing this comment 😱). Without the translation we would have:

0               1   
|   |   |   |   |   |   |   |   |
---------------------------------
|       |       |       |       |
0.0m    0.2m    0.4m    0.6m    0.8m

If you display this array using the first option in Visualisation above, the pixels overlap with the pixels they have been binned from, so two wrongs have consipred to make something that looks right!

I am guessing this is the thinking the radiology community went through as mentioned in #89 (comment). I used to work in astronomy, where the same issue was also present in the past until we did a drive to educate folks on how array values mapped to pixels (ie, your array value is defined at the centre of a pixel, not the corner).

Sorry for the long post, but I hope it explains stuff for others as well as me? The conclusions I'm getting from this and the conversation above are:

The spec does not need to mention anything about pixels
... but as a community we need to educate folks to get their transforms right when creating multiscale data, and check and potentially correct our own software

d-v-b · 2024-08-17T20:57:15Z

{
   "type": "translation",
  "translation": [-0.5]
}

I think the sign is wrong here -- if your original coordinates were [0, 1, 2, 3, ...], then downsampling by 2x windowed averaging will yield coordinates [0.5, 2.5, ...].

dstansby · 2024-08-17T21:46:15Z

Thanks for catching, I think fixed now!

constantinpape mentioned this issue Feb 1, 2022

Finalizing v0.4 #86

Open

13 tasks

constantinpape mentioned this issue Feb 2, 2022

Proposing spaces and transforms #94

Open

sbesson mentioned this issue Sep 30, 2022

Coordinate systems and new coordinate transformations proposal #138

Draft

bogovicj mentioned this issue Feb 17, 2023

Downsampling and translation google/neuroglancer#440

Closed

psobolewskiPhD mentioned this issue Jun 24, 2024

Fix vispy pixel offset in 3D multi-scale image layers napari/napari#6319

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define the origin w.r.t. the pixel corner or center #89

Define the origin w.r.t. the pixel corner or center #89

constantinpape commented Feb 1, 2022

bogovicj commented Feb 1, 2022

lassoan commented Feb 1, 2022

constantinpape commented Feb 2, 2022

xulman commented Feb 2, 2022 •

edited

Loading

jbms commented Feb 2, 2022

thewtex commented Feb 3, 2022

jbms commented Feb 3, 2022

lassoan commented Feb 3, 2022

thewtex commented Feb 4, 2022

bogovicj commented May 12, 2022

Coordinate convention

dstansby commented Aug 16, 2024 •

edited

Loading

d-v-b commented Aug 16, 2024

lassoan commented Aug 17, 2024

jbms commented Aug 17, 2024

jbms commented Aug 17, 2024

dstansby commented Aug 17, 2024 •

edited

Loading

d-v-b commented Aug 17, 2024

dstansby commented Aug 17, 2024

Define the origin w.r.t. the pixel corner or center #89

Define the origin w.r.t. the pixel corner or center #89

Comments

constantinpape commented Feb 1, 2022

bogovicj commented Feb 1, 2022

lassoan commented Feb 1, 2022

constantinpape commented Feb 2, 2022

xulman commented Feb 2, 2022 • edited Loading

jbms commented Feb 2, 2022

thewtex commented Feb 3, 2022

jbms commented Feb 3, 2022

lassoan commented Feb 3, 2022

thewtex commented Feb 4, 2022

bogovicj commented May 12, 2022

Coordinate convention

dstansby commented Aug 16, 2024 • edited Loading

d-v-b commented Aug 16, 2024

lassoan commented Aug 17, 2024

jbms commented Aug 17, 2024

jbms commented Aug 17, 2024

dstansby commented Aug 17, 2024 • edited Loading

Visualisation

Creating downsampled data

d-v-b commented Aug 17, 2024

dstansby commented Aug 17, 2024

xulman commented Feb 2, 2022 •

edited

Loading

dstansby commented Aug 16, 2024 •

edited

Loading

dstansby commented Aug 17, 2024 •

edited

Loading