Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent Normalization of Cartesian Coordinates in node_x, node_y, and node_z #872

Closed
hongyuchen1030 opened this issue Jul 31, 2024 · 15 comments · Fixed by #878
Closed
Labels
bug Something isn't working

Comments

@hongyuchen1030
Copy link
Collaborator

Version

v2024.06

How did you install UXarray?

Conda

What happened?

Our project performs geometric operations on unstructured grids on a unit sphere, which means all points on the sphere should be normalized. The normalized Cartesian coordinates are stored in the node_x, node_y, and node_z variables.

However, while in most cases the node_x, node_y, and node_z variables are normalized as they should be, when reading in the MPAS data, these variables are not normalized. This inconsistency leads to breakdowns in subsequent operations.

What did you expect to happen?

the node_x, node_y, and node_z variables to be consistently normalized across all datasets, including the MPAS data.

It is crucial to emphasize and document that node_x, node_y, and node_zmust always represent normalized points on the sphere. Ensuring these variables are normalized as required is essential for the robustness and reliability of our project.

Can you provide a MCVE to repoduce the bug?

No response

@hongyuchen1030 hongyuchen1030 added the bug Something isn't working label Jul 31, 2024
@philipc2
Copy link
Member

philipc2 commented Aug 1, 2024

the node_x, node_y, and node_z variables to be consistently normalized across all datasets, including the MPAS data.

I'd prefer if we had an option to select the normalization, either as a parameter to ux.open_grid or as a method part of the Grid class, with warnings/exceptions raised when running a feature that requires normalization. While it's not computationally expensive, it might be misleading to someone simply trying to use UXarray to load in the data and use the values that are expected from the grid file.

As of now, this would only impact any MPAS, EXODUS, or UGRID files that contain Cartesian coordinates.

image
# Option 1: parameter 
uxgrid = ux.open_grid(..., normalize_cartesian_coordinates=True)

# Option2 : Grid method
uxgrid.normalize_cartesian_coordinates()

We can support both options as well

Wondering what @erogluorhan and @rajeeja think about this design.

@rajeeja
Copy link
Contributor

rajeeja commented Aug 1, 2024

Sounds good, we can keep True as the default.
On another note, do we want to support a debug option in our fuctions? that highlights more stuff when doing things under the hood? upon check I find xarray itself doesn't have a built-in debug mode

@philipc2
Copy link
Member

philipc2 commented Aug 1, 2024

Sounds good, we can keep True as the default. On another note, do we want to support a debug option in our fuctions? that highlights more stuff when doing things under the hood? upon check I find xarray itself doesn't have a built-in debug mode

Can you elaborate on what you are envisioning the debug mode looking like?

@rajeeja
Copy link
Contributor

rajeeja commented Aug 1, 2024

Can you elaborate on what you are envisioning the debug mode looking like?

simple logging, with say import logging and setting levels: DEBUG, INFO, WARNING, ERROR, and CRITICAL. instead of the current print, we can tackle this later in a different issue.

prints are simple and good, our Numba dependency might prevent us from having logging calls directly because Numba transforms Python code into machine code. logging is expensive, but shouldn't be used in production envs, but, can be very useful to trace bugs.

@hongyuchen1030
Copy link
Collaborator Author

the node_x, node_y, and node_z variables to be consistently normalized across all datasets, including the MPAS data.

I'd prefer if we had an option to select the normalization, either as a parameter to ux.open_grid or as a method part of the Grid class, with warnings/exceptions raised when running a feature that requires normalization. While it's not computationally expensive, it might be misleading to someone simply trying to use UXarray to load in the data and use the values that are expected from the grid file.

As of now, this would only impact any MPAS, EXODUS, or UGRID files that contain Cartesian coordinates.

image ```python # Option 1: parameter uxgrid = ux.open_grid(..., normalize_cartesian_coordinates=True)

Option2 : Grid method

uxgrid.normalize_cartesian_coordinates()


We can support both options as well

Wondering what @erogluorhan and @rajeeja think about this design.

This solution sounds good to me. However, it's important to note that for certain geometry operations, such as _populate_bounds and zonal_means, the node_x, node_y, and node_z values are normalized. This means that even if users are simply loading data with UXarray to use the values as expected from the grid file, calling _populate_bounds or zonal_means could result in the original node_x, node_y, and node_z values being overwritten by the normalized ones.

A simple solution could be to add a normalized_flag attribute to node_x, node_y, and node_z to indicate whether the values are normalized.

However, users must also be aware that once these values are normalized, the original data will be lost.

@erogluorhan
Copy link
Member

I like the parameter or method options (or both) @philipc2 suggested, but when they are in place, there would be a confusion with the case @hongyuchen1030 pointed, i.e. when user wants them not to be normalized but after some point calls one of the functions @hongyuchen1030 mentioned, coords still being forced normalized does not seem ideal to me. @hongyuchen1030 isn't there a way to use their normalized values but not change the actual data?

@hongyuchen1030
Copy link
Collaborator Author

I like the parameter or method options (or both) @philipc2 suggested, but when they are in place, there would be a confusion with the case @hongyuchen1030 pointed, i.e. when user wants them not to be normalized but after some point calls one of the functions @hongyuchen1030 mentioned, coords still being forced normalized does not seem ideal to me. @hongyuchen1030 isn't there a way to use their normalized values but not change the actual data?

That's the things I pointed out initially in this comment to create an extra variable
#785 (comment)

I would like to re-emphasize the following points:

  1. Most data files use normalized coordinates, while some do not. It's crucial to distinguish between these two scenarios.
  2. Calling the normalization without reusing them for future potential algorithm usage will greatly slow down the performance
  3. Although one can convert normalized data back to "real Earth coordinates" by multiplying with the Earth's radius, it is not guaranteed to match the original data if the original data doesn't represent a perfect sphere with the Earth's radius.

@philipc2
Copy link
Member

philipc2 commented Aug 5, 2024

Most data files use normalized coordinates, while some do not. It's crucial to distinguish between these two scenarios.

Wondering if you have any specific examples of this, especially any documentation from the grid definitions or model output that could corroborate this. From the grids that I've worked with (and the UGRID / CF conventions more generally), I haven't encountered any indicators that would tells us that the coordinates we are working with is normalized or not.

This is something we could add a validation function for that would check whether coordinates are normalized if there are no other indicators from the files that we read.

Calling the normalization without reusing them for future potential algorithm usage will greatly slow down the performance

Agreed. Once the coordinates are normalized, they should be still linked to the Grid object and accessed through regular means (i.e. Grid.node_x)

Although one can convert normalized data back to "real Earth coordinates" by multiplying with the Earth's radius, it is not guaranteed to match the original data if the original data doesn't represent a perfect sphere with the Earth's radius.

I don't believe this is an issue with most of the grids that we work with, since they are taken from climate models and not observations. But this is a fair point if we are working with other types of data.

@philipc2
Copy link
Member

philipc2 commented Aug 5, 2024

@erogluorhan

i.e. when user wants them not to be normalized but after some point calls one of the functions @hongyuchen1030 mentioned

Since @hongyuchen1030's functionality expects the coordinates to ne normalized, an exception should be raised, mentioning that the user needs to normalize the coordinates before calling the function.

coords still being forced normalized does not seem ideal to me.

Agreed. We should avoid any unnecessary processing of the grid when loading it in, especially since normalization is only required for a subset of our functionality.

@philipc2
Copy link
Member

philipc2 commented Aug 5, 2024

I think the best route forward would be some combination of the following design features.

Normalization Checks

Have a validation function that can quickly determine whether the stored coordinates are normalized. @rajeeja has already put together a few of these for other purposes.

In-Place Normalization Methods

Have a Grid method that can normalize the Cartesian coordinates (only applies if there are existing coordinates that were parsed from a grid file)

This would look like what I mentioned above:

uxgrid.normalize_cartesian_coordinates()

After calling this method, all of the node/edge/face_x/y/z coordinates will be normalized. This would be an in place operation, meaning that the existing data is over-written. However, this would only become an issue if the user then attempts to save the Grid to a file and the coordinates will not match the original parsed ones.

I personally don't think this is an issue, since the user is expected to normalize the coordinates themselves and we don't automatically do any of it. What do you all think?

We could even add an attribute to each of the data arrays (i.e. normalized=True) that could keep track of whether we normalized or not.

@hongyuchen1030
Copy link
Collaborator Author

hongyuchen1030 commented Aug 5, 2024

Since @hongyuchen1030's functionality expects the coordinates to ne normalized, an exception should be raised, mentioning that the user needs to normalize the coordinates before calling the function.
Agreed. We should avoid any unnecessary processing of the grid when loading it in, especially since normalization is only required for a subset of our functionality.

As far as I am aware, it is not just my functionalities that require data to be on a perfect sphere. Functions such as Welz' algorithm, area calculation, and edge distance computation—all of which are related to geometry—are based on the premise that we are operating on a perfectly spherical surface.

I personally don't think this is an issue, since the user is expected to normalize the coordinates themselves and we don't automatically do any of it. What do you all think?

For a perfectly spherical surface, it is straightforward to convert between normalized and unnormalized data by simply multiplying by the actual radius, which, in our case, is the Earth's radius. A simple way to test this is by multiplying our normalized values by the Earth's radius and checking if they match the input. If the user's data didn't match, that means their original data are not on a perfectly spherical surface, and we don't suggest they to reply on all our analysis results.

Last but not least, as Paul mentioned, the UXarray project description and the associated paper define that we are operating on the unit sphere.

@hongyuchen1030
Copy link
Collaborator Author

I think the best route forward would be some combination of the following design features.

Normalization Checks

Have a validation function that can quickly determine whether the stored coordinates are normalized. @rajeeja has already put together a few of these for other purposes.

In-Place Normalization Methods

Have a Grid method that can normalize the Cartesian coordinates (only applies if there are existing coordinates that were parsed from a grid file)

This would look like what I mentioned above:

uxgrid.normalize_cartesian_coordinates()

After calling this method, all of the node/edge/face_x/y/z coordinates will be normalized. This would be an in place operation, meaning that the existing data is over-written. However, this would only become an issue if the user then attempts to save the Grid to a file and the coordinates will not match the original parsed ones.

I personally don't think this is an issue, since the user is expected to normalize the coordinates themselves and we don't automatically do any of it. What do you all think?

We could even add an attribute to each of the data arrays (i.e. normalized=True) that could keep track of whether we normalized or not.

This sounds good to me. Just a reminder: what if the user calls uxgrid = grid([0.5, 0.5, 0.5]), which invokes the __from_vertex__() function? Currently, __from_vertex__() includes normalization.

We can set it up like __from_vertex__(normalized=True) as the default.

@philipc2
Copy link
Member

philipc2 commented Aug 6, 2024

This sounds good to me. Just a reminder: what if the user calls uxgrid = grid([0.5, 0.5, 0.5]), which invokes the from_vertex() function? Currently, from_vertex() includes normalization.

We can set it up like from_vertex(normalized=True) as the default.

The from_vertex() method has really only been used for testing, so I'm okay with any implementation for it as it's not something that is used by our user-base.

@hongyuchen1030
Copy link
Collaborator Author

After our discussions, it seems crucial to emphasize that the fundamental premise of our project is based on a spherical surface. Users should not expect our calculations to apply to any other surfaces, such as the actual shape of the Earth. By clearly highlighting this premise, users can make informed choices: they can either use the normalized node coordinates or simply multiply them by the Earth's radius.

@philipc2
Copy link
Member

philipc2 commented Aug 6, 2024

After our discussions, it seems crucial to emphasize that the fundamental premise of our project is based on a spherical surface.

I agree, and I believe the majority of our user base already uses data that is on a spherical surface (i.e. climate model output)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: ✅ Done
Development

Successfully merging a pull request may close this issue.

4 participants