
Compute Grid Centerpoint using Welzl's algorithm #811

Open · wants to merge 51 commits into main

Conversation

@rajeeja (Contributor) commented Jun 11, 2024

No description provided.

@rajeeja rajeeja changed the title DRAFT: Compute Grid Centerpoint using Welzl's algorithm Compute Grid Centerpoint using Welzl's algorithm Jun 26, 2024
@rajeeja rajeeja requested a review from philipc2 June 26, 2024 23:10
@philipc2 (Member) commented Jul 2, 2024

I'm not a big fan of the _ctrpt approach to the properties. These values are still, for example, face_lon and face_lat, they simply use a different algorithm to compute them.

Maybe we should consider the following design?

  • Have the default face_xyz or face_latlon values be either what was parsed from a dataset OR the existing Cartesian averaging
  • Introduce a grid-level Grid.populate_face_coordinates() function (similar to the internal ones that we have) to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.

This would make the workflow look something like the following:

# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg
uxgrid.face_lon

# I want to explicitly set the algorithm to be Welzl
uxgrid.populate_face_coordinates(method='welzl')

# value will now be populated using your approach
uxgrid.face_lon

# I want to re-populate again using cartesian averaging
uxgrid.populate_face_coordinates(method='cartesian average')

# value will now be populated using cartesian average
uxgrid.face_lon

This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?
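For concreteness, here is a hypothetical sketch of how such a dispatch could look; the helper names (_populate_face_centroids, _populate_face_centerpoints) and the method strings are placeholders for illustration, not existing uxarray API:

# Hypothetical sketch of the proposed API, not the actual implementation.
def populate_face_coordinates(uxgrid, method="cartesian average"):
    """Recompute face_lon / face_lat (and face_x / face_y / face_z) on ``uxgrid``
    using the requested algorithm, overwriting any stored values."""
    if method == "cartesian average":
        _populate_face_centroids(uxgrid)      # placeholder: existing averaging routine
    elif method == "welzl":
        _populate_face_centerpoints(uxgrid)   # placeholder: Welzl centerpoint routine
    else:
        raise ValueError(f"Unknown method: {method!r}")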

@philipc2 philipc2 linked an issue Jul 2, 2024 that may be closed by this pull request
philipc2 and others added 4 commits July 3, 2024 12:08
…dependency (use with arcs and arcs use coordinates). o Remove new routine in favor of using the existing angle b/w vectors to calculate distance.
@rajeeja (Contributor, Author) commented Jul 9, 2024

> I'm not a big fan of the _ctrpt approach to the properties. [...] This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?

During my testing, and sometimes when testing the face geometry, both the centerpoint and the centroid might be needed. When working with a mesh, I wanted to check how much one deviated from the other and whether one or the other made more sense.

We might be able to get both with the approach you propose as well, but that would require two calls to populate, one for each option; having both available on the grid object at once might be better.

We can find another name for ctrpt; I don't like it either :)

@philipc2 (Member) commented Aug 1, 2024

> I'm not a big fan of the _ctrpt approach to the properties. [...] What do you think?

> During my testing, and sometimes when testing the face geometry, both the centerpoint and the centroid might be needed. [...] We can find another name for ctrpt; I don't like it either :)

My main concern with breaking up the different types of coordinates into separate attributes is that it'll add extra overhead for us to ensure that the coordinates we read match the ones that we want to store, not to mention needing to redefine/extend the UGRID conventions further past what we've already done. Even with this (and, say, some other method down the line), this could end up looking like:

  • Grid.face_lon
  • Grid.face_lon_centerpoint
  • Grid.face_lon_some_other_definition

Consider the case where two UGRID (or any other format) grid files are loaded into UXarray. If we move forward with a split-attribute approach, we'd need to ensure that the coordinates we are reading go into either face_lat/lon or face_lat/lon_centerpoints. There's also no easy way to determine at load time which method each dataset used to compute the centroids without parsing for specific attributes in the file (if they exist), since this is not outlined in the UGRID conventions.

I'm still in favor of keeping face_lon and face_lat as general variables for storing some coordinate that represents the center/midpoint/centroid, etc. of the face. This does limit us to storing only one type of "center" coordinate at a time, but it ensures that we don't lock ourselves into a strict definition of the center.

@paullric @rljacob Is there ever a scenario where we would want to have more than one definition of a "center" coordinate attached to a grid at a time?

@erogluorhan (Member) left a comment

The code looks good to me. Please address just a few simple comments below.

@erogluorhan (Member) left a comment

The code looks good to me. The only thing remaining was the reduced code coverage, but apparently codecov can't track test cases written for the njit-decorated functions. Once we figure out a path forward with that, I am happy to approve this.

Comment on lines 433 to 595
        Cartesian y coordinate
    z: float
        Cartesian z coordinate


    Returns
    -------
    lon : float
        Longitude in radians
    lat: float
        Latitude in radians
    """

    lon = math.atan2(y, x)
    lat = math.asin(z)

    # set longitude range to [0, 2 * pi]
    lon = np.mod(lon, 2 * np.pi)

    z_mask = np.abs(z) > 1.0 - ERROR_TOLERANCE

    lat = np.where(z_mask, np.sign(z) * np.pi / 2, lat)
    lon = np.where(z_mask, 0.0, lon)

    return lon, lat


def _normalize_xyz(
    x: Union[np.ndarray, float],
    y: Union[np.ndarray, float],
    z: Union[np.ndarray, float],
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Normalizes a set of Cartesian coordinates."""
    denom = np.linalg.norm(
        np.asarray(np.array([x, y, z]), dtype=np.float64), ord=2, axis=0
    )

    x_norm = x / denom
    y_norm = y / denom
    z_norm = z / denom
    return x_norm, y_norm, z_norm


@njit(cache=True)
def _lonlat_rad_to_xyz(
    lon: Union[np.ndarray, float],
    lat: Union[np.ndarray, float],
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Converts Spherical lon and lat coordinates into Cartesian x, y, z
    coordinates."""
    x = np.cos(lon) * np.cos(lat)
    y = np.sin(lon) * np.cos(lat)
    z = np.sin(lat)

    return x, y, z


def _xyz_to_lonlat_deg(
    x: Union[np.ndarray, float],
    y: Union[np.ndarray, float],
    z: Union[np.ndarray, float],
    normalize: bool = True,
) -> tuple[np.ndarray, np.ndarray]:
    """Converts Cartesian x, y, z coordinates into Spherical longitude and
    latitude coordinates in degrees.

    Parameters
    ----------
    x : Union[np.ndarray, float]
        Cartesian x coordinates
    y: Union[np.ndarray, float]
        Cartesian y coordinates
    z: Union[np.ndarray, float]
        Cartesian z coordinates
    normalize: bool
        Flag to select whether to normalize the coordinates

    Returns
    -------
    lon : Union[np.ndarray, float]
        Longitude in degrees
    lat: Union[np.ndarray, float]
        Latitude in degrees
    """
    lon_rad, lat_rad = _xyz_to_lonlat_rad(x, y, z, normalize=normalize)

    lon = np.rad2deg(lon_rad)
    lat = np.rad2deg(lat_rad)

    lon = (lon + 180) % 360 - 180
    return lon, lat


@njit
def _normalize_xyz_scalar(x: float, y: float, z: float):
    denom = np.linalg.norm(np.asarray(np.array([x, y, z]), dtype=np.float64), ord=2)
    x_norm = x / denom
    y_norm = y / denom
    z_norm = z / denom
    return x_norm, y_norm, z_norm

Member:

I'm still a bit hesitant about moving these. Can you try moving them back into coordinates.py and provide a bit more info on the cyclical dependency issue?

Contributor Author:

All tests should fail now; please check if there is a better fix. I thought those belonged better in grid/utils.py, since they can run as utilities and are not tied to other infrastructure.

#     _xyz_to_lonlat_rad,
#     _lonlat_rad_to_xyz,
#     _xyz_to_lonlat_deg,
#     _normalize_xyz,

Contributor Author:

> I'm still a bit hesitant about moving these. Can you try moving them back into coordinates.py and provide a bit more info on the cyclical dependency issue?

coordinates.py needs "from uxarray.grid.arcs import _angle_of_2_vectors", and arcs.py needs the following from coordinates.py:

from uxarray.grid.coordinates import (
    _xyz_to_lonlat_rad_scalar,
    _normalize_xyz_scalar,
)
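For illustration, one common way to break a cycle like this is to move the shared low-level helpers into a module that imports from neither of the two, so that both can depend on it without forming a loop; this mirrors the grid/utils.py move described above. A minimal sketch with a hypothetical module name (not the actual uxarray layout):

# geom_utils.py (hypothetical): shared low-level helpers with no imports
# from coordinates.py or arcs.py, so both modules can import from here.
import numpy as np


def angle_of_2_vectors(u, v):
    # Angle between two 3D vectors via atan2(|u x v|, u . v),
    # which stays numerically stable for very small angles.
    cross = np.cross(u, v)
    return np.arctan2(np.linalg.norm(cross), np.dot(u, v))


# coordinates.py:  from geom_utils import angle_of_2_vectors   (no import from arcs.py)
# arcs.py:         from geom_utils import angle_of_2_vectors   (no import from coordinates.py)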

@rajeeja (Contributor, Author) commented Sep 16, 2024

Once we resolve this circular dependency issue, I will disable Numba to check codecov; it might increase the coverage percentage.

@erogluorhan (Member) commented Sep 17, 2024

> Once we resolve this circular dependency issue, I will disable Numba to check codecov; it might increase the coverage percentage.

How about doing it this way: at the beginning of each individual test case that exercises an njit-decorated function, disable Numba, and at the end, enable it again? That way, whenever we see this kind of disable/enable pair, we know right away that the case is testing an njit-decorated function.
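For illustration, another option that avoids toggling Numba globally is to exercise the dispatcher's .py_func attribute in the test, which runs the original pure-Python implementation that coverage tools can trace. The test below is a hypothetical sketch (the import path follows what is quoted earlier in the thread); the documented global switch, the NUMBA_DISABLE_JIT=1 environment variable, would instead need to be set before the decorated modules are imported:

import numpy as np

from uxarray.grid.coordinates import _normalize_xyz_scalar  # njit-decorated


def test_normalize_xyz_scalar_py_func():
    # .py_func is the original, uncompiled Python function, so codecov sees its lines.
    x, y, z = _normalize_xyz_scalar.py_func(3.0, 4.0, 0.0)
    assert np.isclose(np.sqrt(x**2 + y**2 + z**2), 1.0)

    # The compiled version should agree with the pure-Python one.
    assert np.allclose(_normalize_xyz_scalar(3.0, 4.0, 0.0), (x, y, z))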

@rajeeja (Contributor, Author) commented Sep 17, 2024

> How about doing it this way: at the beginning of each individual test case that exercises an njit-decorated function, disable Numba, and at the end, enable it again?

I removed all the Numba decorations on my local copy, and the coverage (85% total and 95% for coordinates.py; see the screenshot below) is considerably higher for SHA dba53dc8, which is from before I introduced the circular dependency.

[Screenshot 2024-09-17 at 17:37: coverage report]

from uxarray.conventions import ugrid

from typing import Union
from uxarray.grid.arcs import _angle_of_2_vectors
Member:

This appears to be the root cause of the circular import.

To avoid this, can you try using the following in this module when calling this function:

uxarray.grid.arcs._angle_of_2_vectors

Contributor Author:

> This appears to be the root cause of the circular import. To avoid this, can you try using uxarray.grid.arcs._angle_of_2_vectors in this module when calling this function?

Doesn't work. Numba doesn't like uxarray...

Comment on lines -7 to -68
def _replace_fill_values(grid_var, original_fill, new_fill, new_dtype=None):
    """Replaces all instances of the current fill value (``original_fill``) in
    (``grid_var``) with (``new_fill``) and converts to the dtype defined by
    (``new_dtype``)

    Parameters
    ----------
    grid_var : np.ndarray
        grid variable to be modified
    original_fill : constant
        original fill value used in (``grid_var``)
    new_fill : constant
        new fill value to be used in (``grid_var``)
    new_dtype : np.dtype, optional
        new data type to convert (``grid_var``) to

    Returns
    ----------
    grid_var : np.ndarray
        grid variable with the updated fill value and dtype
    """

    # locations of fill values
    if original_fill is not None and np.isnan(original_fill):
        fill_val_idx = np.isnan(grid_var)
    else:
        fill_val_idx = grid_var == original_fill

    # convert to new data type
    if new_dtype != grid_var.dtype and new_dtype is not None:
        grid_var = grid_var.astype(new_dtype)

    # ensure fill value can be represented with current integer data type
    if np.issubdtype(new_dtype, np.integer):
        int_min = np.iinfo(grid_var.dtype).min
        int_max = np.iinfo(grid_var.dtype).max
        # ensure new_fill is in range [int_min, int_max]
        if new_fill < int_min or new_fill > int_max:
            raise ValueError(
                f"New fill value: {new_fill} not representable by"
                f" integer dtype: {grid_var.dtype}"
            )

    # ensure non-nan fill value can be represented with current float data type
    elif np.issubdtype(new_dtype, np.floating) and not np.isnan(new_fill):
        float_min = np.finfo(grid_var.dtype).min
        float_max = np.finfo(grid_var.dtype).max
        # ensure new_fill is in range [float_min, float_max]
        if new_fill < float_min or new_fill > float_max:
            raise ValueError(
                f"New fill value: {new_fill} not representable by"
                f" float dtype: {grid_var.dtype}"
            )
    else:
        raise ValueError(
            f"Data type {grid_var.dtype} not supported for grid variables"
        )

    # replace old fill values with the new fill value
    grid_var[fill_val_idx] = new_fill

    return grid_var
Member:

This shouldn't be moved either. Please refer to the above comment if the circular import issues still persist.

Contributor Author:

I checked; this one is not used. The one that is used is in connectivity.

Labels
run-benchmark (Run ASV benchmark workflow)

Successfully merging this pull request may close these issues:
Welzl's algorithm for "face centerpoint"

6 participants