-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to optionally return distances with .sel #41
base: master
Are you sure you want to change the base?
Conversation
Changes include: * accessor optionally takes in a "distances_name" str with which to name the new variable in results to store calculated distances from query() * query() now returns distances * distances are returned (always?) in radians from the trees, but are converted to kilometers in query. * If a DataArray was input to .sel with "distances_name" input, a Dataset will be returned to contain the new variable * added example to intro notebook * added a test to test_accessor * also fixed an import in conftest Notes: * I did not test this with the dask approach
needed some updates in precommit too
Sorry for all the additions. I have continued to use the code in the project I'm working on and have realized more about the existing code and better approaches as I go. |
I forgot to say this was meant to address #26. |
@benbovy or @willirath are either of you available to take a look at this? I'm hoping to be able to use these changes in a version update of xoak with a package of mine that uses xoak. Thanks for your time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kthyng! That's a very useful addition.
I've left two minor comments below.
More generally, I'm slightly concerned about the API. Alternatively, we could keep Dataset.xoak.sel()
API as close as possible to Xarray's Dataset.sel
(I plan to eventually refactor xoak in order to reuse Xarray's new, flexible indexes capabilities so that we could use directly Dataset.sel or DataArray.sel) and return both indices and distances via another method (e.g., Dataset.xoak.query
, as suggested in #32)?
@benbovy I'm completely open to the API and detailed choices in the code if you want to change anything. I just need access to the distances! I am aware of the new indices in xarray but haven't seen them in action to see how they work or how to use them yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kthyng, first of all, thanks for looking into this. I'm really happy this package is seeing adoption! (Even more so as I have a few use cases for the distances myself.)
I have left a few small comments the major of which agrees with @benbovy on the question of units. I think I'd follow his suggestion to just expose whatever unit the distances provided by the indexer come in.
Also, feel free to ignore whichever of my comments overlap with @benbovy's remarks and follow his advice instead.
@@ -11,7 +11,7 @@ | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"execution_count": 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we wanted to leave these notebooks un-executed in the repo? (@benbovy I vaguely remember that this was the case?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should choose on this I suppose! Alternatively I've been switching my docs over from already-run notebooks to using myst-nb
to be compiled by readthedocs, which I like a lot better. I can help switch this over to that if you want to give it a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for using myst-nb
, if you want to give it a try that would be awesome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbovy I started to switch over to myst-nb
to execute the docs notebooks, but I see that you are already doing that using nbsphinx
! I actually didn't realize you could do that with nbsphinx
. I can still change it over to myst-nb
if you want, but I am not sure if there is any benefit to it over nbsphinx
.
Yes this is pretty new and still largely undocumented. Xoak is a good place to try this feature, but we need a smooth transition so I don't expect it to be straightforward.
Sorry I could have provided more details in my previous comment. I was thinking about this alternative:
This would look like this (reusing the examples in the documentation): >>> ds_mesh
# <xarray.Dataset>
# Dimensions: (x: 100, y: 100)
# Coordinates:
# lat (x, y) float64 ...
# lon (x, y) float64 ...
# Dimensions without coordinates: x, y
# Data variables:
# field (x, y) float64 ...
>>> ds_trajectory
# <xarray.Dataset>
# Dimensions: (trajectory: 30)
# Dimensions without coordinates: trajectory
# Data variables:
# latitude (trajectory) ...
# longitude (trajectory) ...
>>> ds_mesh.xoak.set_index(['lat', 'lon'], 'sklearn_geo_balltree')
>>> distances, indices = ds_mesh.xoak.query(
... lat=ds_trajectory.latitude,
... lon=ds_trajectory.longitude,
... )
>>> distances
# <xarray.DataArray (trajectory: 30)>
# ...
# Dimensions without coordinates: trajectory
>>> indices
# <xarray.Dataset>
# Dimensions: (trajectory: 30)
# Dimensions without coordinates: trajectory
# Data variables:
# x (trajectory) int64 ...
# y (trajectory) int64 ... Having a The >>> results = ds_mesh.isel(indices).assign(distances=distances)
>>> results
# <xarray.Dataset>
# Dimensions: (trajectory: 30)
# Coordinates:
# lat (trajectory) float64 ...
# lon (trajectory) float64 ...
# Dimensions without coordinates: trajectory
# Data variables:
# field (trajectory) float64 ...
# distances (trajectory) float64 ... Now, this is clearly more verbose than what is proposed in this PR: >>> ds_mesh.xoak.sel(
... lat=ds_trajectory.latitude,
... lon=ds_trajectory.longitude,
... distances_name="distances"
... ) Also, because |
If we're going to support both, I think we can either merge this PR and then refactor #32 or the other way around. |
@benbovy I like having the indices and distances explicitly available with
as you suggest. One thing I didn't like about my change is that when a DataArray is input but distances are returned, a Dataset is returned to contain both variables. It's been messing up my code here and there. Maybe it would be better to keep it simple with
But I do think it should be up to you @benbovy and @willirath so let me know what you prefer and I can try to implement, or you can go ahead with it, either way. |
I think leaving |
Changes include:
Notes: