Handle weight files more elegantly #11

JiaweiZhuang · 2018-01-01T17:33:45Z

(continuing #2 and #9)

In the just-released v0.1.1, the default behavior is to write weights to disk, and there is a regridder.clean_weight_file() method to optionally remove it (demo).

This exactly matches how ESMPy works (directly dump weights to disk, not returning a numpy array) but feels a bit odd. A more intuitive workflow is only getting in-memory weights by default, and have a regridder.save() method to optionally save it to disk (like the model.save() method in Keras).

With current ESMPy (up to 7.1.0dev41) I need to do very weird things to implement this new design:

Use ESMPy to write weights to disk
Read it back to memory
Then automatically delete the weight file by default (what a waste if you actually need it)
Finally, have a method to optionally re-write the weight file to disk. (duplicates step1)

This incurs a lot of unnecessary disk I/O. If ESMPy has the plan to return in-memory weights I will definitely use the new design. Otherwise I'd rather keep the current approach.

@bekozi Any plans on that?

The text was updated successfully, but these errors were encountered:

JiaweiZhuang · 2018-01-01T17:59:38Z

Also the weight file needs to have more complete metadata, not just numbers (can be done outside of ESMPy)

jhamman · 2018-03-16T17:08:28Z

@JiaweiZhuang / @bekozi - has there been any update on this issue? As I understand it, we need a way to have ESMPy return in memory weights. Has that issue been raised upstream?

JiaweiZhuang · 2018-03-16T21:30:36Z

I raised this issue a while ago; not sure if there's any progress

also @rokuingh

bekozi · 2018-06-21T21:56:30Z

Talked with @rokuingh today, and we have a plan for moving this forward. I hope it won't be longer than a few weeks.

JiaweiZhuang · 2018-06-22T00:09:40Z

That's great news @bekozi ! Is it planned for some patch version of 7.1.0 or 8.0.0?

bekozi · 2018-06-22T18:19:36Z

Good question. It will be in 8 but rolled out in a beta snapshot much sooner...

bekozi · 2018-08-27T16:48:46Z

We have factors in the Python interface now and are working on testing. There are still some compiler-related things to address.

Before charging forward, I want to get some input on the interface. The factors themselves are allocated in Fortran, so we need to make sure their deallocation is handled smartly. With bigger grids, we want to minimize copies too. Nothing is fixed now, so open to any suggestions.

regrid = Regrid(..., factors=True)  # factors is False by default
factor_list, factor_index_list = regrid.get_factors(deep_copy=True)  # deep_copy is False by default
# Preferred deallocation method for factors but __del__ will also do this.
regrid.destroy()

I am thinking it makes sense to add an option for zero-based factor indexing. Maybe a start_index option to Regrid that defaults to 1? It needs to happen in the Regrid call so we can adjust the indexing in ESMF and not in the Python layer - again to avoid copies.

JiaweiZhuang · 2018-08-27T18:24:03Z

Glad to see progress on this important issue! I have several comments & questions:

What's in factor_list and factor_index_list? I guess factor_list is an 1D numpy array of weights (S in the offline weights file). Is factor_index_list a 2D array containing col and row, or a list of two 1D arrays? I guess it is a list so you need deep_copy, not just copy?

I would prefer just returning 3 numpy arrays and use copy :

row, col, S = regrid.get_factors(copy=True)

Those arrays will then construct a Scipy sparse matrix by scipy.sparse.coo_matrix(S, (row, col), copy=False) without copying.

Zero-based indexing: Currently in xESMF I simply subtract 1 from default grid indices. Will the start_index implementation be more computationally efficient? If not, this correction can be done in Python level, in-place, so not extra copies will be made.

My general feeling is that direct ESMF/ESMPy calls should only deal with the very core functionalities that must be done in Fortran-level. Then we rarely, rarely need to update ESMPy's API. Once a new ESMPy function like get_factors() is implemented, it can remain stable for many versions. Additional small fixes can be quickly done in the Python level, which is much easier to debug and update.

Naming: would regrid.get_weights() be more intuitive than regrid.get_factors()?
For the sake of consistency, the Field.get_area() method should probably need a copy option as well. It can indeed by done by Field.get_area().copy(), but an copy option is pretty common in numpy/scipy functions.

JiaweiZhuang · 2018-08-27T19:22:31Z

With bigger grids, we want to minimize copies too.

A zero-copying workflow would be:

row, col, S = regrid.get_weights(copy=False)
row -= 1  # modify the Fortran array in-place. Any potential issue?
col -= 1
weights = scipy.sparse.coo_matrix(S, (row, col), copy=False)
regridder = xesmf.Regridder()
regridder.weights = weights  # will be done in class __init__ instead
# do not call regrid.destroy()

In this case, the xesmf.Regridder directly uses the internal Fortran arrays in the ESMF.Regrid object. regrid.destroy() releases the Fortran array, which will destroy the xESMF regridder as well. Is there any potential danger with this approach, even if explicit destroy() is prohibited in xESMF user API? Will other ESMF error mess up the Fortran array? If so, the default behavior should be copy=True followed by destroy(), so the numpy array for weights is decoupled from ESMF; while copy=False without destroy() is only an option to minimize copying. The eventual memory use of the two cases are actually the same; the former uses more peak memory during copying.

bekozi · 2018-08-27T23:06:54Z

Thanks for the input! All good stuff.

Is factor_index_list a 2D array containing col and row, or a list of two 1D arrays? I guess it is a list so you need deep_copy, not just copy?

Out of habit, I prefer to be explicit between shallow and deep copies in Python even when working with NumPy arrays. However, we don't have a policy for naming these in ESMPy. I agree copy is cleaner...need to do some thinking on this...

The function will return a np.float64 ndarray with shape (nfactors,) for the factor list and an np.int32 ndarray with shape (2, nfactors) for the factor index list. The first column in the factor index list is the source index with the second column being the destination index. We'll make sure the docs are very clear about this. My head spins sometimes when trying to map Fortran onto Python and vice versa.

With ESMPy, we have tried to stay close to how ESMF implements things under the hood. Hence, the factor and factor index list will take the same form as the regrid store call in ESMF. Mostly this is an attempt to reduce duplication with the docs.

I would prefer just returning 3 numpy arrays and use copy. Those arrays will then construct a Scipy sparse matrix by scipy.sparse.coo_matrix(S, (row, col), copy=False) without copying.

We could add a convenience function or options that would do this. We may want to expose the ESMF sparse matrix utility through Python at some point. In that case, we would want the factor array forms to be compatible with ESMF.

I thought at one point to consider a get_factors_dict which would return a dictionary with keys (row, col, S) mapping to their associated values (numpy arrays). What do you think? This function would also take a copy parameter. We could definitely name this one get_weights_dict as I expect it could be the preferred entry point for xESMF.

Will the start_index implementation be more computationally efficient?

I don't really know the answer to this. My sense is that it is not worth implementing as in-place NumPy stuff is pretty fast, but it was a question worth posing at least.

My general feeling is that direct ESMF/ESMPy calls should only deal with the very core functionalities that must be done in Fortran-level. Then we rarely, rarely need to update ESMPy's API.

This. 😄 Summed up the ESMPy's design mentality. It doesn't feel Pythonic most of the time!

Naming: would regrid.get_weights() be more intuitive than regrid.get_factors()

It would. It's mostly an attempt to establish consistency with the named factor arguments to ESMF. I don't feel strongly about it though. In my mind, we should consider a get_weights method that just passes through to get_factors...

For the sake of consistency, the Field.get_area() method should probably need a copy option as well.

Good point. The copy flags to these functions are meant to be additional indicators to users that they need to be conscious of ESMF memory.

Is there any potential danger with this approach, even if explicit destroy() is prohibited in xESMF user API? Will other ESMF error mess up the Fortran array?

The safest bet is to deep copy everything as you know. Code that does not explicitly call destroy but creates numerous ESMF.Regrid(..., factors=True) objects in the same scope can get some serious memory accumulations. One option could be to introduce a context manager for ESMF.Regrid to ensure destroys are appropriately called. I hope to answer this better with additional testing.

I'm not quite sure what you mean by other ESMF errors. I think once these arrays are created they should be stable independent of other ESMF usage (in ESMF these arrays are not attached to anything). The same goes for the deallocation. I'm probably not understanding your second question.

JiaweiZhuang · 2018-08-28T00:58:51Z

The function will return a np.float64 ndarray with shape (nfactors,) for the factor list and an np.int32 ndarray with shape (2, nfactors) for the factor index list.

We could add a convenience function or options that would do this.

(2, nfactors) looks good to me. No need to change it or add additional wrappers. The ESMPy API can be as lean as possible.

A minor concern is that factor_index_list probably has Fortran-ordering as other ESMPy output, so row = factor_index_list[0,:] and col = factor_index_list[1,:] are not contiguous in memory. This slows down the sparse MM by ~40%. See this demo. It can be fixed by copying and rearranging the memory layout, anyway.

It would. It's mostly an attempt to establish consistency with the named factor arguments to ESMF. I don't feel strongly about it though. In my mind, we should consider a get_weights method that just passes through to get_factors...

It makes perfect sense for ESMPy to stay close to ESMF. I am OK with the current name. Multiple aliases to the same method will probably confuse pure-ESMPy users.

I thought at one point to consider a get_factors_dict which would return a dictionary with keys (row, col, S)

Looks good me if you don't mind adding & maintaining one more function😀

The safest bet is to deep copy everything as you know. Code that does not explicitly call destroy but creates numerous ESMF.Regrid(..., factors=True) objects in the same scope can get some serious memory accumulations.

This is exactly what I am concerning about. I am not super familiar with ESMPy's memory management, but from what you said I guess repeated calls to regrid = ESMF.Regrid(...) without regrid.destroy() will lead to lots of memory leak. Even though the old ESMF.Regrid object is not referenced by any variable identifiers, the memory will not be freed (in contrast to the automatic garbage collection for normal numpy arrays).

If true, with the zero-copying implementation mentioned above, there will be memory leak if a user repeatedly calls regridder=xesmf.Regridder(...). With copying and explicit destroy(), the regridder will only hold normally-created numpy arrays, and when theregridder variable identifier is overwritten (i.e. points to a new regridder object), the old memory should be automatically garbage-collected.

I'm not quite sure what you mean by other ESMF errors.

I guess I more mean "mis-uses" (like the above case) than "errors"...

In summary:

Current API and naming conventions are all fine to me, as I understand that many of them come from ESMF conventions.
I don't particularly need any convenience wrappers (a stable and simple API is more important to me, not matter how it looks). But if you want to add some wrappers for pure-ESMPy users, I am happy to take advantage of them.
xESMF should make copies of the weights and then explicitly destroy the ESMF object, to safely manage memory. Directly referencing the Fortran array might be an (unrecommended) option for extremely large grids.

JiaweiZhuang · 2018-08-29T14:05:28Z

@bekozi My mistake -- field.get_area() is to compute the area and add to field.data, not directly returning the area array. So it behaves differently from get_factors() and there is no way to add copy option...

bekozi · 2018-08-29T18:57:07Z

This slows down the sparse MM by ~40%. See this demo. It can be fixed by copying and rearranging the memory layout, anyway.

This is a downer. I think what we'll need is a follow on issue where we adapt the regrid store interface in ESMF to return vectors for the index lists directly.

Looks good me if you don't mind adding & maintaining one more function.

Don't mind.

Even though the old ESMF.Regrid object is not referenced by any variable identifiers, the memory will not be freed (in contrast to the automatic garbage collection for normal numpy arrays).

It's hard to guarantee how the Python garbage collection will run. As usual, SO provides a good overview. This is why we introduced the destroy method. The memory will be cleaned up eventually, it may just not happen when you really need it. destroy will be called via __del__ / the destructor on the ESMF.Regrid object when gc runs.

In summary

Thanks for this.

My mistake -- field.get_area() is to compute the area and add to field.data, not directly returning the area array.

You are correct, and you are forgiven. 😄 Not like I caught it anyway.

I'm really hoping to ship these features soon.

bekozi · 2018-09-18T18:00:46Z

Quick update. We hit an impasse with compilers that we are working through. This was not unexpected. The implementation is working with the standard GNU compiler stack but is unstable on others (i.e. Discover standard modules). The rub with this feature is that the weight arrays are dynamically allocated in Fortran complicating pointer management. @rokuingh and I are working through how best to move the arrays up to Python using Fortran's iso_c_binding module. We used a temporary solution to allow Python development to move forward.

You can see the ESMPy method for retrieving the weight dictionary on SF if you like. The good news is that deep copying out of this method will result in contiguous memory from what I can tell.

JiaweiZhuang · 2018-09-18T18:12:20Z

Thanks very much for the work! I think all xESMF users use conda (with GNU compiler) and won't be affected by this compiler issue.

bekozi · 2018-11-14T20:18:55Z

@JiaweiZhuang, the in-memory weights branch is ready for testing. A Dockerfile for building ESMF from that branch is available here.

For reference, here are links to the docstrings for the new interfaces. ESMF doesn't build docs for branches...

In our testing, the factor arrays contain zero elements with no spatial overlap (disjoint geometries) and unmapped action is ignore. Again, sorry for the delay on this. Hopefully we won't encounter too many issues during your implementation.

JiaweiZhuang · 2018-11-14T21:12:23Z

@bekozi Aha, finally!!

I can use the Dockerfile without problems. Will let you know if I hit any issues.

JiaweiZhuang · 2018-11-14T21:31:43Z

@bekozi One minor documentation thing: create_rh is not documented in ESMF.Regrid's docstring.

Simply using the one in ESMF docs should be fine:

create_rh
Specifies whether or not to create a routehandle, or just write weights to file. If not specified, defaults to ESMF_TRUE.

I assume that it should be set to False if the Regrid call is only for weight retrieval, either offline (filename = '...') or online (factors=True) .

bekozi · 2018-11-16T16:20:10Z

One minor documentation thing: create_rh is not documented in ESMF.Regrid's docstring.

Noted! Thanks.

I assume that it should be set to False if the Regrid call is only for weight retrieval, either offline (filename = '...') or online (factors=True) .

Yes, it makes sense to use create_rh=False for best performance when returning factors. This should be noted in the documentation. ~~I'll update the test loop too.~~ Edit: It looks like this is already tested appropriately.

JiaweiZhuang · 2019-08-06T05:36:15Z

I already have a mostly-working code for in-memory weight retrieval, but it relies on ESMF 8.0.0 that is not publicly released yet. So far I've been building the development branch of ESMF at https://github.com/JiaweiZhuang/Docker_ESMPy. Will push my new code when ESMPy/ESMF 8.0.0 becomes conda-installable.

This is of top-priority as the current offline weight approach is quite awkward and hard to test.

JiaweiZhuang · 2019-10-16T03:04:25Z

ESMF 8.0.0 is released today and supports online weight retrieval (see release notes):

The following capabilities were added to the ESMF Python interface (ESMPy):
Added an in-memory weight generation option to the Regrid class, allowing re-use of weight vectors without writing them to netCDF. The weight arrays can be returned as NumPy objects or Python dictionary of weight vectors. This allows retrieval of the weights by source and destination key/value pairs.

JiaweiZhuang mentioned this issue Jan 1, 2018

Regridding API design #9

Closed

JiaweiZhuang added the enhancement label Feb 27, 2018

rabernat mentioned this issue Mar 2, 2018

tiledb pangeo-data/pangeo#120

Closed

JiaweiZhuang mentioned this issue Jun 20, 2018

regridding cubed-sphere-type datasets #21

Open

JiaweiZhuang mentioned this issue Jul 18, 2018

Spatial chunking for concurrent dask regridding #26

Closed

JiaweiZhuang mentioned this issue Aug 20, 2018

Contributor guide? #28

Open

1 task

JiaweiZhuang mentioned this issue Sep 10, 2018

Making silent the reuse of weights message... #34

Closed

JiaweiZhuang mentioned this issue Sep 26, 2018

Detect when old and new grid don't overlap? #36

Open

JiaweiZhuang added a commit to JiaweiZhuang/Docker_ESMPy that referenced this issue Nov 14, 2018

Copied from JiaweiZhuang/xESMF#11 (comment)

6f6f367

JiaweiZhuang mentioned this issue Feb 18, 2019

let's enumerate all the ways to represent a "grid" in python pangeo-data/pangeo#356

Closed

JiaweiZhuang mentioned this issue Mar 20, 2019

reuse weights only checked by shape #48

Closed

JiaweiZhuang pinned this issue Apr 25, 2019

JiaweiZhuang mentioned this issue Aug 6, 2019

map file generation is slow and fails for big problems NCAR/esmlab-regrid#5

Open

bradyrx mentioned this issue Oct 16, 2019

Generate weighting file with full variable list #69

Open

This was referenced Nov 12, 2019

ESMPy release 8.0.0 conda-forge/esmpy-feedstock#26

Closed

Proposed new regridder save-load API in 0.3.0 (not backward compatible) #75

Open

JiaweiZhuang unpinned this issue Nov 21, 2019

Mikejmnez mentioned this issue Feb 16, 2022

0.5 <= xesfm <= 0.6.2 yields Attribute Error with Regridder hainegroup/oceanspy#221

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle weight files more elegantly #11

Handle weight files more elegantly #11

JiaweiZhuang commented Jan 1, 2018

JiaweiZhuang commented Jan 1, 2018 •

edited

Loading

jhamman commented Mar 16, 2018

JiaweiZhuang commented Mar 16, 2018

bekozi commented Jun 21, 2018

JiaweiZhuang commented Jun 22, 2018

bekozi commented Jun 22, 2018

bekozi commented Aug 27, 2018

JiaweiZhuang commented Aug 27, 2018 •

edited

Loading

JiaweiZhuang commented Aug 27, 2018 •

edited

Loading

bekozi commented Aug 27, 2018

JiaweiZhuang commented Aug 28, 2018 •

edited

Loading

JiaweiZhuang commented Aug 29, 2018

bekozi commented Aug 29, 2018

bekozi commented Sep 18, 2018

JiaweiZhuang commented Sep 18, 2018 •

edited

Loading

bekozi commented Nov 14, 2018

JiaweiZhuang commented Nov 14, 2018

JiaweiZhuang commented Nov 14, 2018

bekozi commented Nov 16, 2018 •

edited

Loading

JiaweiZhuang commented Aug 6, 2019 •

edited

Loading

JiaweiZhuang commented Oct 16, 2019 •

edited

Loading

Handle weight files more elegantly #11

Handle weight files more elegantly #11

Comments

JiaweiZhuang commented Jan 1, 2018

JiaweiZhuang commented Jan 1, 2018 • edited Loading

jhamman commented Mar 16, 2018

JiaweiZhuang commented Mar 16, 2018

bekozi commented Jun 21, 2018

JiaweiZhuang commented Jun 22, 2018

bekozi commented Jun 22, 2018

bekozi commented Aug 27, 2018

JiaweiZhuang commented Aug 27, 2018 • edited Loading

JiaweiZhuang commented Aug 27, 2018 • edited Loading

bekozi commented Aug 27, 2018

JiaweiZhuang commented Aug 28, 2018 • edited Loading

JiaweiZhuang commented Aug 29, 2018

bekozi commented Aug 29, 2018

bekozi commented Sep 18, 2018

JiaweiZhuang commented Sep 18, 2018 • edited Loading

bekozi commented Nov 14, 2018

JiaweiZhuang commented Nov 14, 2018

JiaweiZhuang commented Nov 14, 2018

bekozi commented Nov 16, 2018 • edited Loading

JiaweiZhuang commented Aug 6, 2019 • edited Loading

JiaweiZhuang commented Oct 16, 2019 • edited Loading

JiaweiZhuang commented Jan 1, 2018 •

edited

Loading

JiaweiZhuang commented Aug 27, 2018 •

edited

Loading

JiaweiZhuang commented Aug 27, 2018 •

edited

Loading

JiaweiZhuang commented Aug 28, 2018 •

edited

Loading

JiaweiZhuang commented Sep 18, 2018 •

edited

Loading

bekozi commented Nov 16, 2018 •

edited

Loading

JiaweiZhuang commented Aug 6, 2019 •

edited

Loading

JiaweiZhuang commented Oct 16, 2019 •

edited

Loading