Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEP 18, physical units, uncertainties, and the scipp library? #3509

Closed
SimonHeybrock opened this issue Nov 11, 2019 · 3 comments
Closed

NEP 18, physical units, uncertainties, and the scipp library? #3509

SimonHeybrock opened this issue Nov 11, 2019 · 3 comments

Comments

@SimonHeybrock
Copy link

This is an idea and meant as a discussion starter on an potential route to providing support for physical units and propagation of uncertainties for xarray.

Context

  1. NEP 18 (which, as far as I understand, was pushed by you guys, for similar purposes) provides means to combine such features with xarray using __array_function__, if the underlying array implementation supports it.

  2. I am working on scipp, which (based on a decision I may or may not regret in the future) is reimplementing a lot of features that xarray provides, plus some additional features. Two of these features are physical units and propagation of uncertainties.

    • scipp.Variable is essentially equivalent to a numpy array with a unit, dimension labels, and an optional array of uncertainties. [*]
    • scipp implements basic arithmetic operations (and some more) for scipp.Variable, including efficient propagation of uncertainties.

[*] Caveat: scipp's current unit implementation is static an would probably need to be replaced for becoming useful for a wider audience.

Idea and questions

Can we implement the __array_function__ protocol for scipp.Variable so it can be used with xarray? As far as I can tell this would simply be a lightweight wrapper.

  • Did I understand __array_function__ correctly?
  • Is there anything else I should be aware of?
  • Would anyone be interested in this?

This would amount to using the lower-level parts of scipp which is quite compact and can be extended to support more data types and more operations in a relatively simple manner (requiring recompilation, since it is written in C++).

@jthielen
Copy link
Contributor

With regards to physical units (and to a lesser extent propagation of uncertainties), this would have overlap with pint. Efforts have been ongoing towards integration with xarray through NEP-18 (xref #525, hgrecco/pint#764, hgrecco/pint#845, hgrecco/pint#849, as well as #3238 and following test implementation PRs), but are still not quite there yet...hopefully very soon though!

Would you be able to describe any advantages/disadvantages you would see with xarray wrapping scipp, versus something like xarray > pint > uncertainties > numpy?

@SimonHeybrock
Copy link
Author

SimonHeybrock commented Nov 12, 2019

@jthielen Thanks for your reply! I am not familiar with pint and uncertainties so I cannot go in much detail there, so this is just generally speaking:

Units

I do not see any advantage using scipp. The current unit system in scipp is based on boost::units, which is very powerful (supporting custom units, heterogeneous systems, ...), but unfortunately it is a compile-time library (EDIT 2022: This does not apply any more since we have long switched to a runtime units library). I would imagine we would need to wrap another library to become more flexible (we could even consider wrapping something like pint's unit implementation).

Uncertainties

There are two routes to take here:

1. Store a single array of value/variance pairs

  • Propagation of uncertainties is "fast by default".
  • Probably harder to vectorize (SIMD) since data layout implies interleaved values. In practice this is unlikely to be relevant, since many workloads are just limited by memory bandwidth and cache sizes, so vectorization is not crucial in my experience.

2. Store two arrays (values array and uncertainties array)

  • This is what scipp does.
  • Special care must be taken when implementing propagation of uncertainties: Naive implementation based on operating with arrays will lead to massive performance loss (I have seen 10x or more) for things like multiplication (there is no penalty for addition and subtraction).
    • In practice this is not hard to do, we simply need to avoid computing the result's values and variances in two steps and put everything into a single loop. This avoids allocation of temporaries and loading / storing from memory multiple times.
    • Scipp does this, and does not sacrifice any performance.
  • Save 2x in performance when operating only with values, even if variances are present.
  • Can add/remove variances independently, e.g., if no longer needed, avoiding copies.
  • Can use existing numpy code to operate directly with values and variances (could probably be done in case 1., with a stride, loosing some efficiency).

Other aspects

Scipp supports a generic transform-type operation that can apply an arbitrary lambda to variables (units + values array + variances array).

  • This is done at compile-time and therefore static. It does however allow for very quick addition of new compound operations that propagate units and uncertainties.
  • For example, we could generate an operation sqrt(a*a + b*b):
    • automatically written using a single loop => fast
    • gives the correct output units
    • propagates uncertainties
    • does all the broadcasting and transposing
  • Not using expression templates, in case anyone asks.

Other

  • scipp.Variable includes the dimension labels and operations can do broadcasting and transposition, yielding good performance.
    I am not sure if this an advantage or a drawback in this case?
    Would need to look more into the inner workings of xarray and the __array_function__ protocol.

  • Scipp is written in C++ with performance in mind. That being said, it is not terribly difficult to achieve good performance in these cases since many workloads are bound by memory bandwidth (and probably dozens of other libraries have done so).

Questions

  • What is pint's approach to uncertainties?
  • Have you looked at the performance? Is performance relevant for you in these cases?

@jthielen
Copy link
Contributor

jthielen commented Jan 10, 2020

@SimonHeybrock So sorry I neglected to reply back in November! hgrecco/pint#982 and hgrecco/pint#918 pinged my recollection of this issue. In short, Pint actually doesn't currently support uncertainties with arrays, only scalars, so my earlier wrapping comment was mistaken. So, moving forward with __array_function__ in Scipp in order to integrate it as a "duck array type" within xarray might be a good way to get physical units and uncertainties working together in xarray.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants