Scalability of the uncertainty propagation in numpy arrays #57
Comments
I did not have time to read your whole post yet, but the idea of having a dedicated array type for optimization purposes deserves attention, as does the idea of storing the nominal value and the random part separately (not the standard deviation, which is not stored internally). This is definitely the way to obtain faster NumPy calculations. It'd be great if you worked on this. But please call the new array class …

Please note the following points:
In addition to the issues you already cite, the discussion in issue #19 is relevant.
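To make the "random part" point concrete, here is a toy illustration (not the actual uncertainties internals): each quantity stores a nominal value plus its linear sensitivities to independent random variables, and the standard deviation is only derived on demand.

```python
# Toy illustration only (NOT the actual uncertainties internals): a quantity
# stores its nominal value plus linear sensitivities to independent random
# variables (the "random part"); the standard deviation is derived on demand.
import math

class ToyUValue:
    def __init__(self, nominal, sensitivities):
        self.nominal = nominal                  # float
        self.sensitivities = sensitivities      # {variable name: contribution to sigma}

    @property
    def std_dev(self):
        # First-order propagation for independent variables: contributions add in quadrature.
        return math.sqrt(sum(s ** 2 for s in self.sensitivities.values()))

    def __add__(self, other):
        combined = dict(self.sensitivities)
        for var, s in other.sensitivities.items():
            combined[var] = combined.get(var, 0.0) + s
        return ToyUValue(self.nominal + other.nominal, combined)

x = ToyUValue(1.0, {"x": 0.1})
y = ToyUValue(2.0, {"y": 0.2})
z = x + y
print(z.nominal, z.std_dev)   # 3.0  sqrt(0.1**2 + 0.2**2) ≈ 0.2236
```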
Thank you for the feedback and the additional suggestions; I'll look into it and let you know of the progress. Edit: yes, the benchmark above uses the latest 3.0 version.
Where did this end up going (and what were the complications)? It seems a fairly straightforward proposal from afar.
If I remember correctly, the general idea was to subclass np.ndarray. The documentation on subclassing numpy arrays is currently extensive enough, but IMO this is still non-trivial. In any case, I later discovered that there was some prior work in this direction in astropy/astropy#3715. The author also offered to merge some of the features back into uncertainties. The current status is that I don't really have the bandwidth to work on this anymore, unfortunately.
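For context, the standard ndarray-subclassing pattern from the NumPy documentation looks roughly like the sketch below (class and attribute names are illustrative); it also shows the kind of bookkeeping that makes the approach non-trivial.

```python
import numpy as np

class UArray(np.ndarray):
    """Minimal ndarray subclass carrying per-element standard deviations."""

    def __new__(cls, nominal, std_dev):
        # View the nominal values as the subclass, then attach the extra data.
        obj = np.asarray(nominal, dtype=float).view(cls)
        obj.std_dev = np.asarray(std_dev, dtype=float)
        return obj

    def __array_finalize__(self, obj):
        # Called after explicit construction, view casting and slicing alike;
        # the extra attribute has to be handled consistently in every case.
        if obj is None:
            return
        self.std_dev = getattr(obj, "std_dev", None)

a = UArray([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
b = a[:2]
# b is still a UArray, but b.std_dev was copied whole rather than sliced:
# keeping the two arrays consistent through indexing, ufuncs, reductions,
# etc. is where the non-trivial work lies.
print(type(b).__name__, b.shape, b.std_dev.shape)
```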
This seems like it could be a great project. I am trying to calculate the pseudo-inverse with uncertainties for an array of size (21931, 3). Using np.linalg.pinv (without uncertainties) this takes milliseconds. Using unumpy.ulinalg.pinv (with uncertainties) it does not complete within half an hour. The project as it stands is not very practical. Are there any plans to improve this?
I guess you mean not practical for people who need the pseudo-inverse of a matrix with many (60,000) coefficients? The uncertainties package is indeed not designed for such large matrices: when I wrote it, I had in mind simple expressions from physics, which typically have only a few variables. Concretely, the pseudo-inverse is a matrix of 60,000 coefficients, each represented by its dependence on another 60,000 coefficients: we are talking about some 3.6 billion parameters, which is sizeable.

Maybe the auto-differentiation tools from PyTorch or TensorFlow could help with your specific problem: this would be the first thing I would look at. If they solve the problem, then it would indeed be an interesting project to massively speed up uncertainties by using them (maybe in a completely different package). I cannot currently plan to look at this anytime soon, but that's an old idea that is worth checking.

PS: of course the other option would be to see what can be done simply with NumPy, as discussed above, but it's likely not to be as fast, and why reinvent the wheel when we may not have to?
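A rough sketch of the auto-differentiation idea, written here with JAX on a deliberately tiny matrix (the values and noise levels are assumptions); for a (21931, 3) input the full Jacobian would itself be enormous, so a real implementation would need JVP/VJP tricks or sampling.

```python
# First-order propagation through the pseudo-inverse via automatic
# differentiation, using JAX on a deliberately tiny matrix.  The input
# standard deviations are assumed independent; values are placeholders.
import jax
import jax.numpy as jnp

A = jnp.array([[1.0, 2.0],
               [3.0, 4.0],
               [5.0, 6.0]])               # nominal values, shape (3, 2)
A_std = 0.01 * jnp.ones_like(A)           # element-wise standard deviations

J = jax.jacfwd(jnp.linalg.pinv)(A)        # shape (2, 3, 3, 2): d pinv[i, j] / d A[k, l]

# var(pinv[i, j]) = sum_kl (d pinv[i, j] / d A[k, l])^2 * var(A[k, l])
pinv_var = jnp.einsum("ijkl,kl->ij", J ** 2, A_std ** 2)
pinv_std = jnp.sqrt(pinv_var)

print(jnp.linalg.pinv(A))                 # nominal pseudo-inverse
print(pinv_std)                           # its propagated standard deviations
```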
The problem I am trying to solve is AX = b. A is only 3 columns wide, so x is a vector of length three; there are only 3 unknowns. And like I said, this only takes a few milliseconds using np.linalg.pinv (without the uncertainties). As the author of this issue noted, something is not being handled correctly. But thanks for the suggestion: maybe if I think about error propagation I can solve this using one of the automatic differentiation packages. I think jax does this too.
Just want to add that the solution I came up with is to assume a distribution for the errors of the data, add noise to the data following that distribution, and then run the calculation (inversion and then dot product) a thousand times. Doing so, the calculation takes just 4 seconds and out pops the full error distribution of the parameters.
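A minimal sketch of that Monte Carlo approach in plain numpy (the data, shapes and noise levels below are illustrative placeholders, and the errors are assumed independent Gaussian):

```python
# Monte Carlo propagation along those lines, assuming independent Gaussian
# errors; the matrix, vector and noise levels below are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

A = rng.normal(size=(21931, 3))              # nominal design matrix
b = rng.normal(size=21931)                   # nominal observations
A_std, b_std = 0.01, 0.05                    # assumed measurement std devs

samples = np.empty((n_samples, 3))
for i in range(n_samples):
    A_i = A + rng.normal(scale=A_std, size=A.shape)
    b_i = b + rng.normal(scale=b_std, size=b.shape)
    samples[i] = np.linalg.pinv(A_i) @ b_i   # inversion, then dot product

x_mean = samples.mean(axis=0)
x_std = samples.std(axis=0, ddof=1)          # empirical error bars on the 3 parameters
print(x_mean, x_std)
```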
In that case, you might want to look at mcerp, which does Monte Carlo error propagation (though that hasn't been updated in a while either).
There is also https://github.com/SALib/SALib, which allows for more sophisticated UQ/SA: it handles the sampling and the outputs, while you handle the model part. And there is https://github.com/idaholab/raven if you want something more complex.
Statement of the problem
Currently uncertainties supports numpy arrays by stacking `uncertainties.ufloat` objects inside a `numpy.array(..., dtype="object")` array. This is certainly nice, as it allows uncertainty propagation to be used automatically with all existing numpy array operators. However, this also poses significant performance limitations, as the logic of error propagation is needlessly recomputed for every element of the array.

Here is a quick benchmark of the current implementation (v3.0): while some decrease in performance is of course expected for the error propagation, a simple array operation such as a (1000, 1000) matrix multiplication currently shows a very large slowdown compared to plain numpy.
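A rough way to reproduce this kind of comparison (a sketch only; the sizes are kept small here because the object-array version is dramatically slower):

```python
# Rough comparison of a plain float matrix product with the same product on an
# object array of ufloats (sizes kept small: the ufloat version is far slower).
import timeit

import numpy as np
from uncertainties import unumpy

n = 30
nominal = np.random.rand(n, n)
std = 0.01 * np.ones((n, n))
uarr = unumpy.uarray(nominal, std)      # numpy array with dtype=object, ufloat elements

t_plain = timeit.timeit(lambda: nominal.dot(nominal), number=3)
t_ufloat = timeit.timeit(lambda: uarr.dot(uarr), number=3)
print(f"float64: {t_plain:.4f} s    ufloat object array: {t_ufloat:.4f} s")
```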
Proposed solution
I believe that making `unumpy.uarray` return a custom `undarray` object (and not an `np.array(..., dtype='object')`), which would store the mean value and the standard deviation in two numpy arrays (e.g. `undarray.n` and `undarray.s`), and then implementing the logic of error propagation at the array level, would address both the memory usage and the performance issues. The way masked arrays, which are also constructed from two numpy arrays, are implemented as a class inheriting from `ndarray` could be used as an example.
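A minimal sketch of the proposed layout (the `undarray` name and the `.n`/`.s` attributes follow the proposal above and are hypothetical; the propagation rules below are simplified and ignore correlations between elements):

```python
# Minimal sketch of the proposed storage (hypothetical class, not an existing
# uncertainties API).  Propagation is first order and, for simplicity, treats
# all elements as independent; nominal values are assumed non-zero in __mul__.
import numpy as np

class undarray:
    def __init__(self, n, s):
        self.n = np.asarray(n, dtype=float)   # nominal values
        self.s = np.asarray(s, dtype=float)   # standard deviations

    def __add__(self, other):
        # Independent errors: variances add.
        return undarray(self.n + other.n, np.hypot(self.s, other.s))

    def __mul__(self, other):
        # Relative errors add in quadrature for a product of independent quantities.
        n = self.n * other.n
        s = np.abs(n) * np.hypot(self.s / self.n, other.s / other.n)
        return undarray(n, s)

    def __repr__(self):
        return f"undarray(n={self.n}, s={self.s})"

a = undarray([1.0, 2.0], [0.1, 0.2])
b = undarray([3.0, 4.0], [0.3, 0.4])
print(a + b)   # each operation is a handful of vectorized numpy calls,
print(a * b)   # independent of the number of elements
```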
Possible impact
This also goes along with issue #47: if operators are defined as methods of this `undarray` object with a numpy-compatible API, existing code using numpy operators (e.g. `np.sum(undarray)`) might just work out of the box [needs confirmation]. This might also affect issue #53.
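As a rough illustration of the "out of the box" point, NumPy's `__array_ufunc__` protocol (NEP 13) lets a custom class intercept ufunc calls such as `np.add`; the class below is a stripped-down, hypothetical stand-in for the proposed `undarray`.

```python
# Hypothetical stand-in for the proposed undarray, showing how NumPy's
# __array_ufunc__ protocol (NEP 13) routes ufunc calls such as np.add to the
# custom, array-level propagation.
import numpy as np

class UncertainArray:
    def __init__(self, n, s):
        self.n = np.asarray(n, dtype=float)
        self.s = np.asarray(s, dtype=float)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if ufunc is np.add and method == "__call__":
            a, b = inputs
            return UncertainArray(a.n + b.n, np.hypot(a.s, b.s))
        return NotImplemented   # every other ufunc still needs its own rule

a = UncertainArray([1.0, 2.0], [0.1, 0.2])
b = UncertainArray([3.0, 4.0], [0.3, 0.4])
c = np.add(a, b)                # dispatched to UncertainArray.__array_ufunc__
print(c.n, c.s)
```

Functions that are not ufuncs (`np.sum`, `np.linalg.pinv`, ...) would still need the `__array_function__` protocol or dedicated methods, which is where the [needs confirmation] above comes in.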
In order to keep backward compatibility, it could be possible to keep the current behavior as the default and switch to this new backend only explicitly.
This would require significant work to make all the operators work, and at first only a subset of operators might be supported, but I believe that such a performance improvement for `unumpy` would help a lot in making this package usable in medium-scale or production applications.

I would be happy to work on this. What do you think @lebigot?