Long time to create large arrays #129

Closed
juliotux opened this issue Oct 21, 2020 · 1 comment

@juliotux

uncertainties seems to have a very large performance impact when creating mid-to-large-size arrays. For example, creating a 2048x2048 array with uncertainties, a common size for astronomical images, takes around 24 s per array. According to the profiler, the overhead is basically due to the creation of the UFloat instances:

ipython:  %timeit u1 = unp.uarray(np.ones((2048, 2048)), np.ones((2048, 2048))*0.01)

...

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  4194304   40.481    0.000   54.901    0.000 core.py:2712(__init__)
  4194304    6.032    0.000   60.933    0.000 core.py:295(<lambda>)
  4194304    4.643    0.000    4.643    0.000 core.py:2762(std_dev)
  4194304    3.294    0.000    3.294    0.000 core.py:1631(__init__)
  4194304    2.994    0.000    4.352    0.000 core.py:2791(__hash__)
        1    2.949    2.949   64.731   64.731 function_base.py:2179(_vectorize_call)
  4194304    2.130    0.000    2.130    0.000 core.py:1498(__init__)
  4194304    1.358    0.000    1.358    0.000 {built-in method builtins.id}
        3    0.849    0.283    0.849    0.283 {built-in method numpy.array}
        1    0.153    0.153   64.884   64.884 function_base.py:2080(__call__)
        1    0.105    0.105   64.989   64.989 <string>:1(<module>)
        1    0.000    0.000   64.989   64.989 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 function_base.py:2048(__init__)
        1    0.000    0.000   64.884   64.884 core.py:272(uarray)
        1    0.000    0.000    0.849    0.849 function_base.py:2189(<listcomp>)
        1    0.000    0.000    0.000    0.000 function_base.py:2110(_get_ufunc_and_otypes)
        1    0.000    0.000    0.000    0.000 {built-in method numpy.frompyfunc}
        1    0.000    0.000    0.000    0.000 function_base.py:2065(<listcomp>)
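
For completeness, here is a minimal standalone reproduction sketch (same shapes as above, using cProfile instead of %timeit; the variable names are only for illustration):

import cProfile

import numpy as np
from uncertainties import unumpy as unp

nominal = np.ones((2048, 2048))
std = np.ones((2048, 2048)) * 0.01

# One UFloat is built per element, so expect ~2048*2048 = 4_194_304 calls
# to the constructors that dominate the profile above.
cProfile.run("unp.uarray(nominal, std)", sort="cumtime")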

Is there any way to handle arrays with uncertainties without incurring this overhead? Would subclassing numpy's ndarray, instead of just creating an ndarray of UFloat objects, be a viable workaround to speed up the code?
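
For reference, a quick check (small array for illustration; interpretation assumed from the profile above) showing that unp.uarray builds a plain object-dtype ndarray, with one Python object per element:

import numpy as np
from uncertainties import unumpy as unp

small = unp.uarray(np.ones((3, 3)), np.full((3, 3), 0.01))

print(small.dtype)        # object: each cell holds a separate Python object
print(type(small[0, 0]))  # a UFloat (uncertainties.core.Variable) instance

# This is why construction scales with the element count: 2048*2048 objects
# must be created one by one, each registering its own independent variable.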

@lebigot
Collaborator

lebigot commented Oct 21, 2020

Thank you for the feedback. "Large" arrays with uncertainties are not fast with this package.

There is a similar issue on the subject, so I'll close this one after this comment, but feel free to re-open it: #57. Handling fully correlated uncertainties in arrays in a fast way requires some thinking. For example, if you invert a 4-million-element matrix (like in your example), each of the 4 million elements depends on the 4 million others in a specific way: that is a huge amount of data (of the order of a terabyte), and it therefore requires a lot of computation.

One option (to be defined more precisely) would be to handle separately, and in a fast way, some special cases like yours (initialization) and some simple operations (those where each element of the result depends on only a few variables).
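
For illustration only, here is a rough sketch of what such an uncorrelated special case could look like (a hypothetical helper, not part of uncertainties; the name FastUncorrelatedArray is made up): nominal values and standard deviations are kept as two plain ndarrays and propagated elementwise, which makes the initialization in your report essentially just two array allocations:

import numpy as np

class FastUncorrelatedArray:
    # Hypothetical fast path: only valid while all elements stay independent.
    def __init__(self, nominal, std):
        self.nominal = np.asarray(nominal, dtype=float)
        self.std = np.asarray(std, dtype=float)

    def __add__(self, other):
        # Independent uncertainties add in quadrature.
        return FastUncorrelatedArray(self.nominal + other.nominal,
                                     np.hypot(self.std, other.std))

    def __mul__(self, other):
        # First-order propagation for a product of independent quantities.
        return FastUncorrelatedArray(self.nominal * other.nominal,
                                     np.hypot(other.nominal * self.std,
                                              self.nominal * other.std))

# Construction is just two array allocations instead of 4 million UFloats:
img = FastUncorrelatedArray(np.ones((2048, 2048)),
                            np.ones((2048, 2048)) * 0.01)
dark = FastUncorrelatedArray(np.full((2048, 2048), 0.1),
                             np.full((2048, 2048), 0.02))
print((img + dark).std[0, 0])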

Any idea is welcome at this stage (probably best as comments in the other issue linked above). Thanks!

lebigot closed this as completed Oct 21, 2020