Long time to create large arrays #129

Closed
juliotux opened this issue Oct 21, 2020 · 1 comment

@juliotux

uncertainties seems to have a very large performance impact when creating mid-to-large-size arrays. For example, creating a 2048x2048 array with uncertainties, a common size for astronomical images, takes around 24 s per array. According to the profiler, the overhead is basically due to the creation of the UFloat instances:

ipython:  %timeit u1 = unp.uarray(np.ones((2048, 2048)), np.ones((2048, 2048))*0.01)

...

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  4194304   40.481    0.000   54.901    0.000 core.py:2712(__init__)
  4194304    6.032    0.000   60.933    0.000 core.py:295(<lambda>)
  4194304    4.643    0.000    4.643    0.000 core.py:2762(std_dev)
  4194304    3.294    0.000    3.294    0.000 core.py:1631(__init__)
  4194304    2.994    0.000    4.352    0.000 core.py:2791(__hash__)
        1    2.949    2.949   64.731   64.731 function_base.py:2179(_vectorize_call)
  4194304    2.130    0.000    2.130    0.000 core.py:1498(__init__)
  4194304    1.358    0.000    1.358    0.000 {built-in method builtins.id}
        3    0.849    0.283    0.849    0.283 {built-in method numpy.array}
        1    0.153    0.153   64.884   64.884 function_base.py:2080(__call__)
        1    0.105    0.105   64.989   64.989 <string>:1(<module>)
        1    0.000    0.000   64.989   64.989 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 function_base.py:2048(__init__)
        1    0.000    0.000   64.884   64.884 core.py:272(uarray)
        1    0.000    0.000    0.849    0.849 function_base.py:2189(<listcomp>)
        1    0.000    0.000    0.000    0.000 function_base.py:2110(_get_ufunc_and_otypes)
        1    0.000    0.000    0.000    0.000 {built-in method numpy.frompyfunc}
        1    0.000    0.000    0.000    0.000 function_base.py:2065(<listcomp>)
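
For completeness, here is a minimal standalone reproduction sketch (same shapes as above, using cProfile instead of %timeit; the variable names are only for illustration):

import cProfile

import numpy as np
from uncertainties import unumpy as unp

nominal = np.ones((2048, 2048))
std = np.ones((2048, 2048)) * 0.01

# One UFloat is built per element, so expect ~2048*2048 = 4_194_304 calls
# to the constructors that dominate the profile above.
cProfile.run("unp.uarray(nominal, std)", sort="cumtime")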

Is there any way to handle arrays with uncertainties without incurring this overhead? Would subclassing numpy's ndarray, instead of just creating an ndarray of UFloat objects, be a viable workaround to speed up the code?
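
For reference, a quick check (small array for illustration; interpretation assumed from the profile above) showing that unp.uarray builds a plain object-dtype ndarray, with one Python object per element:

import numpy as np
from uncertainties import unumpy as unp

small = unp.uarray(np.ones((3, 3)), np.full((3, 3), 0.01))

print(small.dtype)        # object: each cell holds a separate Python object
print(type(small[0, 0]))  # a UFloat (uncertainties.core.Variable) instance

# This is why construction scales with the element count: 2048*2048 objects
# must be created one by one, each registering its own independent variable.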

@lebigot
Collaborator

lebigot commented Oct 21, 2020

Thank you for the feedback. "Large" arrays with uncertainties are not fast with this package.

There is a similar issue on the subject, so I'll close this one after this comment, but feel free to re-open it: #57. Handling fully correlated uncertainties in arrays in a fast way requires some thinking. For example, if you invert a 4-million-element matrix (like in your example), each of the 4 million elements depends on the 4 million others in a specific way: that is a huge amount of data (of the order of a terabyte), and it therefore requires a lot of computation.

One option (to be defined more precisely) would be to handle separately, and in a fast way, some special cases like yours (initialization) and some simple operations (those where each element of the result depends on only a few variables).
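
For illustration only, here is a rough sketch of what such an uncorrelated special case could look like (a hypothetical helper, not part of uncertainties; the name FastUncorrelatedArray is made up): nominal values and standard deviations are kept as two plain ndarrays and propagated elementwise, which makes the initialization in your report essentially just two array allocations:

import numpy as np

class FastUncorrelatedArray:
    # Hypothetical fast path: only valid while all elements stay independent.
    def __init__(self, nominal, std):
        self.nominal = np.asarray(nominal, dtype=float)
        self.std = np.asarray(std, dtype=float)

    def __add__(self, other):
        # Independent uncertainties add in quadrature.
        return FastUncorrelatedArray(self.nominal + other.nominal,
                                     np.hypot(self.std, other.std))

    def __mul__(self, other):
        # First-order propagation for a product of independent quantities.
        return FastUncorrelatedArray(self.nominal * other.nominal,
                                     np.hypot(other.nominal * self.std,
                                              self.nominal * other.std))

# Construction is just two array allocations instead of 4 million UFloats:
img = FastUncorrelatedArray(np.ones((2048, 2048)),
                            np.ones((2048, 2048)) * 0.01)
dark = FastUncorrelatedArray(np.full((2048, 2048), 0.1),
                             np.full((2048, 2048), 0.02))
print((img + dark).std[0, 0])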

Any idea is welcome at this stage (probably best as comments in the other issue linked above). Thanks!

lebigot closed this as completed Oct 21, 2020