yaxarrays slower than dimensionaldata #361

Open
bjarthur opened this issue Jan 19, 2024 · 0 comments
Labels: checkvalidity, test needed (We should add a test to catch this in the future.)

bjarthur commented Jan 19, 2024

indexing a YAXArray is slower than indexing the DimArray you get by converting it (yax to dd):

julia> using YAXArrays, YAXArrayBase, DimensionalData, BenchmarkTools

julia> yax = YAXArray(rand(10, 20, 5));

julia> dd = yaxconvert(DimArray, yax);

julia> @benchmark yax[Dim_1=1:3]
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.059 μs … 190.583 μs  ┊ GC (min … max): 0.00% … 95.98%
 Time  (median):     4.137 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.303 μs ±   4.106 μs  ┊ GC (mean ± σ):  2.28% ±  2.34%

  ▂▇██▇▅▃▁                   ▁                                ▂
  █████████▆▆▄▄▅▃▃▂▃▂▂▅▇████████▇▇▆▅▅▅▄▅▄▅▄▃▄▆▅▆▆▆▆▇▆▄▄▅▅▄▄▅▆ █
  4.06 μs      Histogram: log(frequency) by time       5.5 μs <

 Memory estimate: 4.88 KiB, allocs estimate: 87.

julia> @benchmark dd[Dim_1=1:3]
BenchmarkTools.Trial: 10000 samples with 313 evaluations.
 Range (min … max):  269.834 ns …   8.516 μs  ┊ GC (min … max):  0.00% … 96.06%
 Time  (median):     369.808 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   489.908 ns ± 878.050 ns  ┊ GC (mean ± σ):  24.04% ± 12.50%

  █▇                                                            ▁
  ███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆██ █
  270 ns        Histogram: log(frequency) by time        6.9 μs <

 Memory estimate: 2.59 KiB, allocs estimate: 2.

the same holds in the other direction, when a DimArray is converted to a YAXArray (dd to yax):

julia> DD = DimArray(rand(50, 31), (X(), Y(10.0:40.0)), metadata = Dict{String, Any}());

julia> YAX = yaxconvert(YAXArray, DD)
50×31 YAXArray{Float64,2} with dimensions: 
  Dim{:X},
  Dim{:Y} Sampled{Float64} 10.0:1.0:40.0 ForwardOrdered Regular Points
Total size: 12.11 KB


julia> @benchmark DD[Y(1:10), X(1)]
BenchmarkTools.Trial: 10000 samples with 991 evaluations.
 Range (min … max):  41.751 ns … 407.417 ns  ┊ GC (min … max): 0.00% … 85.99%
 Time  (median):     42.592 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   44.705 ns ±  16.278 ns  ┊ GC (mean ± σ):  2.39% ±  5.63%

  ▃▆█▇▄▂        ▂▂▂▂▁                                          ▁
  ███████▆▅▄▄▃▄███████▆▆▆▆▅▆▆▆▇▇▇█▇▇█▇▇▆▅▆▆▅▆▆▅▆▆▆▅▅▅▄▅▄▄▄▄▄▅▅ █
  41.8 ns       Histogram: log(frequency) by time      59.5 ns <

 Memory estimate: 240 bytes, allocs estimate: 2.

julia> @benchmark YAX[Dim{:Y}(1:10), Dim{:X}(1)]
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.366 μs … 187.162 μs  ┊ GC (min … max): 0.00% … 97.36%
 Time  (median):     2.431 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.572 μs ±   4.234 μs  ┊ GC (mean ± σ):  3.97% ±  2.38%

  ▁▆██▇▆▄▂▂                   ▁ ▁▁                            ▂
  ██████████▆▆▄▅▁▃▁▃▃▁▁▁▄▅▇▇██████▇▇▇▆▄▆▆▄▃▃▄▁▆▅▄▅▄▃▅▅▆▆▆▆▆▄▅ █
  2.37 μs      Histogram: log(frequency) by time      3.42 μs <

 Memory estimate: 2.92 KiB, allocs estimate: 39.

by median time that's roughly an 11-fold difference in the first case and a 57-fold difference in the second, for arrays which are small and in memory. but even for a 450 MB on-disk zarr array, yax is still about 20% slower than dd:

julia> using Zarr

julia> yax = Cube("foo.zarr");

julia> dd = yaxconvert(DimArray, yax);

julia> @benchmark collect(yax[Dim{:LI}(At("bar"))])
BenchmarkTools.Trial: 73 samples with 1 evaluation.
 Range (min … max):  52.840 ms … 124.095 ms  ┊ GC (min … max):  3.18% … 58.09%
 Time  (median):     54.923 ms               ┊ GC (median):     5.83%
 Time  (mean ± σ):   68.719 ms ±  25.640 ms  ┊ GC (mean ± σ):  24.69% ± 20.59%

  ▂█                                                            
  ██▇▄▁▁▅▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▃▃▃▃▁▄▃▃ ▁
  52.8 ms         Histogram: frequency by time          121 ms <

 Memory estimate: 126.95 MiB, allocs estimate: 10584.

julia> @benchmark collect(dd[Dim{:LI}(At("bar"))])
BenchmarkTools.Trial: 110 samples with 1 evaluation.
 Range (min … max):  44.108 ms … 107.490 ms  ┊ GC (min … max): 0.00% … 58.55%
 Time  (median):     45.175 ms               ┊ GC (median):    1.25%
 Time  (mean ± σ):   45.998 ms ±   6.025 ms  ┊ GC (mean ± σ):  2.60% ±  5.66%

         ▄█                                                     
  ▅▄▆▃▆▄█████▃▃▁▃▃▄▄▃▃▃▁▁▁▃▁▁▁▃▃▁▃▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
  44.1 ms         Histogram: frequency by time         51.9 ms <

 Memory estimate: 38.41 MiB, allocs estimate: 2969.

julia> size(yax)
(20222, 1098, 145)

is this expected?

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
  Threads: 1 on 8 virtual cores
Environment:
  JULIA_PROJECT = @.
  JULIA_EDITOR = vi

DimensionalData v0.25.8 and YAXArrays v0.5.2
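
As a rough check on the slowdown figures, the ratios can be recomputed from the medians reported above. This is a minimal sketch in plain Julia, not part of either package; the names `medians` and `slowdown` are made up for illustration, and the numbers are copied by hand from the benchmark output in this issue:

```julia
# Median timings copied from the @benchmark output above, in nanoseconds.
# The field names are hypothetical labels for the three comparisons.
medians = (
    small_yax_to_dd = (yax = 4137.0,   dd = 369.808),   # yax[Dim_1=1:3] vs dd[Dim_1=1:3]
    small_dd_to_yax = (yax = 2431.0,   dd = 42.592),    # YAX[...] vs DD[Y(1:10), X(1)]
    zarr_on_disk    = (yax = 54.923e6, dd = 45.175e6),  # collect(...) on the 450 MB zarr array
)

# Slowdown factor of YAXArray indexing relative to DimArray indexing,
# rounded to one decimal place.
slowdown = map(m -> round(m.yax / m.dd; digits = 1), medians)

for (name, factor) in pairs(slowdown)
    println(name, ": ", factor, "x")
end
```

With the medians above this gives about 11x and 57x for the two in-memory cases and about 1.2x for the zarr case, so the gap is largest when the indexing overhead dominates and shrinks once disk reads take most of the time.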

@bjarthur changed the title from "yaxarrays 10x slower than dimensionaldata" to "yaxarrays slower than dimensionaldata" on Jan 19, 2024
@lazarusA added the "test needed" and "checkvalidity" labels on Sep 21, 2024