Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast vertical profile diagnostic #186

Closed
ali-ramadhan opened this issue Apr 19, 2019 · 8 comments · Fixed by #352
Closed

Fast vertical profile diagnostic #186

ali-ramadhan opened this issue Apr 19, 2019 · 8 comments · Fixed by #352
Labels
abstractions 🎨 Whatever that means feature 🌟 Something new and shiny
Milestone

Comments

@ali-ramadhan
Copy link
Member

ali-ramadhan commented Apr 19, 2019

We may want vertical profiles of many variables, e.g. u, v, w, T. Would be nice to have a diagnostic that does this efficiently, especially if we have very frequent diagnostics.

If it's literally every iteration then a CUDA kernel might be the way to go. But if it's like every 20-100+ iterations then it might be faster to copy stuff to the CPU and do a lot of extended on-the-fly analysis there (similar to what we do with asynchronous NetCDF output).

Not sure if the same diagnostic can handle products of fields, e.g. w'T'. That could be another diagnostic?

cc @sandreza

@ali-ramadhan ali-ramadhan added feature 🌟 Something new and shiny abstractions 🎨 Whatever that means labels Apr 19, 2019
@ali-ramadhan ali-ramadhan added this to the v1.0 milestone Apr 19, 2019
@glwagner
Copy link
Member

Since mean() works on the GPU, it’s not difficult to implement right?

I feel developing the abstraction so it’s trivial to implement kernels is the way to go. We’ll benefit on many fronts.

The FourierFlows strategy is to provide a tool that writes output from arbitrary functions (paired with a symbol that gives the output a name). Then developers can add functions for common outputs (like TKE, dissipation, covariances), and users can pick and choose what they need flexibly, or write new functions if they need to.

It could also make sense to have a tool that takes kernels as input for our case.

@ali-ramadhan
Copy link
Member Author

This is becoming a high priority issue now. We don't have to worry about using a second core to compute diagnostics yet but should develop some reusable kernels to compute, e.g. vertical profiles, covariances, etc.

Unfortunately taking the horizontal average of a field via mean() is very slow on the GPU (scalar CuArray operations again) as we have to take the mean of data(::Field) or ardata(::Field), which are non-contiguous views into a CuArray.

We must use a non-contiguous view to ignore halos as including halos in the averaging would produce a wrong answer.

This is what I've been doing for now which works okay (no scalar ops) but not great:

function horizontal_avg(model, field)
    function havg(model)
        f = Array(ardata(field))
        return mean(f, dims=[1, 2])[:]
    end
    return havg
end

function horizontal_avg(model, f1, f2)
    function havg(model)
        prod = Array(ardata(f1)) .* Array(ardata(f2))
        return mean(prod, dims=[1, 2])[:]
    end
    return havg
end

and some timings from a 256^3 simulation

[ Info: Calculating JLD2 output Symbol[:T, :w, :kappaT, :v, :nu, :u]...
  1.049922 seconds (4.51 k allocations: 786.251 MiB, 50.95% gc time)
[ Info: Writing JLD2 output Symbol[:T, :w, :kappaT, :v, :nu, :u]...
[ Info: Done writing (t: 6.914 s)
[ Info: Calculating JLD2 output Symbol[:vv, :w, :kappaT, :nu, :S, :wv, :v, :uv, :ww, :u, :uw, :vu, :T, :wT, :wu, :uu, :vw]...
  6.309173 seconds (3.16 k allocations: 4.625 GiB, 52.27% gc time)
[ Info: Writing JLD2 output Symbol[:vv, :w, :kappaT, :nu, :S, :wv, :v, :uv, :ww, :u, :uw, :vu, :T, :wT, :wu, :uu, :vw]...
[ Info: Done writing (t: 3.469 ms)

Shouldn't be hard to code up some simple kernels that do this quickly with minimal allocations.

@glwagner
Copy link
Member

glwagner commented Aug 8, 2019

This script:

https://github.com/glwagner/ColumnModelOptimizationProject/blob/master/les/simple_flux.jl

shows how to use the JLD2OutputWriter to calculate horizontal averages efficiently on the GPU.

edit: I missed your point about not including the halos.

@glwagner
Copy link
Member

glwagner commented Aug 8, 2019

Can you use mean on a view into the parent array on the GPU?

@glwagner
Copy link
Member

glwagner commented Aug 8, 2019

Note also that due to our current convention for fields and indexing, the “correct” horizontal average depends on both the field type and the boundary conditions.

@ali-ramadhan
Copy link
Member Author

Can you use mean on a view into the parent array on the GPU?

Unfortunately no. It's still a non-contiguous view.

julia> using Statistics, CuArrays; C = rand(128, 128, 128) |> CuArray; @time CuArrays.@sync mean(C, dims=[1, 2]);
  0.000575 seconds (221 allocations: 8.797 KiB)

julia> using Statistics, CuArrays; C = rand(128, 128, 128) |> CuArray; @time CuArrays.@sync mean(view(C, 2:127, 2:127, 2:127), dims=[1, 2]);
 18.857220 seconds (10.15 M allocations: 342.416 MiB, 1.11% gc time)

A 33,000x slow down lol.

Note also that due to our current convention for fields and indexing, the “correct” horizontal average depends on both the field type and the boundary conditions.

Ah good point, that would complicate things...

@glwagner
Copy link
Member

glwagner commented Aug 8, 2019

Interesting.

One solution is to compute the mean including the halos, and then adjust the result (on the cpu) taking into account the horizontal boundary conditions + halo values.

@ali-ramadhan
Copy link
Member Author

ali-ramadhan commented Aug 8, 2019

Hmmm, I wonder if it might be easier to contribute a version of mean that works for us.

@maleadt @vchuravy is it possible to efficiently calculate mean(c) where c is a non-contiguous view into a CuArray?

Nevermind, I think it's an open issue in CuArrays.jl: JuliaGPU/CuArrays.jl#68

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
abstractions 🎨 Whatever that means feature 🌟 Something new and shiny
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants