vload on Matrix{Float64} #116

Closed
algorithmx opened this issue Aug 17, 2023 · 3 comments

@algorithmx

algorithmx commented Aug 17, 2023

Dear developers,

I need to access a Julia matrix M of size (4, ncol) via the vload function in a performance-critical situation, where converting M to a vector with vec(M) is not an option. How can I properly use vload to load the k-th column of M, given that the matrix is properly padded (i.e. all(M[4,:] .== 0) == true)? If this is not possible, where can I extend the vload function?

#feature-request

@KristofferC
Collaborator

Maybe:

julia> using SIMD

julia> function load_column(x, i)
           GC.@preserve x begin
               ptr = pointer(x, 4*(i-1)+1)
               SIMD.vload(Vec{4, Float64}, ptr)
           end
       end
load_column (generic function with 1 method)

julia> x = rand(4, 10)
4×10 Matrix{Float64}:
 0.105425  0.112207   0.164943  0.924851     0.962439  0.771419   0.232943  0.0477578  0.31495   0.725615
 0.583358  0.574841   0.682031  0.0163499    0.889354  0.810443   0.343777  0.372169   0.841392  0.0536699
 0.831974  0.78172    0.424055  0.000215734  0.176126  0.177913   0.842223  0.706929   0.837163  0.591129
 0.567482  0.0297791  0.552873  0.123447     0.485557  0.0712733  0.925538  0.928424   0.719696  0.236411

julia> load_column(x, 1)
<4 x Float64>[0.10542510183253728, 0.5833583593228673, 0.8319741449401951, 0.5674824353491745]

julia> load_column(x, 5)
<4 x Float64>[0.9624389422624501, 0.8893543320414525, 0.17612558720563265, 0.4855567380042497]
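
The offset `4*(i-1)+1` above is the column-major linear index of `x[1, i]`, which is why a pointer to that element can be handed directly to `vload`. A slightly more defensive variant might look like the sketch below. The name `load_column_checked` and the explicit size and bounds checks are assumptions added for illustration; they are not part of SIMD.jl or of the suggestion above.

```julia
using SIMD

# Hypothetical helper: load column `i` of a 4-row Float64 matrix as one SIMD vector.
function load_column_checked(x::Matrix{Float64}, i::Integer)
    # The raw pointer load reads 4 consecutive Float64 values, so the matrix
    # must have exactly 4 rows and column `i` must exist.
    size(x, 1) == 4 || throw(DimensionMismatch("expected a matrix with 4 rows"))
    checkbounds(x, 1, i)
    GC.@preserve x begin
        # Column-major storage: x[1, i] sits at linear index 4*(i-1)+1.
        ptr = pointer(x, LinearIndices(x)[1, i])
        vload(Vec{4, Float64}, ptr)
    end
end

x = reshape(collect(1.0:40.0), 4, 10)   # deterministic data: x[r, c] == 4*(c-1) + r
v = load_column_checked(x, 5)           # lanes are 17.0, 18.0, 19.0, 20.0
```

The checks run before the unsafe load, so they cost a little per call. In a hot loop it may be preferable to validate the matrix once outside the loop and keep the unchecked version inside.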

@algorithmx
Author

Thanks @KristofferC, problem solved! The performance difference between the two versions of my code is small: one uses the flattened matrix, the other uses the original matrix with your suggested load_column(x, i). Here is the code I am working on and the benchmark on a 4x10 matrix.

[Screenshot from 2023-08-18 15-15-10]
[Screenshot from 2023-08-18 15-15-53]
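
Since the screenshots are not reproduced as text, here is a minimal sketch of how the two variants might be compared for correctness, assuming the flattened version calls SIMD.jl's array form of `vload` on a `Vector{Float64}`. The variable names are illustrative, and this is not the code or the benchmark from the screenshots.

```julia
using SIMD

# Pointer-based column load on the original matrix, as suggested earlier in the thread.
function load_column(x, i)
    GC.@preserve x begin
        ptr = pointer(x, 4 * (i - 1) + 1)
        vload(Vec{4, Float64}, ptr)
    end
end

x = reshape(collect(1.0:40.0), 4, 10)
flat = vec(x)                      # shares memory with x for an Array; no copy is made

i = 5
# Flattened variant: load 4 elements starting at the column's linear index.
v_flat = vload(Vec{4, Float64}, flat, 4 * (i - 1) + 1)
# Matrix variant: load through a pointer to x[1, i].
v_ptr = load_column(x, i)

same = all(v_flat[j] == v_ptr[j] for j in 1:4)   # both read the same 4 values
```

Both variants read the same 32 contiguous bytes, which is consistent with the small performance difference reported above.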
