FluxML/SafeTensors.jl

SafeTensors.jl

This package loads data stored in the safetensors format. Since Python is row-major and Julia is column-major, the dimensions are permuted such that each tensor has the same shape as in Python and its elements are correctly ordered. This incurs a performance penalty, in the sense that loading cannot be completely copy-free.
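
To make the dimension permutation concrete, here is a minimal sketch in plain Julia (no SafeTensors.jl calls) of how a row-major buffer can be viewed with its Python shape. The variable names are illustrative, and this is not necessarily how the package implements it internally.

```julia
# Row-major (Python/C order) flattening of a tensor with Python shape (3, 5),
# holding the values 0, 1, ..., 14.
flat = Float32.(0:14)
pyshape = (3, 5)

# Julia reads memory column-major, so reshape with the dimensions reversed ...
colmajor = reshape(flat, reverse(pyshape))      # 5×3, shares memory with `flat`

# ... then reverse the dimension order to recover the Python shape.
perm = Tuple(reverse(1:length(pyshape)))        # (2, 1)
lazy = PermutedDimsArray(colmajor, perm)        # 3×5 wrapper, no copy
dense = permutedims(colmajor, perm)             # 3×5 plain Matrix, copies the data

# The same recipe for a 3-dimensional tensor with Python shape (3, 5, 7).
flat3 = collect(Int32, 0:104)
lazy3 = PermutedDimsArray(reshape(flat3, 7, 5, 3), (3, 2, 1))
```

Here `lazy[2, 1]` is `5.0f0`, the element Python would address as `t[1][0]`. The wrapper avoids a copy but accesses memory with a stride; `permutedims` yields a contiguous array at the cost of copying, which is the trade-off the paragraph above refers to.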

The main function is load_safetensors, which returns a Dict{String,V} whose keys are the tensor names and whose values are the tensors. An example taken from the package tests:

julia> using SafeTensors

julia> d = load_safetensors("test/model.safetensors")
Dict{String, Array} with 27 entries:
  "int32_357"   => Int32[0 7 … 21 28; 35 42 … 56 63; 70 77 … 91 98;;; 1 8 … 22 29…
  "uint8_3"     => UInt8[0x00, 0x01, 0x02]
  "float16_35"  => Float16[0.0 1.0 … 3.0 4.0; 5.0 6.0 … 8.0 9.0; 10.0 11.0 … 13.0…
  "bool_3"      => Bool[0, 1, 0]
  "int64_3"     => [0, 1, 2]
  "int64_35"    => [0 1 … 3 4; 5 6 … 8 9; 10 11 … 13 14]
  "float32_357" => Float32[0.0 7.0 … 21.0 28.0; 35.0 42.0 … 56.0 63.0; 70.0 77.0 …
  "bool_35"     => Bool[0 1 … 1 0; 1 0 … 0 1; 0 1 … 1 0]
  "float32_35"  => Float32[0.0 1.0 … 3.0 4.0; 5.0 6.0 … 8.0 9.0; 10.0 11.0 … 13.0…
  "float32_3"   => Float32[0.0, 1.0, 2.0]
  "uint8_35"    => UInt8[0x00 0x01 … 0x03 0x04; 0x05 0x06 … 0x08 0x09; 0x0a 0x0b …
  "float16_3"   => Float16[0.0, 1.0, 2.0]
  "int16_357"   => Int16[0 7 … 21 28; 35 42 … 56 63; 70 77 … 91 98;;; 1 8 … 22 29…
  "int16_3"     => Int16[0, 1, 2]
  "float64_357" => [0.0 7.0 … 21.0 28.0; 35.0 42.0 … 56.0 63.0; 70.0 77.0 … 91.0 …
  "uint8_357"   => UInt8[0x00 0x07 … 0x15 0x1c; 0x23 0x2a … 0x38 0x3f; 0x46 0x4d …
  "float16_357" => Float16[0.0 7.0 … 21.0 28.0; 35.0 42.0 … 56.0 63.0; 70.0 77.0 …
  "int32_3"     => Int32[0, 1, 2]
  "int16_35"    => Int16[0 1 … 3 4; 5 6 … 8 9; 10 11 … 13 14]
  "int8_357"    => Int8[0 7 … 21 28; 35 42 … 56 63; 70 77 … 91 98;;; 1 8 … 22 29;…
  "int8_35"     => Int8[0 1 … 3 4; 5 6 … 8 9; 10 11 … 13 14]
  "bool_357"    => Bool[0 1 … 1 0; 1 0 … 0 1; 0 1 … 1 0;;; 1 0 … 0 1; 0 1 … 1 0; …
  "float64_35"  => [0.0 1.0 … 3.0 4.0; 5.0 6.0 … 8.0 9.0; 10.0 11.0 … 13.0 14.0]
  "int8_3"      => Int8[0, 1, 2]
  "int64_357"   => [0 7 … 21 28; 35 42 … 56 63; 70 77 … 91 98;;; 1 8 … 22 29; 36 …
  "int32_35"    => Int32[0 1 … 3 4; 5 6 … 8 9; 10 11 … 13 14]
  "float64_3"   => [0.0, 1.0, 2.0]

Lazy loading is also available: SafeTensors.deserialize("model.safetensors") memory-maps the file and returns a Dict-like object:

julia> tensors = SafeTensors.deserialize("test/model.safetensors"; mmap = true #= default to `true`=#);

julia> tensors["float32_35"]
3×5 mappedarray(ltoh, PermutedDimsArray(reshape(reinterpret(Float32, view(::Vector{UInt8}, 0x0000000000000ef5:0x0000000000000f30)), 5, 3), (2, 1))) with eltype Float32:
  0.0   1.0   2.0   3.0   4.0
  5.0   6.0   7.0   8.0   9.0
 10.0  11.0  12.0  13.0  14.0
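
The wrapper type printed above can be read inside out: a byte view, reinterpreted as Float32, reshaped, then dimension-permuted. The following is a rough sketch of that layering using only Base Julia. The buffer layout (an 8-byte stand-in prefix followed by the data) is hypothetical, and the `mappedarray(ltoh, ...)` layer from the output is replaced here by explicit `ltoh` calls.

```julia
# Hypothetical buffer: 8 placeholder bytes, then 15 little-endian Float32 values
# (0, 1, ..., 14) laid out in row-major order for a Python shape of (3, 5).
prefix = zeros(UInt8, 8)
data = collect(reinterpret(UInt8, htol.(Float32.(0:14))))
buf = vcat(prefix, data)

# Layer the views; none of these steps copies the tensor bytes.
bytes = view(buf, 9:length(buf))           # byte range of the tensor
typed = reinterpret(Float32, bytes)        # same bytes, read as Float32
shaped = reshape(typed, 5, 3)              # column-major shape (reversed dims)
lazy = PermutedDimsArray(shaped, (2, 1))   # 3×5, Python-shaped

second_row_start = ltoh(lazy[2, 1])        # little-endian to host byte order

# Materialize an ordinary Matrix when a contiguous copy is wanted.
dense = collect(lazy)

# Because `lazy` aliases `buf`, overwriting the bytes changes what it shows,
# while the materialized copy keeps the old value.
buf[9:12] .= reinterpret(UInt8, [htol(100.0f0)])
first_after_write = ltoh(lazy[1, 1])
```

The design trade-off is the usual one for views: the lazy object is cheap to create and reads straight from the underlying buffer, while `collect` pays for one copy and yields a standalone array.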

Serialization is also supported:

julia> using Random, BFloat16s

julia> weights = Dict("W"=>randn(BFloat16, 3, 5), "b"=>rand(BFloat16, 3))
Dict{String, Array{BFloat16}} with 2 entries:
  "W" => [0.617188 0.695312 … 0.390625 -2.0; -0.65625 -0.617188 … 0.652344 0.244141; 0.226562 2.70312 … -0.174805 -0.7773…
  "b" => [0.111816, 0.566406, 0.283203]

julia> f = tempname();

julia> SafeTensors.serialize(f, weights)

julia> loaded = SafeTensors.deserialize(f);

julia> loaded["W"] ≈ weights["W"]
true

julia> SafeTensors.serialize(f, weights, Dict("Package"=>"SafeTensors.jl", "version"=>"1"))

julia> loaded = SafeTensors.deserialize(f);

julia> loaded.metadata
Dict{String, String} with 2 entries:
  "Package" => "SafeTensors.jl"
  "version" => "1"
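
For context on where this metadata ends up: in the safetensors format, a file starts with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header and then the raw tensor bytes; string metadata sits under the `__metadata__` key of that header. The sketch below writes a tiny file by hand and reads the header back using Base Julia only. `read_header` is a hypothetical helper for illustration, not part of SafeTensors.jl, and the header string is hand-written rather than produced by the package.

```julia
# Hand-written JSON header: one metadata entry and one F32 tensor "x" of shape [2]
# occupying bytes 0 to 8 of the data section.
header = """{"__metadata__":{"version":"1"},"x":{"dtype":"F32","shape":[2],"data_offsets":[0,8]}}"""
payload = collect(reinterpret(UInt8, htol.(Float32[1.0, 2.0])))

path = tempname()
open(path, "w") do io
    write(io, htol(UInt64(ncodeunits(header))))  # 8-byte little-endian header length
    write(io, header)                            # JSON header (UTF-8)
    write(io, payload)                           # raw tensor bytes
end

# Hypothetical helper: return the JSON header of a safetensors file as a String.
function read_header(path)
    open(path, "r") do io
        n = ltoh(read(io, UInt64))   # header length in bytes
        String(read(io, Int(n)))     # the JSON text itself
    end
end

json = read_header(path)
```

Parsing `json` with a JSON library would then expose the `__metadata__` entries as a string-to-string map, which corresponds to what `loaded.metadata` shows above.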

Working with the GPU:

julia> loaded["W"]
3×5 mappedarray(ltoh, PermutedDimsArray(reshape(reinterpret(BFloat16, view(::Vector{UInt8}, 0x00000000000000b9:0x00000000000000d6)), 5, 3), (2, 1))) with eltype BFloat16:
  0.542969    0.201172   1.38281    -0.255859  -1.55469
  0.172852   -0.949219   0.0561523  -1.34375   -0.206055
 -0.0854492   1.17969   -0.265625   -0.871094   2.25

julia> using CUDA; CUDA.allowscalar(false)

julia> CuArray(loaded["W"])
3×5 CuArray{BFloat16, 2, CUDA.Mem.DeviceBuffer}:
  0.542969    0.201172   1.38281    -0.255859  -1.55469
  0.172852   -0.949219   0.0561523  -1.34375   -0.206055
 -0.0854492   1.17969   -0.265625   -0.871094   2.25

julia> gpu_weights = Dict("W"=>CuArray(loaded["W"]), "b"=>CuArray(loaded["b"]))
Dict{String, CuArray{BFloat16, N, CUDA.Mem.DeviceBuffer} where N} with 2 entries:
  "W" => [0.542969 0.201172 … -0.255859 -1.55469; 0.172852 -0.949219 … -1.34375 -0.206055; -0.0854492 1.17969 … -0.871094…
  "b" => BFloat16[0.871094, 0.773438, 0.703125]

julia> f = tempname();

julia> SafeTensors.serialize(f, gpu_weights)

julia> SafeTensors.deserialize(f)
SafeTensors.SafeTensor{SubArray{UInt8, 1, Vector{UInt8}, Tuple{UnitRange{UInt64}}, true}} with 2 entries:
  "W" => BFloat16[0.542969 0.201172 … -0.255859 -1.55469; 0.172852 -0.949219 … -1.34375 -0.206055; -0.0854492 1.17969 … -…
  "b" => BFloat16[0.871094, 0.773438, 0.703125]