
BERT pretrain example not working #99

Open
jackn11 opened this issue Jul 4, 2022 · 15 comments

@jackn11

jackn11 commented Jul 4, 2022

I tried running the BERT example found here https://github.com/chengchingwen/Transformers.jl/blob/master/example/BERT/_pretrain/pretrain.jl, but I got the following error at the train!() call at the bottom of the code.

[ Info: loading pretrain bert model: uncased_L-12_H-768_A-12.tfbson wordpiece
[ Info: loading pretrain bert model: uncased_L-12_H-768_A-12.tfbson tokenizer
[ Info: loading pretrain bert model: uncased_L-12_H-768_A-12.tfbson bert_model
[ Info: start training
[ Info: epoch: 1
ERROR: LoadError: GPU compilation of kernel #broadcast_kernel#15(CUDA.CuKernelContext, CUDA.CuDeviceArray{Float32, 4, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, NTuple{4, Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 4, 1}, NTuple{4, Bool}, NTuple{4, Int64}}, Base.Broadcast.Extruded{Array{Float32, 4}, NTuple{4, Bool}, NTuple{4, Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, NTuple{4, Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 4, 1}, NTuple{4, Bool}, NTuple{4, Int64}}, Base.Broadcast.Extruded{Array{Float32, 4}, NTuple{4, Bool}, NTuple{4, Int64}}}}, which is not isbits:
  .args is of type Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 4, 1}, NTuple{4, Bool}, NTuple{4, Int64}}, Base.Broadcast.Extruded{Array{Float32, 4}, NTuple{4, Bool}, NTuple{4, Int64}}} which is not isbits.
    .2 is of type Base.Broadcast.Extruded{Array{Float32, 4}, NTuple{4, Bool}, NTuple{4, Int64}} which is not isbits.
      .x is of type Array{Float32, 4} which is not isbits.


Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler C:\Users\jackn\.julia\packages\GPUCompiler\iaKrd\src\validation.jl:86
  [2] macro expansion
    @ C:\Users\jackn\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:413 [inlined]
  [3] macro expansion
    @ C:\Users\jackn\.julia\packages\TimerOutputs\jgSVI\src\TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ C:\Users\jackn\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:412 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler C:\Users\jackn\.julia\packages\GPUCompiler\iaKrd\src\utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA C:\Users\jackn\.julia\packages\CUDA\tTK8Y\src\compiler\execution.jl:354
  [7] #224
    @ C:\Users\jackn\.julia\packages\CUDA\tTK8Y\src\compiler\execution.jl:347 [inlined]
  [8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#15", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float32, 4, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, NTuple{4, Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 4, 1}, NTuple{4, Bool}, NTuple{4, Int64}}, Base.Broadcast.Extruded{Array{Float32, 4}, NTuple{4, Bool}, NTuple{4, Int64}}}}, Int64}}}})
    @ GPUCompiler C:\Users\jackn\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\jackn\.julia\packages\CUDA\tTK8Y\src\compiler\execution.jl:346
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler C:\Users\jackn\.julia\packages\GPUCompiler\iaKrd\src\cache.jl:90
 [11] cufunction(f::GPUArrays.var"#broadcast_kernel#15", tt::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float32, 4, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, NTuple{4, Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 4, 1}, NTuple{4, Bool}, NTuple{4, Int64}}, Base.Broadcast.Extruded{Array{Float32, 4}, NTuple{4, Bool}, NTuple{4, Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA C:\Users\jackn\.julia\packages\CUDA\tTK8Y\src\compiler\execution.jl:299
 [12] cufunction
    @ C:\Users\jackn\.julia\packages\CUDA\tTK8Y\src\compiler\execution.jl:293 [inlined]
 [13] macro expansion
    @ C:\Users\jackn\.julia\packages\CUDA\tTK8Y\src\compiler\execution.jl:102 [inlined]
 [14] #launch_heuristic#248
    @ C:\Users\jackn\.julia\packages\CUDA\tTK8Y\src\gpuarrays.jl:17 [inlined]
 [15] _copyto!
    @ C:\Users\jackn\.julia\packages\GPUArrays\EVTem\src\host\broadcast.jl:73 [inlined]
 [16] copyto!
    @ C:\Users\jackn\.julia\packages\GPUArrays\EVTem\src\host\broadcast.jl:56 [inlined]
 [17] copy
    @ C:\Users\jackn\.julia\packages\GPUArrays\EVTem\src\host\broadcast.jl:47 [inlined]
 [18] materialize
    @ .\broadcast.jl:860 [inlined]
 [19] apply_mask(score::CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, mask::Array{Float32, 3})
    @ Transformers.Basic C:\Users\jackn\.julia\packages\Transformers\K1F88\src\basic\mh_atten.jl:182
 [20] apply_mask
    @ C:\Users\jackn\.julia\packages\Transformers\K1F88\src\basic\mh_atten.jl:188 [inlined]
 [21] attention(query::CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, key::CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, value::CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, mask::Array{Float32, 3}, future::Bool, dropout::Dropout{Float64, Colon, CUDA.RNG})
    @ Transformers.Basic C:\Users\jackn\.julia\packages\Transformers\K1F88\src\basic\mh_atten.jl:204
 [22] (::Transformers.Basic.MultiheadAttention{Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dropout{Float64, Colon, CUDA.RNG}})(query::CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, key::CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, value::CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}; mask::Array{Float32, 3})
    @ Transformers.Basic C:\Users\jackn\.julia\packages\Transformers\K1F88\src\basic\mh_atten.jl:102
 [23] (::Transformer{Transformers.Basic.MultiheadAttention{Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dropout{Float64, Colon, CUDA.RNG}}, LayerNorm{typeof(identity), Flux.Scale{typeof(identity), CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Float32, 1}, Transformers.Basic.PwFFN{Dense{typeof(gelu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, LayerNorm{typeof(identity), Flux.Scale{typeof(identity), CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Float32, 1}, Dropout{Float64, Colon, CUDA.RNG}})(x::CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, mask::Array{Float32, 3})
    @ Transformers.Basic C:\Users\jackn\.julia\packages\Transformers\K1F88\src\basic\transformer.jl:69
 [24] macro expansion
    @ C:\Users\jackn\.julia\packages\Transformers\K1F88\src\stacks\stack.jl:0 [inlined]
 [25] (::Stack{Symbol("((x, m) => x':(x, m)) => 12"), NTuple{12, Transformer{Transformers.Basic.MultiheadAttention{Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dropout{Float64, Colon, CUDA.RNG}}, LayerNorm{typeof(identity), Flux.Scale{typeof(identity), CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Float32, 1}, Transformers.Basic.PwFFN{Dense{typeof(gelu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, LayerNorm{typeof(identity), Flux.Scale{typeof(identity), CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Float32, 1}, Dropout{Float64, Colon, CUDA.RNG}}}})(::CUDA.CuArray{Float32, 3, 
CUDA.Mem.DeviceBuffer}, ::Array{Float32, 3})
    @ Transformers.Stacks C:\Users\jackn\.julia\packages\Transformers\K1F88\src\stacks\stack.jl:19
 [26] (::Bert{Stack{Symbol("((x, m) => x':(x, m)) => 12"), NTuple{12, Transformer{Transformers.Basic.MultiheadAttention{Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, 
CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 
1, CUDA.Mem.DeviceBuffer}}, Dropout{Float64, Colon, CUDA.RNG}}, LayerNorm{typeof(identity), Flux.Scale{typeof(identity), CUDA.CuArray{Float32, 1, 
CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Float32, 1}, Transformers.Basic.PwFFN{Dense{typeof(gelu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, LayerNorm{typeof(identity), Flux.Scale{typeof(identity), CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Float32, 1}, Dropout{Float64, Colon, CUDA.RNG}}}}, Dropout{Float64, Colon, CUDA.RNG}})(x::CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, mask::Array{Float32, 3}; all::Bool)
    @ Transformers.BidirectionalEncoder C:\Users\jackn\.julia\packages\Transformers\K1F88\src\bert\bert.jl:55
 [27] Bert
    @ C:\Users\jackn\.julia\packages\Transformers\K1F88\src\bert\bert.jl:50 [inlined]
 [28] loss(data::NamedTuple{(:tok, :segment), Tuple{Matrix{Int64}, Matrix{Int64}}}, ind::Vector{Tuple{Int64, Int64}}, masklabel::Flux.OneHotArray{UInt32, 30522, 1, 2, Vector{UInt32}}, nextlabel::Flux.OneHotArray{UInt32, 2, 1, 2, Vector{UInt32}}, mask::Array{Float32, 3})
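
For reference, the failing frame is apply_mask(score::CuArray{Float32, 3, ...}, mask::Array{Float32, 3}): the attention scores are on the GPU while the mask is still a plain CPU Array, which is exactly what the "passing and using non-bitstype argument" kernel error complains about. A minimal, unverified sketch of the kind of change that avoids it, assuming the usual enable_gpu/todevice helpers from Transformers.jl (the variable names are taken from the loss signature above):

# Sketch only, not a tested patch: move the CPU-side mask to the GPU
# along with the rest of the batch before it reaches the attention layer.
using Transformers            # provides enable_gpu / todevice
enable_gpu(true)              # the pretrain script may already do this

mask = todevice(mask)         # was an Array{Float32, 3} on the CPU
l = loss(data, ind, masklabel, nextlabel, mask)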
chengchingwen changed the title from "BERT example not working" to "BERT pretrain example not working" on Jul 4, 2022
@chengchingwen
Owner

Out of curiosity, do you want to pre-train your own Bert model?

@jackn11
Author

jackn11 commented Jul 4, 2022

No, I simply want to fine-tune an existing Bert model for a categorization task. I would appreciate guidance on how to do that, as I have not been able to figure it out with this package.

@chengchingwen
Owner

@chengchingwen
Owner

Or do you want to do token-level classification?

@jackn11
Author

jackn11 commented Jul 4, 2022

Thank you, I will take a look at that link! My current project is classifying sentences into a few given categories. I am not quite sure what you mean by token-level classification. If you mean classifying parts of sentences as tokens, for example finding nouns, verbs, etc., then I am not trying to implement token-level classification.

@chengchingwen
Owner

> If you mean classifying parts of sentences as tokens, for example finding nouns, verbs, etc.

Yes, that's what I mean.

> My current project is classifying sentences into a few given categories.

Then you can just follow the link. CoLA is also a sentence classification dataset.
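
For a rough idea of what that involves, here is a minimal sketch of sentence-level fine-tuning with the (older) Transformers.Basic / Transformers.Pretrain API; the classifier head clf, the label set, and the single-sentence preprocessing below are illustrative placeholders, not the exact code from the cola example:

using Flux
using Transformers
using Transformers.Basic
using Transformers.Pretrain

# load the pretrained model, wordpiece table, and tokenizer
bert_model, wordpiece, tokenizer = pretrain"bert-uncased_L-12_H-768_A-12"
vocab = Vocabulary(wordpiece)

nlabels = 2                       # however many sentence categories you have
clf = Dense(768, nlabels)         # classification head on top of the [CLS] vector
ps  = Flux.params(bert_model, clf)
opt = ADAM(1e-5)

# turn one sentence into the (tok, segment) named tuple the BERT embedding expects
function preprocess(sentence)
    tokens = ["[CLS]"; wordpiece(tokenizer(sentence)); "[SEP]"]
    (tok = vocab(tokens), segment = fill(1, length(tokens)))
end

function loss(sample, label)
    t = bert_model.transformers(bert_model.embed(sample))   # 768 × seq_len features
    Flux.logitcrossentropy(clf(t[:, 1]), Flux.onehot(label, 1:nlabels))
end

sample = preprocess("an example sentence")
grads = Flux.gradient(() -> loss(sample, 1), ps)
Flux.update!(opt, ps, grads)

The actual cola example batches the data and moves everything to the GPU with todevice, but the moving parts are the same: embed, run the transformer stack, and train a small head on the [CLS] feature.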

@jackn11
Author

jackn11 commented Jul 5, 2022

I got the following error when trying to run the cola example as you recommended. It seems like it may be an issue with the example.

PS C:\Users\jackn\Documents\GitHub\GitHub2\Transformers.jl\example\BERT> julia --proj -i main.jl --gpu cola
[ Info: loading pretrain bert model: uncased_L-12_H-768_A-12.tfbson 
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.    
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.3 (2022-05-06)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release  
|__/                   |

julia> train!()
[ Info: start training: cola
[ Info: epoch: 1
ERROR: ArgumentError: 'CoLA.zip' exists. `force=true` is required to remove 'CoLA.zip' before moving.
Stacktrace:
  [1] checkfor_mv_cp_cptree(src::String, dst::String, txt::String; force::Bool)
    @ Base.Filesystem .\file.jl:325
  [2] #mv#17
    @ .\file.jl:411 [inlined]
  [3] mv
    @ .\file.jl:411 [inlined]
  [4] (::Transformers.Datasets.GLUE.var"#1#2")(fn::String)
    @ Transformers.Datasets.GLUE C:\Users\jackn\.julia\packages\Transformers\K1F88\src\datasets\glue\cola.jl:11
  [5] #16
    @ C:\Users\jackn\.julia\packages\DataDeps\jDkzU\src\resolution_automatic.jl:122 [inlined]
  [6] cd(f::DataDeps.var"#16#17"{Transformers.Datasets.GLUE.var"#1#2", String}, dir::String)
    @ Base.Filesystem .\file.jl:99
  [7] run_post_fetch(post_fetch_method::Transformers.Datasets.GLUE.var"#1#2", fetched_path::String)
    @ DataDeps C:\Users\jackn\.julia\packages\DataDeps\jDkzU\src\resolution_automatic.jl:119
  [8] download(datadep::DataDeps.DataDep{String, String, typeof(DataDeps.fetch_default), Transformers.Datasets.GLUE.var"#1#2"}, localdir::String; remotepath::String, i_accept_the_terms_of_use::Nothing, skip_checksum::Bool)
    @ DataDeps C:\Users\jackn\.julia\packages\DataDeps\jDkzU\src\resolution_automatic.jl:84
  [9] download
    @ C:\Users\jackn\.julia\packages\DataDeps\jDkzU\src\resolution_automatic.jl:70 [inlined]
 [10] handle_missing
    @ C:\Users\jackn\.julia\packages\DataDeps\jDkzU\src\resolution_automatic.jl:10 [inlined]
 [11] _resolve
    @ C:\Users\jackn\.julia\packages\DataDeps\jDkzU\src\resolution.jl:83 [inlined]
 [12] resolve(datadep::DataDeps.DataDep{String, String, typeof(DataDeps.fetch_default), Transformers.Datasets.GLUE.var"#1#2"}, inner_filepath::String, calling_filepath::String)
    @ DataDeps C:\Users\jackn\.julia\packages\DataDeps\jDkzU\src\resolution.jl:29
 [13] resolve(datadep_name::String, inner_filepath::String, calling_filepath::String)
    @ DataDeps C:\Users\jackn\.julia\packages\DataDeps\jDkzU\src\resolution.jl:54
 [14] resolve
    @ C:\Users\jackn\.julia\packages\DataDeps\jDkzU\src\resolution.jl:73 [inlined]
 [15] trainfile(#unused#::Transformers.Datasets.GLUE.CoLA)
    @ Transformers.Datasets.GLUE C:\Users\jackn\.julia\packages\Transformers\K1F88\src\datasets\glue\cola.jl:23
 [16] #datafile#2
    @ C:\Users\jackn\.julia\packages\Transformers\K1F88\src\datasets\dataset.jl:14 [inlined]
 [17] datafile
    @ C:\Users\jackn\.julia\packages\Transformers\K1F88\src\datasets\dataset.jl:14 [inlined]
 [18] dataset(::Type{Train}, ::Transformers.Datasets.GLUE.CoLA; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Transformers.Datasets C:\Users\jackn\.julia\packages\Transformers\K1F88\src\datasets\dataset.jl:9
 [19] dataset
    @ C:\Users\jackn\.julia\packages\Transformers\K1F88\src\datasets\dataset.jl:9 [inlined]
 [20] train!()
    @ Main C:\Users\jackn\Documents\GitHub\GitHub2\Transformers.jl\example\BERT\cola\train.jl:75
 [21] top-level scope
    @ REPL[1]:1

@chengchingwen
Owner

@jackn11 Holy! That is a bug in the dataset download. I was using the local files all the time, so I didn't notice it.

For a quick workaround:

  1. Download the dataset from https://dl.fbaipublicfiles.com/glue/data/CoLA.zip
  2. Unzip CoLA.zip
  3. Move the unzipped directory to ~/.julia/datadeps/
  4. Rename the directory to GLUE-CoLA (so the full path is ~/.julia/datadeps/GLUE-CoLA/)
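
The same steps as a rough Julia sketch (it assumes an unzip binary on PATH and the default datadeps location; on Windows you can just unzip and move the folder by hand):

using Downloads

tmp = mktempdir()
zippath = Downloads.download("https://dl.fbaipublicfiles.com/glue/data/CoLA.zip",
                             joinpath(tmp, "CoLA.zip"))
run(`unzip $zippath -d $tmp`)                               # needs `unzip` on PATH
dest = joinpath(homedir(), ".julia", "datadeps", "GLUE-CoLA")
mkpath(dirname(dest))
mv(joinpath(tmp, "CoLA"), dest)                             # rename CoLA -> GLUE-CoLA

If a stale CoLA.zip from the failed run is still sitting under ~/.julia/datadeps/, deleting it should also get rid of the `force=true` error on the next attempt.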

@jackn11
Author

jackn11 commented Jul 5, 2022

Thank you for letting me know! Will do.

Also, comments in the example code would be a huge plus to make it easier to understand.

@Broever101

Did you ever resolve this issue? I'm having the same one, using the training example Peter linked.

@chengchingwen
Owner

The main issue with the pretrain example is that the code is really, really old and I haven't found time to update it. But I haven't made it a high priority because people seldom need/want to do the pretraining themselves.

@jackn11
Author

jackn11 commented Aug 5, 2022

If I recall correctly, I was able to get the CoLA example working, but it took going through the code line by line, one piece at a time, to figure out where the errors were coming from. I may have some time in the coming days to update the example, @chengchingwen. Would that be helpful?

@Broever101

> The main issue with the pretrain example is that the code is really, really old and I haven't found time to update it. But I haven't made it a high priority because people seldom need/want to do the pretraining themselves.

Yeah, pretrain is old, but I'm having the issue with the fine-tuning example:
https://github.com/chengchingwen/Transformers.jl/tree/master/example/BERT

Fine-tuning is a bit more common than pre-training. The dataset API is pretty gucci, but it needs some comments. Maybe I'll contribute. I got the 1.8M tweet dataset working with your training script in a few lines. But again, getting the thing running on the GPU is a bit of a pain.

@chengchingwen
Owner

Feel free to open issues/PRs! Sometimes it's hard to make stuff clear from my side because I wrote that code and it all looks straightforward to me. Let me know if you need help or explanations.

@jackn11
Author

jackn11 commented Aug 7, 2022

That sounds great! I should have a PR up soon for the CoLA example!
