
ericperfect/libtorch_tokenizer


Huggingface Transformers Tokenizer in C++

A tokenizer is in charge of preparing the inputs for a model.

The tokenizer can tokenize mixed Chinese-English (bilingual) text on Linux.

This project mainly addresses problems with Chinese character encoding.

Requirements

  • Boost
  • C++ Unicode support

About

BERT Tokenizer in C++
