
Releases: microsoft/BlingFire

Bling Fire v0.1.8

24 Sep 19:39
d9d5cea

Bling Fire v0.1.7

25 May 06:07
4050cd1
  1. Added the no_dummy_prefix configuration option and an API to change the configuration of an existing model.
  2. Fixed the offset of the dummy prefix: it is now always -1. If the first token has a start/end offset of -1, the dummy prefix is included in that token.
  3. Changed the compilation options for the Windows code.
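The dummy-prefix behaviour in items 1 and 2 can be illustrated with a small sketch. It assumes SentencePiece-style tokenization, where a space marker (U+2581, "▁") is prepended to the input, and it models offsets as inclusive character positions in the original text. The function `pieces_with_offsets` and its `add_dummy_prefix` flag are hypothetical (the flag plays the role of the no_dummy_prefix setting, inverted); this is not the Bling Fire API, and it splits only on spaces instead of applying a real model.

```python
# Hypothetical sketch of how a dummy prefix shows up in token offsets.
# Not the Bling Fire API: names and offset conventions are illustrative assumptions.

SPACE = "\u2581"  # "▁", the conventional SentencePiece marker for a space / dummy prefix


def pieces_with_offsets(text, add_dummy_prefix=True):
    """Split `text` on spaces into SentencePiece-like pieces.

    Returns (piece, start, end) tuples, where start/end are inclusive
    character offsets into the ORIGINAL text. The dummy prefix does not
    exist in the original text, so its position is -1.
    """
    # Build the normalized character sequence and remember, for each
    # character, the original position it came from (-1 for the dummy prefix).
    chars, positions = [], []
    if add_dummy_prefix:
        chars.append(SPACE)
        positions.append(-1)
    for i, ch in enumerate(text):
        chars.append(SPACE if ch == " " else ch)
        positions.append(i)

    # A new piece starts at every space marker, so each space is attached
    # as a prefix to the piece that follows it.
    result, start = [], 0
    for k in range(1, len(chars) + 1):
        if k == len(chars) or chars[k] == SPACE:
            piece = "".join(chars[start:k])
            result.append((piece, positions[start], positions[k - 1]))
            start = k
    return result
```

With the prefix enabled, `pieces_with_offsets("Hello world")` yields `("▁Hello", -1, 4)` and `("▁world", 5, 10)`; input that starts with a space yields a lone `"▁"` piece whose start and end offsets are both -1.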

Bling Fire v0.1.5

12 Mar 21:14
3d90b41
  • Added byte BPE algorithm support
  • Added GPT2, Roberta tokenization models
  • Added hyphenation / syllabification APIs and a sample model: syllab
  • Added URL tokenization models: uri100k, uri250k, uri500k
  • Small changes to the C# interface (should be backwards compatible): it now uses Span instead of byte[], which allows input and output buffers to be allocated on the stack
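Byte-level BPE runs the usual BPE merge loop over UTF-8 bytes instead of characters, so any input can be encoded without an unknown token. The sketch below assumes a toy, hypothetical merge table ordered by priority; it is not Bling Fire's implementation. GPT-2's own tokenizer additionally pre-splits text with a regular expression and maps bytes to printable characters, and both steps are omitted here.

```python
def byte_bpe_encode(text, merges):
    """Encode `text` with byte-level BPE.

    `merges` is a hypothetical list of (bytes, bytes) pairs in priority
    order: earlier pairs are merged first.
    """
    ranks = {pair: i for i, pair in enumerate(merges)}
    # Start from single UTF-8 bytes: every byte is a valid symbol,
    # so no input is ever out of vocabulary.
    symbols = [bytes([b]) for b in text.encode("utf-8")]

    while len(symbols) > 1:
        # Find the adjacent pair with the best (lowest) merge rank.
        best = None
        for i in range(len(symbols) - 1):
            r = ranks.get((symbols[i], symbols[i + 1]))
            if r is not None and (best is None or r < best[0]):
                best = (r, i)
        if best is None:
            break  # no applicable merges left
        pair = (symbols[best[1]], symbols[best[1] + 1])

        # Merge every occurrence of that pair, scanning left to right.
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols
```

With merges `[(b"l", b"o"), (b"lo", b"w"), (b"e", b"r")]`, `"lower"` encodes to `[b"low", b"er"]`, while `"é"` falls back to its two UTF-8 bytes `[b"\xc3", b"\xa9"]`.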

blingfire PyPI package v0.1.3

25 Jun 17:27
ccd642c

Four tokenization algorithms are supported: patterns, word-piece, unigram LM, and BPE. Added a space normalization API, a few more popular models, and unigram LM tokenization models trained on ~84 languages uniformly represented in the WikiMatrix set. Bug fixes and parity fixes.
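Of the four algorithms, word-piece is the simplest to sketch: the standard greedy longest-match-first procedure used with BERT vocabularies. The function name, the `##` continuation prefix, the `[UNK]` token, and the toy vocabulary are assumptions for illustration; Bling Fire's own implementation and model files may behave differently.

```python
def wordpiece(word, vocab, unk="[UNK]", prefix="##", max_chars=100):
    """Greedy longest-match-first word-piece segmentation of one word."""
    if len(word) > max_chars:
        return [unk]
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # non-initial pieces carry "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # BERT-style: the whole word becomes unknown
        tokens.append(piece)
        start = end
    return tokens
```

For example, with vocabulary `{"un", "##aff", "##able"}`, `wordpiece("unaffable", vocab)` returns `["un", "##aff", "##able"]`.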