
dellison/WikiText.jl

WikiText.jl

About

WikiText.jl provides an interface to the WikiText Long Term Dependency Language Modeling dataset.

Usage

WikiText.jl exports the following four types, one for each of the four available datasets:

  • WikiText2
  • WikiText103
  • WikiText2Raw
  • WikiText103Raw

WikiText.jl also exports the following three functions:

  • trainfile
  • validationfile
  • testfile

The datasets are downloaded and unzipped automatically (with your approval) the first time you access them, courtesy of DataDeps.jl.

julia> ]add WikiText
julia> using WikiText
julia> corpus = WikiText2()
julia> trainfile(corpus)
"/path/to/wiki.train.tokens"
julia> validationfile(corpus)
"/path/to/wiki.valid.tokens"
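Because `trainfile`, `validationfile` and `testfile` return plain file paths, the corpus can be read with ordinary Base Julia I/O. The sketch below counts tokens and builds a frequency table. It assumes the `.tokens` files hold whitespace-separated tokens, one paragraph per line. `token_stats` is a hypothetical helper, not part of WikiText.jl, and the demo runs on a small made-up temporary file instead of the real corpus.

```julia
# Hypothetical helper (not exported by WikiText.jl): count tokens in a
# whitespace-tokenized text file and tally how often each token occurs.
function token_stats(path::AbstractString)
    counts = Dict{String,Int}()
    ntokens = 0
    for line in eachline(path)          # stream the file line by line
        for tok in split(line)          # split on any run of whitespace
            key = String(tok)
            counts[key] = get(counts, key, 0) + 1
            ntokens += 1
        end
    end
    return ntokens, counts
end

# Demo on a tiny made-up file standing in for e.g. wiki.train.tokens.
path, io = mktemp()
write(io, " = Example = \n\n the cat sat on the <unk> . \n")
close(io)

ntokens, counts = token_stats(path)
println("tokens: ", ntokens, ", vocabulary size: ", length(counts))
```

With the package installed, the same helper could be pointed at a real split, e.g. `token_stats(trainfile(WikiText2()))`. Streaming with `eachline` keeps memory use bounded by the vocabulary rather than the file size, which matters for the larger WikiText103 files.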