Pretraining data

zhezhaoa edited this page Oct 14, 2022 · 7 revisions

CLUECorpusSmall

CLUECorpusSmall consists of four corpora: news, web, wiki, and comments. The original data and a detailed description can be found here.

Corpus Link
CLUECorpusSmall https://share.weiyun.com/sC6PMhxx
CLUECorpusSmall (BERT format) https://share.weiyun.com/9SPPGUOK
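
The page does not spell out what "BERT format" means. A minimal sketch for reading such a file follows, assuming the common convention of one sentence per line with blank lines separating documents. That layout, the function name `split_documents`, and the sample lines are illustrative assumptions, not something this page confirms.

```python
from typing import Iterable, List


def split_documents(lines: Iterable[str]) -> List[List[str]]:
    """Group lines into documents, treating blank lines as boundaries.

    Assumption: one sentence per line, documents separated by blank lines.
    """
    documents: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        sentence = line.strip()
        if sentence:
            current.append(sentence)
        elif current:
            # A blank line closes the document collected so far.
            documents.append(current)
            current = []
    if current:
        # Flush the last document if the input does not end with a blank line.
        documents.append(current)
    return documents


# Hypothetical sample standing in for lines read from the downloaded corpus.
sample = [
    "doc one, sentence one",
    "doc one, sentence two",
    "",
    "doc two, sentence one",
]
docs = split_documents(sample)
print(len(docs), "documents")
```

To run this on the real corpus, pass an open file handle (for example `open(path, encoding="utf-8")`, where `path` is wherever the download was saved) in place of `sample`.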

News Commentary v13 (ZH-EN)

News Commentary v13 consists of Chinese-English parallel data and can be downloaded here.

Corpus Link
news-Commentary-v13-en-zh https://share.weiyun.com/PLMxw6ae
news-Commentary-v13-zh-en https://share.weiyun.com/5rMwRhDi