---
pretty_name: Viquipèdia
---
- Link: https://www.kaggle.com/datasets/jarfo1/viquipdia
- Main author: José Andrés Rodriguez Fonollosa
The dataset is a collection of scraped Catalan Wikipedia (Viquipèdia) pages. All of the text is in Catalan, so models trained on it will handle Catalan input exclusively.
- Task: Text generation
- Language: Catalan
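Since the task is next-token text generation, each configuration's token stream can be turned into (context, target) training pairs by sliding a window over the sequence. A minimal sketch (the function name and window size are illustrative, not part of the dataset):

```python
def lm_pairs(tokens, context_len):
    """Yield (context, next_token) pairs for next-token prediction,
    the standard framing of a text-generation task over a token stream."""
    for i in range(context_len, len(tokens)):
        yield tokens[i - context_len:i], tokens[i]

# Toy example on a short Catalan token stream; real training would
# stream over the full ca.wiki.train.tokens file instead.
pairs = list(lm_pairs(["la", "capital", "de", "Catalunya"], 2))
```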
{
  'ca-2': [
    'ca.wiki.test.tokens',
    'ca.wiki.train.tokens',
    'ca.wiki.valid.tokens'],
  'ca-100': [
    'ca.wiki.test.tokens',
    'ca.wiki.train.tokens',
    'ca.wiki.valid.tokens'],
  'ca-all': [
    'ca.wiki.test.tokens',
    'ca.wiki.train.tokens',
    'ca.wiki.valid.tokens']
}
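The mapping above pairs each configuration with the same three whitespace-tokenized files. Assuming the Kaggle archive has been extracted so that each configuration sits in its own directory (a hypothetical layout, not specified by the dataset), one split can be loaded as a flat list of tokens:

```python
from pathlib import Path

def load_split(config_dir, split):
    """Read one whitespace-separated .tokens file ('train', 'valid',
    or 'test') from a configuration directory into a token list."""
    path = Path(config_dir) / f"ca.wiki.{split}.tokens"
    return path.read_text(encoding="utf-8").split()
```

For example, `load_split("ca-2", "train")` would return the training tokens of the smallest configuration, given that directory layout.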
- Format: Plain text
| | train | validation | test |
|---|---|---|---|
| ca-2 | 10.64MB | 1.07MB | 1.06MB |
| ca-100 | 528.96MB | 1.07MB | 1.06MB |
| ca-all | 1.32GB | 1.07MB | 1.06MB |
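The sizes above can be checked against a local copy of the files. A small helper that formats byte counts in the table's MB/GB style (assuming binary units, which the card does not specify):

```python
import os

def human_size(n_bytes):
    """Format a byte count as MB or GB, matching the table's style.
    Assumes binary units (1 MB = 2**20 bytes); the card does not say which."""
    if n_bytes >= 1 << 30:
        return f"{n_bytes / (1 << 30):.2f}GB"
    return f"{n_bytes / (1 << 20):.2f}MB"

def split_size(config_dir, split):
    """Size on disk of one split file, formatted like the table."""
    path = os.path.join(config_dir, f"ca.wiki.{split}.tokens")
    return human_size(os.path.getsize(path))
```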