This parser processes the large data exports published by the Tatoeba project and generates several JSON files from them.
Below are the generated file names, each with a sample record:
'data.js' (Russian-English sentence pairs)
{
  "id": 3,
  "textEng": "You can't achieve the impossible without attempting the absurd.",
  "textRus": "Нельзя достичь невозможного, не делая безумных попыток."
}
'data-eng.json' (English sentences)
{
  "id": "1288",
  "text": "I just don't know what to say.",
  "sentenceId": "8003976",
  "hasAudio": true
}
'data-rus.json' (Russian sentences)
{
  "id": "243",
  "text": "Один раз в жизни я делаю хорошее дело... И оно бесполезно.",
  "sentenceId": "5507120",
  "hasAudio": true
}
'data-eng-with-audio.json' (English sentences that have audio)
{
  "id": "1277",
  "text": "I have to go to sleep.",
  "sentenceId": "7960374",
  "hasAudio": true
}
'data-rus-with-audio.json' (Russian sentences that have audio)
{
  "id": "5430",
  "text": "Нелегко решать, что правильно, а что нет, но приходится это делать.",
  "sentenceId": "1596576",
  "hasAudio": true
}
The parser also writes minified versions of these files.
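To illustrate how Russian-English pairs like those in 'data.js' can be derived from the Tatoeba exports, here is a minimal, hypothetical sketch. It is not the parser's actual code. It assumes the rows of sentences.csv have already been split into [id, lang, text] and the rows of links.csv into [sentenceId, translationId], following the layout Tatoeba documents for its exports. The function name buildRusEngPairs and the sequential numbering of "id" are assumptions made for illustration.

```javascript
// Hypothetical sketch: join Tatoeba sentences and links into
// Russian-English pairs shaped like the records in 'data.js'.
// Assumption: `sentences` rows are [id, lang, text] (sentences.csv layout)
// and `links` rows are [sentenceId, translationId] (links.csv layout).
function buildRusEngPairs(sentences, links) {
  // Index sentences by id, keeping only English ("eng") and Russian ("rus").
  const eng = new Map();
  const rus = new Map();
  for (const [id, lang, text] of sentences) {
    if (lang === "eng") eng.set(id, text);
    else if (lang === "rus") rus.set(id, text);
  }

  const pairs = [];
  for (const [sentenceId, translationId] of links) {
    // Follow only eng -> rus links, so a link listed in both
    // directions still produces a single pair.
    if (eng.has(sentenceId) && rus.has(translationId)) {
      pairs.push({
        id: pairs.length + 1, // assumption: sequential numeric ids
        textEng: eng.get(sentenceId),
        textRus: rus.get(translationId),
      });
    }
  }
  return pairs;
}
```

Keeping the sentences in two Map objects makes each link lookup constant-time, which matters when links.csv has millions of rows. Note that the samples use a numeric "id" in 'data.js' but string ids in the per-language files.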
To generate these files:
- download the data from the following links:
  - sentences
  - links
  - sentences with audio
- unzip the downloaded files to get the following files:
  links.csv, sentences.csv, sentences_with_audio.csv
- create a new directory 'data-input' in the project's root folder;
- put links.csv, sentences.csv and sentences_with_audio.csv into the 'data-input' directory;
- run 'npm start';
- the result will be placed in the 'data-output' directory.
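Because the Tatoeba exports are large, reading them line by line is safer than loading each file into memory at once. The following is a hypothetical sketch, not the project's code, that streams a tab-separated export from 'data-input' using Node's built-in readline module. The column layouts in the comments reflect Tatoeba's documented export format as an assumption, and the helper names splitRow, forEachRow and readAudioIds are made up for illustration.

```javascript
// Hypothetical sketch: stream a tab-separated Tatoeba export line by line.
// Assumed column layouts:
//   sentences.csv             id \t lang \t text
//   links.csv                 sentenceId \t translationId
//   sentences_with_audio.csv  sentenceId \t ... (only the first column is used)
const fs = require("node:fs");
const readline = require("node:readline");

// Split one exported line into its tab-separated columns.
function splitRow(line) {
  return line.split("\t");
}

// Call onRow(columns) for every non-empty line of the file.
async function forEachRow(filePath, onRow) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: "utf8" }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  for await (const line of rl) {
    if (line.length > 0) onRow(splitRow(line));
  }
}

// Collect the ids of sentences that have audio (first column only).
async function readAudioIds(filePath) {
  const ids = new Set();
  await forEachRow(filePath, (columns) => ids.add(columns[0]));
  return ids;
}
```

For example, readAudioIds("data-input/sentences_with_audio.csv") resolves to a Set of sentence ids, which could then be used to set a flag such as "hasAudio" on each record.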