Description

This repository collects open source parallel corpora that align Catalan with several other languages.

We use these corpora to train the Softcatalà neural translation system:

Note: files with the .xz extension need to be decompressed with xz.
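If the xz command-line tool is not available, the same decompression can be done with Python's standard `lzma` module. The following is a minimal sketch, not part of this repository: the function name `decompress_xz` and the file name `corpus.txt.xz` are hypothetical and only illustrate the idea. It keeps the compressed file in place, similar to running `xz -dk`.

```python
import lzma
import shutil
from pathlib import Path


def decompress_xz(src: Path) -> Path:
    """Decompress an .xz file next to itself and return the output path.

    Hypothetical helper for illustration; the original .xz file is kept.
    """
    if src.suffix != ".xz":
        raise ValueError(f"expected a .xz file, got {src.name}")
    # Drop only the final suffix, e.g. corpus.txt.xz -> corpus.txt
    dst = src.with_suffix("")
    # Stream the data in chunks so large corpora need not fit in memory.
    with lzma.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, `decompress_xz(Path("corpus.txt.xz"))` would write `corpus.txt` in the same directory, assuming a file with that name exists.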

Sources of the corpora used

We strongly recommend the following sources of aligned Catalan parallel corpora:

In addition to these previously available corpora, we have created the following corpora:

Do you want to help?

See here (In Catalan)

Contact

Contact Jordi Mas [email protected]

Metadescription

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

| property | value |
| --- | --- |
| name | Open source aligned text corpus: English, German, Spanish, etc. to/from Catalan. |
| description | Open source aligned text corpora for building NLP applications (e.g. machine translation). Existing corpora have been cleaned up and new corpora have been introduced: Europarl Catalan, Tilde Catalan and open source translation memories. |
| sameAs | https://github.com/Softcatala/parallel-catalan-corpus/ |
| url | |
| creator | Softcatalà |