
Aligned Catalan-German and Catalan-English Europarl corpus. Catalan sentences translated from Spanish using Apertium RBMT.


Softcatala/Europarl-catalan


Europarl-Catalan

Aligned Catalan-German and Catalan-English Europarl corpus (v7). The Catalan sentences were machine-translated from the Spanish side using Apertium, a rule-based machine translation (RBMT) system.

The original Spanish Europarl v7 corpus was corrected to fix spelling mistakes and other errors, which improves the quality of the Catalan translation. The file europarl.es-en.es.xz contains this corrected Spanish corpus, which is the one we used to produce the Catalan corpus.

The Catalan-German alignment was derived from the de-en and ca-en corpora using this alignment finder.
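Deriving ca-de from de-en and ca-en amounts to pivoting through English. The sketch below is a minimal illustration under one assumption: a German and a Catalan sentence are paired when their English counterparts match exactly after trimming whitespace. The alignment finder actually used may treat duplicates and near matches differently, and the function name `pivot_align` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def pivot_align(
    de_en: Iterable[Tuple[str, str]],
    ca_en: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Build (ca, de) pairs from (de, en) and (ca, en) pairs sharing an English sentence.

    Hypothetical helper: assumes exact matching on the English side.
    """
    # Index German sentences by their English counterpart.
    # If the same English sentence occurs more than once, keep the first German match.
    en_to_de: Dict[str, str] = {}
    for de, en in de_en:
        en_to_de.setdefault(en.strip(), de)

    # Keep only Catalan segments whose English side also occurs in de-en.
    pairs: List[Tuple[str, str]] = []
    for ca, en in ca_en:
        de = en_to_de.get(en.strip())
        if de is not None:
            pairs.append((ca, de))
    return pairs
```

With this approach, only English sentences that occur in both pairs produce a Catalan-German segment.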

  • Catalan-English: 1 965 735 segments.
  • Catalan-German: 1 734 644 segments.

Note: files with the .xz extension must be decompressed with xz (for example, xz -d europarl.es-en.es.xz).
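The .xz files can also be read directly from Python with the standard-library lzma module, without decompressing them to disk first. The sketch below assumes each file holds one UTF-8 segment per line, and that line N of one language file is aligned with line N of the other. The helper names `read_parallel` and `count_segments` are hypothetical, and so are any file paths other than europarl.es-en.es.xz.

```python
import lzma
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield aligned (source, target) segments from two line-aligned .xz files."""
    with lzma.open(src_path, "rt", encoding="utf-8") as src, \
         lzma.open(tgt_path, "rt", encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, so a length mismatch is detected.
        for s, t in zip_longest(src, tgt):
            if s is None or t is None:
                raise ValueError("files have a different number of lines")
            yield s.rstrip("\n"), t.rstrip("\n")


def count_segments(path: str) -> int:
    """Count segments in an .xz file, assuming one segment per line."""
    with lzma.open(path, "rt", encoding="utf-8") as f:
        return sum(1 for _ in f)
```

For example, count_segments("europarl.es-en.es.xz") counts the lines of the corrected Spanish corpus. Because read_parallel streams both files, it avoids loading corpora with millions of segments into memory.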

License

CC BY 4.0
