speechlab-ntu/SEAME-dev-set

 
 

SEAME-dev-set

For performance evaluation, we extract two subsets of the SEAME data as test sets: dev_man, which is dominated by Mandarin speech, and dev_sge, which is dominated by Singapore English. Each test set contains 10 speakers, balanced by gender.

| Set     | Speakers | Hours  |
|---------|----------|--------|
| train   | 134      | 101.13 |
| dev_man | 10       | 7.49   |
| dev_sge | 10       | 3.93   |
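
As a quick sanity check on the figures above, the following sketch computes the average hours per speaker and each set's share of the total audio. The numbers are copied from the table; the names `SPLITS` and `summarize` are illustrative only and are not part of this repository.

```python
# Figures copied from the table above: set name -> (speakers, hours).
# SPLITS and summarize are hypothetical names used only for this sketch.
SPLITS = {
    "train": (134, 101.13),
    "dev_man": (10, 7.49),
    "dev_sge": (10, 3.93),
}


def summarize(splits):
    """Return hours per speaker and share of total hours for each set."""
    total_hours = sum(hours for _, hours in splits.values())
    summary = {}
    for name, (speakers, hours) in splits.items():
        summary[name] = {
            "hours_per_speaker": hours / speakers,
            "share_of_hours": hours / total_hours,
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize(SPLITS).items():
        print(
            f"{name}: {stats['hours_per_speaker']:.2f} h/speaker, "
            f"{stats['share_of_hours']:.1%} of all hours"
        )
```

By this calculation, dev_sge holds roughly half as much audio per speaker as dev_man, which may be worth keeping in mind when comparing error rates across the two sets.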

We share only the list of training wav files; the corresponding audio can be found in LDC2015S04. Please contact me if you have any questions ([email protected]).

References

[1] Dau-Cheng Lyu, Tien Ping Tan, Eng Siong Chng, and Haizhou Li, “SEAME: a Mandarin-English Code-switching Speech Corpus in South-East Asia,” in INTERSPEECH, 2010, vol. 10, pp. 1986–1989.

[2] Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Eng Siong Chng, and Haizhou Li, “On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition,” arXiv preprint arXiv:1811.00241, 2018.
