Development of a Guarani - Spanish Parallel Corpus
This paper presents the development of a Guarani - Spanish parallel corpus with sentence-level alignment. The Guarani sentences of the corpus use the Jopara Guarani dialect, the dialect of Guarani spoken in Paraguay, which is based on Guarani grammar and may include several Spanish loanwords or neologisms. The corpus has around 14,500 sentence pairs aligned using a semi-automatic process, containing 228,000 Guarani tokens and 336,000 Spanish tokens extracted from web sources.
Guarani, Spanish, parallel corpus
Luis Chiruzzo, Pedro Amarilla, Adolfo Rios, Gustavo Gimenez Lugo
Universidad de la Republica, Montevideo, Uruguay
Universidad Nacional de Asuncion, San Lorenzo, Paraguay
Universidade Tecnologica Federal do Parana, Curitiba, PR - Brasil
International Conference on Language Resources and Evaluation
Marseille (FR)
Twelfth International Conference on Language Resources and Evaluation