Development of a Guarani - Spanish Parallel Corpus

Abstract: This paper presents the development of a Guarani - Spanish parallel corpus with sentence-level alignment. The Guarani sentences of the corpus are written in Jopara Guarani, the dialect of Guarani spoken in Paraguay, which is based on Guarani grammar and may include several Spanish loanwords or neologisms. The corpus has around 14,500 sentence pairs aligned using a semi-automatic process, containing 228,000 Guarani tokens and 336,000 Spanish tokens extracted from web sources.

Keywords:

Guarani; Spanish; parallel corpus

Authors:

Luis Chiruzzo, Pedro Amarilla, Adolfo Rios, Gustavo Gimenez Lugo

Author affiliations:

Universidad de la Republica, Montevideo, Uruguay

Universidad Nacional de Asuncion, San Lorenzo, Paraguay

Universidade Tecnologica Federal do Parana, Curitiba, PR - Brasil

Conference name:

International Conference on Language Resources and Evaluation

Conference location:

Marseille (FR)

Conference proceedings:

Twelfth International Conference on Language Resources and Evaluation

Pages:

2629-2633

Publication year:

2020