Annotating a polysynthetic language: From Portuguese to Kadiwéu

Keywords

Kadiwéu. Polysynthesis. Tycho Brahe Corpus. Morphological annotation.

How to Cite

GALVES, C.; SANDALO, F.; SENA, T. A. de; VERONESI, L. Annotating a polysynthetic language: From Portuguese to Kadiwéu. Cadernos de Estudos Linguísticos, Campinas, SP, v. 59, n. 3, p. 631–648, 2017. DOI: 10.20396/cel.v59i3.8651003. Available at: https://periodicos.sbu.unicamp.br/ojs/index.php/cel/article/view/8651003. Accessed: 21 Feb. 2024.

Abstract

We propose, for Kadiwéu, a polysynthetic language of Brazil, an extension of the POS annotation of the Tycho Brahe Annotated Corpus of Historical Portuguese (www.tycho.iel.unicamp.br/~tycho/corpus), henceforth TBC. The extension consists of tagging both words and morphemes, yielding a two-level annotation. Word tagging is necessary to generate the syntactic parsing that is missing from current corpora of Brazilian native languages. Morphological tagging is equally crucial for polysynthetic languages, since it allows searching for grammatical properties encoded by morphemes. This is a pioneering proposal: it is the first time an American Indian language will be part of a corpus that allows grammatical searches combining morphological and syntactic information.
https://doi.org/10.20396/cel.v59i3.8651003
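
To make the idea of a two-level annotation concrete, the following is a minimal sketch of how a word-level tag and a list of morpheme-level tags might be stored together and searched. It is an illustration only and not the TBC format or the authors' implementation: the class names, the function `find_words_with_morpheme_tag`, the tag labels (`VB`, `N`, `SUBJ`, `ROOT`) and the forms (`w1`, `m1`, and so on) are all hypothetical placeholders, and no real Kadiwéu data is shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Morpheme:
    form: str  # placeholder surface form, not real Kadiwéu data
    tag: str   # morpheme-level tag (hypothetical label)


@dataclass
class Word:
    form: str  # placeholder surface form of the whole word
    tag: str   # word-level POS tag (hypothetical label)
    morphemes: List[Morpheme] = field(default_factory=list)


def find_words_with_morpheme_tag(sentence: List[Word], tag: str) -> List[Word]:
    """Return the words that contain at least one morpheme with the given tag."""
    return [w for w in sentence if any(m.tag == tag for m in w.morphemes)]


# A toy "sentence" of two words, each annotated at both levels.
sentence = [
    Word("w1", "VB", [Morpheme("m1", "SUBJ"), Morpheme("m2", "ROOT")]),
    Word("w2", "N", [Morpheme("m3", "ROOT")]),
]

# Search on the morpheme level, then read off the word-level tags.
matches = find_words_with_morpheme_tag(sentence, "SUBJ")
match_forms = [w.form for w in matches]
match_word_tags = [w.tag for w in matches]
```

Keeping the word-level tag separate from the morpheme list mirrors the motivation given in the abstract: the word tags can feed a syntactic parser, while the morpheme tags remain available for searches on grammatical properties.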


The journal Cadernos de Estudos Linguísticos uses a Creative Commons (CC) license, thereby preserving the integrity of the articles in an open-access environment.
