Abstract
We propose for Kadiwéu, a polysynthetic language of Brazil, an extension of the POS annotation scheme of the Tycho Brahe Annotated Corpus of Historical Portuguese (www.tycho.iel.unicamp.br/~tycho/corpus), henceforth TBC. The extension consists in tagging both words and morphemes, yielding a two-level annotation. Word-level tagging is necessary to generate the syntactic parsing that is missing from the current corpora of Brazilian native languages. Morphological tagging is also crucial for polysynthetic languages, since it allows searching for grammatical properties encoded by the morphemes. This is a pioneering proposal: it is the first time an American Indian language will be part of a corpus allowing grammatical searches that include both morphological and syntactic information.
References
Aikhenvald, A. (2000). Classifiers: A Typology of Noun Categorization Devices. Oxford: Oxford University Press.
Britto, H., Finger, M., Galves, C. (2002). Computational and linguistic aspects of the construction of the Tycho Brahe Parsed Corpus of Historical Portuguese. Romance Corpus Linguistics - Corpora and Spoken Language. Tübingen: Narr.
Finger, M. (2000). Técnicas de Otimização da Precisão Empregadas no Etiquetador Tycho Brahe. Anais do V Encontro para o Processamento Computacional da Língua Portuguesa Escrita e Falada (PROPOR2000).
Sandalo, F. (1995). A Grammar of Kadiwéu. Unpublished PhD dissertation, University of Pittsburgh.
Sandalo, F. (2015)
The journal CADERNOS DE ESTUDOS LINGUÍSTICOS is granted all the copyright related to the published works. The originals will not be returned. By virtue of being part of this public access journal, the articles are free to use, with their own attributions, in educational and non-commercial applications.