Linguistic compositions highly volatile in Portuguese

Jesús Enrique García; Ramin Gholizadeh; Verónica Andrea González-López

doi:10.20396/cel.v59i3.8651002

v. 59 n. 3 (2017), Articles

Published 2017-12-04

Jesús Enrique García

Universidade Estadual de Campinas

Ramin Gholizadeh

Universidade Estadual de Campinas

Verónica Andrea González-López

Universidade Estadual de Campinas

Keywords

Bayesian information criterion. Partition Markov models. Proximity between N-grams.

How to Cite

GARCÍA, Jesús Enrique; GHOLIZADEH, Ramin; GONZÁLEZ-LÓPEZ, Verónica Andrea. Linguistic compositions highly volatile in Portuguese. Cadernos de Estudos Linguísticos, Campinas, SP, v. 59, n. 3, p. 617–630, 2017. DOI: 10.20396/cel.v59i3.8651002. Available at: https://periodicos.sbu.unicamp.br/ojs/index.php/cel/article/view/8651002. Accessed: 30 Jun. 2024.

Abstract

In this paper we use a distance d between sequences of N-grams to identify the N-grams that behave differently across two sequences. With this tool, we inspect written texts of European Portuguese dated from the 16th to the 19th century. We identify the most volatile N-grams throughout the period, as well as the N-grams that should be considered when studying the linguistic changes from Classical Portuguese to Modern Portuguese. We find that 2-grams composed of an unstressed monosyllable followed by a paroxytone word (and vice versa) change markedly from one text to the next during the whole period. Stressed monosyllabic words (SMW) reveal discrepancies between written texts of the 16th century and texts from the beginning of the 17th century. Among the 2-grams showing these discrepancies are (i) an SMW followed by a paroxytone or oxytone word and (ii) a paroxytone disyllabic word or an oxytone word followed by an SMW.
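
The abstract does not give the definition of d, so the following is only a minimal sketch of how a comparison of this kind could be set up, not the authors' method. It assumes each text has already been coded as a sequence of symbols (the one-letter stress codes below are hypothetical), and it swaps in a generic BIC-style quantity: the log-likelihood gained by modelling the transitions out of a given context separately in each text instead of jointly, scaled by the logarithm of the total sample size. The function names and the normalization are assumptions made for illustration.

```python
import math
from collections import Counter


def transition_counts(seq, order=1):
    """Count (context, next_symbol) pairs for contexts of length `order`."""
    counts = Counter()
    for i in range(order, len(seq)):
        counts[(tuple(seq[i - order:i]), seq[i])] += 1
    return counts


def _loglik(counts, context):
    """Maximized log-likelihood of the transitions observed out of `context`."""
    row = [n for (s, _), n in counts.items() if s == context]
    total = sum(row)
    return sum(n * math.log(n / total) for n in row)


def context_dissimilarity(seq1, seq2, context, order=1):
    """Illustrative dissimilarity (an assumption, not the paper's d).

    Compares the transition behaviour out of `context` in two sequences:
    separate fits minus pooled fit, divided by log of the total length.
    """
    c1 = transition_counts(seq1, order)
    c2 = transition_counts(seq2, order)
    pooled = c1 + c2
    gain = _loglik(c1, context) + _loglik(c2, context) - _loglik(pooled, context)
    return gain / math.log(len(seq1) + len(seq2))


# Hypothetical coding: "u" = unstressed monosyllable, "p" = paroxytone word.
text_a = list("upupupupup")
text_b = list("uuppuuppuu")
score = context_dissimilarity(text_a, text_b, context=("u",))
```

A value near zero suggests that the context is followed by the same symbols with similar frequencies in both texts, while larger values flag contexts whose continuations differ. Ranking all observed contexts by such a score is one way to look for the most volatile N-grams between two texts.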

Jesus E. García: Department of Statistics, University of Campinas, Campinas, SP, CEP 13083-859, Brazil - E-mail address: jg@ime.unicamp.br

R. Gholizadeh: University of Campinas, Campinas, SP, CEP: 13083-859, Brazil - E-mail address: 1ramin.gholizadh@gmail.com

V. A. González-López: Department of Statistics, University of Campinas, Campinas, SP, CEP: 13083-859, Brazil - E-mail address: veronica@ime.unicamp.br

The journal Cadernos de Estudos Linguísticos uses the Creative Commons (CC) license, thereby preserving the integrity of articles in an open-access environment.
