Linguistic compositions highly volatile in Portuguese
Keywords: Bayesian information criterion. Partition Markov models. Proximity between N-grams.
Abstract: In this paper we use a distance d between sequences of N-grams to identify the N-grams whose behavior differs when two sequences of N-grams are compared. With this tool, we inspect written texts of European Portuguese dated between the 16th and the 19th centuries. We identify the most volatile N-grams throughout the period, as well as the N-grams that should be taken into account when studying the linguistic changes from Classical Portuguese to Modern Portuguese. We find that 2-grams composed of an unstressed monosyllable followed by a paroxytone word (and vice versa) change markedly from one text to the next throughout the whole period. Stressed monosyllabic words (SMW) reveal discrepancies between written texts of the 16th century and texts from the beginning of the 17th century; the 2-grams involved include (i) an SMW followed by a paroxytone or oxytone word and (ii) a disyllabic paroxytone word or an oxytone word followed by an SMW.
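The abstract does not spell out how the distance d is defined. As a rough illustration only, the sketch below assumes a BIC-motivated, per-context dissimilarity of the kind used in work on partition Markov models: for each context s (the k symbols preceding the next one), it compares the log-likelihood of fitting separate transition probabilities to two sequences against a single pooled fit, scaled so that values above 1 mean a BIC-style comparison favors separate transition laws. The function names `context_distances` and `volatility_counts`, the scaling constant `alpha = 2`, the "exceeds 1 between consecutive texts" volatility score, and the toy 0/1 symbols are all hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter


def transition_counts(seq, k):
    """Count (context, next_symbol) pairs; the context is the k preceding symbols."""
    counts = Counter()
    for i in range(k, len(seq)):
        counts[(tuple(seq[i - k:i]), seq[i])] += 1
    return counts


def _n_log_ratio(num, den):
    """num * ln(num / den), with the convention 0 * ln(0) = 0."""
    return num * math.log(num / den) if num > 0 else 0.0


def context_distances(x, y, alphabet, k, alpha=2.0):
    """ASSUMED dissimilarity (not the paper's stated definition), per context s.

    alpha * [loglik(separate fits) - loglik(pooled fit)]
        / ((|A| - 1) * ln(len(x) + len(y))).
    With alpha = 2, a value above 1 means BIC prefers separate transition laws.
    """
    cx, cy = transition_counts(x, k), transition_counts(y, k)
    scale = (len(alphabet) - 1) * math.log(len(x) + len(y))
    contexts = {s for s, _ in cx} | {s for s, _ in cy}
    distances = {}
    for s in contexts:
        nx_s = sum(cx[(s, a)] for a in alphabet)
        ny_s = sum(cy[(s, a)] for a in alphabet)
        total = 0.0
        for a in alphabet:
            nxa, nya = cx[(s, a)], cy[(s, a)]
            total += _n_log_ratio(nxa, nx_s) + _n_log_ratio(nya, ny_s)
            total -= _n_log_ratio(nxa + nya, nx_s + ny_s)
        distances[s] = alpha * total / scale
    return distances


def volatility_counts(texts, alphabet, k, alpha=2.0):
    """HYPOTHETICAL score: for chronologically ordered texts, count in how many
    consecutive pairs each context's distance exceeds 1."""
    counts = Counter()
    for earlier, later in zip(texts, texts[1:]):
        for s, d in context_distances(earlier, later, alphabet, k, alpha).items():
            if d > 1:
                counts[s] += 1
    return counts


if __name__ == "__main__":
    # Toy sequences; the symbols are placeholders, not the paper's stress coding.
    x = [0, 1, 0, 1, 0, 1, 0, 1]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    ranked = sorted(context_distances(x, y, (0, 1), k=1).items(), key=lambda kv: -kv[1])
    for s, d in ranked:
        print(s, round(d, 3))
    print(volatility_counts([x, y, x], (0, 1), k=1))
```

Under this assumed definition, a context that occurs in only one of the two samples gets distance 0, because there is no shared evidence to compare; contexts that keep exceeding 1 across consecutive texts are the candidates for "volatile" N-grams.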
Jesus E. García: Department of Statistics, University of Campinas, Campinas, SP, CEP: 13083-859, Brazil - E-mail address: firstname.lastname@example.org
R. Gholizadeh: University of Campinas, Campinas, SP, CEP: 13083-859, Brazil - E-mail address: email@example.com
V. A. González-López: Department of Statistics, University of Campinas, Campinas, SP, CEP: 13083-859, Brazil - E-mail address: firstname.lastname@example.org
The journal Cadernos de Estudos Linguísticos uses a Creative Commons (CC) license, thereby preserving the integrity of its articles in an open-access environment.