Learning the hidden structure of speech: from communicative functions to prosody

Gérard Bailly; Bleike Holm

doi:10.20396/cel.v43i0.8637148

Vol. 43 (2002), Articles


Published 2011-08-08


Gérard Bailly

Université Stendhal

Bleike Holm

Université Stendhal


Keywords

Linguistics.

How to Cite

BAILLY, Gérard; HOLM, Bleike. Learning the hidden structure of speech: from communicative functions to prosody. Cadernos de Estudos Linguísticos, Campinas, SP, v. 43, p. 37–54, 2011. DOI: 10.20396/cel.v43i0.8637148. Available at: https://periodicos.sbu.unicamp.br/ojs/index.php/cel/article/view/8637148. Accessed: 17 Jul. 2024.

Abstract

This article introduces a new method, guided both by modeling and by interaction with behavioral data, for generating prosodic patterns from metalinguistic information. By this we mean the general ability of intonation to demarcate speech units and to convey information about the propositional and interactional functions of those units within the discourse. Our strong hypotheses are that (1) these functions are directly implemented as prototypical prosodic contours that are coextensive with the units to which they apply, and (2) the prosodic pattern of the message is obtained by superposing and adding all the elementary contours (Aubergé & Bailly, 1995). We describe an analysis-by-synthesis scheme that consists of identifying these prototypical contours and separating their respective contributions to the prosodic contours of the training data. The scheme is applied to databases designed to bring out various intonational functions. Experimental results show that the model generates adequate prosodic contours with very few prototypical movements.
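Hypothesis (2) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each elementary contour is a list of per-syllable offsets anchored at the first syllable of its unit. The `Contour` structure, the `superpose` function, the function labels, and all numeric values are hypothetical; they are not the authors' implementation, and the sketch does not cover how the prototypical contours are learned.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contour:
    """An elementary contour attached to one unit (hypothetical structure)."""
    function: str        # illustrative label for the communicative function
    start: int           # index of the first syllable of the unit
    values: List[float]  # one offset per syllable of the unit (arbitrary units)


def superpose(n_syllables: int, contours: List[Contour]) -> List[float]:
    """Add every elementary contour over the span of its own unit."""
    total = [0.0] * n_syllables
    for c in contours:
        for i, v in enumerate(c.values):
            total[c.start + i] += v
    return total


# Illustrative values only: a sentence-level contour over 6 syllables plus
# a contour for a smaller unit covering syllables 2 to 4.
sentence = Contour("assertion", 0, [2.0, 1.0, 0.0, -1.0, -2.0, -3.0])
group = Contour("group", 2, [0.5, 1.0, 1.5])

pattern = superpose(6, [sentence, group])
print(pattern)  # [2.0, 1.0, 0.5, 0.0, -0.5, -3.0]
```

Because each contour spans exactly the unit it applies to, syllables outside the smaller unit keep the sentence-level values unchanged, while syllables inside it carry the sum of both contributions.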


References

AUBERGÉ, V. (1992). Developing a structured lexicon for synthesis of prosody. In BAILLY, G. & BENOÎT, C. (eds.). Talking Machines: Theories, Models and Designs. Elsevier B.V. pp. 307-321.

AUBERGÉ, V. & BAILLY, G. (1995). Generation of intonation: a global approach. In Proceedings of the European Conference on Speech Communication and Technology. Madrid. pp. 2065-2068.

AUBERGÉ, V.; GRÉPILLAT, T. & RILLIARD, A. (1997). Can we perceive attitudes before the end of sentences? The gating paradigm for prosodic contours. In Proceedings of the European Conference on Speech Communication and Technology. Rhodes, Greece. pp. 871-874.

BAILLY, G. & AUBERGÉ, V. (1997). Phonetic and phonological representations for intonation. In Progress in Speech Synthesis. VAN SANTEN, J.P.H.; SPROAT, R.W.; OLIVE, J.P.; HIRSCHBERG, J. (Eds.). New York: Springer-Verlag. pp. 435-441.

BAILLY, G.; BARBE, T. & WANG, H. (1992). Automatic labelling of large prosodic databases: tools, methodology and links with a text-to-speech system. In Talking Machines: Theories, Models and Designs. G. BAILLY; C. BENOÎT (eds.). Elsevier B.V. pp. 323-333.

BARBOSA, P. & BAILLY, G. (1994a). Characterisation of rhythmic patterns for text-to-speech synthesis. Speech Communication. 15: 127-137.

BARBOSA, P. (1994b). Generating pauses within the z-score model. In Proceedings of the ETRW on Speech Synthesis. New Paltz, USA. pp. 101-104.

BARBOSA, P. (1997). Generation of pauses within the z-score model. In VAN SANTEN, J.P.H.; SPROAT, R.W.; OLIVE, J.P.; HIRSCHBERG, J. (eds.). Progress in Speech Synthesis. New York: Springer-Verlag. pp. 365-381.

CAMPBELL, W. N. (1992). Multi-level timing in speech. Unpublished PhD thesis. Brighton, UK: University of Sussex.

CAMPBELL, W. N. (1993). Automatic detection of prosodic boundaries in speech. Speech Communication. 13: 343-354.

CHEN, S.-H.; HWANG, S.-H. & WANG, Y.-R. (1998). An RNN-based prosodic information synthesizer for Mandarin text-to-speech. IEEE Transactions on Speech and Audio Processing. 6 (3): 226-239.

DE TOURNEMIRE, S. (1994). Recherche d’une stylisation extrême des contours de F0 en vue de leur apprentissage automatique. In Journées d’Etudes sur la Parole. Trégastel, France. pp. 75-80.

FÓNAGY, I.; BÉRARD, E. & FÓNAGY, J. (1984). Clichés mélodiques. Folia Linguistica. 17: 153-185.

FUJISAKI, H. & SUDO, H. (1971). A generative model for the prosody of connected speech in Japanese. Annual Report of Engineering Research Institute. 30: 75-80.

GÅRDING, E. (1991). Intonation parameters in production and perception. In Proceedings of the International Congress of Phonetic Sciences. Aix-en-Provence, France. pp. 300-304.

GEE, J.-P. & GROSJEAN, F. (1983). Performance structures: a psycholinguistic and linguistic appraisal. Cognitive Psychology. 15: 411-458.

GRØNNUM, N. (1992). The ground-works of Danish intonation. Copenhagen: Museum Tusculanum Press - Univ. Copenhagen.

GROSJEAN, F.; GROSJEAN, L. & LANE, H. (1979). The patterns of silence: performance structures in sentence production. Cognitive Psychology. 11: 58-81.

GUSSENHOVEN, C. (1999). Discreteness and gradience in intonational contrasts. Language and Speech. 42: 283-305.

HIRST, D.; NICOLAS, P. & ESPESSER, R. (1991). Coding the F0 of a continuous text in French: an experimental approach. In Proceedings of the International Congress of Phonetic Sciences. Aix-en-Provence, France. pp. 234-237.

HIRST, D.J.; DI CRISTO, A. & ESPESSER, R. (2000). Levels of representation and levels of analysis for the description of intonation systems. In Prosody: Theory and Experiment, M. HORNE (ed.). Dordrecht, The Netherlands: Kluwer Academic Publishers. pp. 51-87.

HOLM, B. & BAILLY, G. (2000). Generating prosody by superposing multi-parametric overlapping contours. In Proceedings of the International Conference on Speech and Language Processing. Beijing, China. pp. 203-206.

HOLM, B.; BAILLY, G. & LABORDE, C. (1999). Performance structures of mathematical formulae. In Proceedings of the International Congress of Phonetic Sciences. San Francisco, USA. pp. 1297-1300.

LADD, D.R. (1986). Intonational phrasing: the case for recursive prosodic structure. Phonology Yearbook. 3: 311-340.

LADD, D.R. (1988). Declination “reset” and the hierarchical organization of utterances. Journal of the Acoustical Society of America. 84 (2): 530-544.

MARSI, E.C.; COPPEN, P.-A. J.M.; GUSSENHOVEN, C.H.M. & RIETVELD, T.C.M. (1997). Prosodic and intonational domains in speech synthesis. In VAN SANTEN, J.P.H.; SPROAT, R.W.; OLIVE, J.P.; HIRSCHBERG, J. (eds.). Progress in Speech Synthesis. New York: Springer-Verlag. pp. 477-493.

MIXDORFF, H. (2000). A Novel Approach to the Fully Automatic Extraction of Fujisaki Model Parameters. In International Conference on Acoustics, Speech and Signal Processing. Istanbul, Turkey. pp. 1281-1284.

MONNIN, P. & GROSJEAN, F. (1993). Les structures de performance en français: caractérisation et prédiction. L’Année Psychologique. 93: 9-30.

MORLEC, Y.; AUBERGÉ, V. & BAILLY, G. (1995). Evaluation of automatic generation of prosody with a superposition model. In Proceedings of the International Congress of Phonetic Sciences. Stockholm, Sweden. pp. 224-227.

MORLEC, Y.; BAILLY, G. & AUBERGÉ, V. (1995). Synthesis and evaluation of intonation with a superposition model. In Proceedings of the European Conference on Speech Communication and Technology. Madrid, Spain. pp. 2043-2046.

MORLEC, Y. (1996). Generating intonation by superposing gestures. In Proceedings of the International Conference on Speech and Language Processing. Philadelphia, USA. pp. 283-286.

MORLEC, Y. (1997). Synthesising attitudes with global rhythmic and intonation contours. In Proceedings of the European Conference on Speech Communication and Technology. Rhodes, Greece. pp. 219-222.

MORLEC, Y. (2001). Generating prosodic attitudes in French: data, model and evaluation. Speech Communication. 33 (4): 357-371.

PIERREHUMBERT, J. (1981). Synthesizing intonation. Journal of the Acoustical Society of America. 70 (4): 985-995.

PIERREHUMBERT, J. & HIRSCHBERG, J. (1990). The meaning of intonational contours in the interpretation of discourse. In P.R. COHEN; J. MORGAN & M.E. POLLACK (eds.). Intentions in Communication. Cambridge, MA: MIT Press. pp. 271-311.

SELKIRK, E.O. (1984). Phonology and Syntax. Cambridge, MA: MIT Press.

SILVERMAN, K.; BECKMAN, M.; PITRELLI, J.; OSTENDORF, M.; WIGHTMAN, C.; PRICE, P.; PIERREHUMBERT, J. & HIRSCHBERG, J. (1992). ToBI: a standard for labeling English prosody. In Proceedings of the International Conference on Speech and Language Processing. v. 2. pp. 867-870.

’T HART, J.; COLLIER, R. & COHEN, A. (1990). A perceptual study of intonation: an experimental-phonetic approach to speech melody. Cambridge: Cambridge University Press.

TAYLOR, P. (2000). Analysis and synthesis of intonation using the tilt model. Journal of the Acoustical Society of America. 107 (3): 1697-1714.

TRABER, C. (1992). F0 generation with a database of natural F0 patterns and with a neural network. In G. BAILLY; C. BENOÎT (eds.). Talking Machines: Theories, Models and Designs. Elsevier B.V. pp. 287-304.

The journal Cadernos de Estudos Linguísticos uses a Creative Commons (CC) license, thereby preserving the integrity of the articles in an open-access environment.
