Learning the hidden structure of speech: from communicative functions to prosody

Authors

  • Gérard Bailly, Université Stendhal
  • Bleike Holm, Université Stendhal

DOI:

https://doi.org/10.20396/cel.v43i0.8637148

Keywords:

Linguistics.

Abstract

This article introduces a new method, driven both by modelling and by interaction with behavioural data, for generating prosodic patterns from metalinguistic information. By this we mean the general ability of intonation to demarcate speech units and to convey information about the propositional and interactional functions of these units in discourse. Our strong hypotheses are that (1) these functions are directly implemented as prototypical prosodic contours that are coextensive with the units to which they apply, and (2) the prosodic pattern of the message is obtained by superposing and adding all the elementary contours (Aubergé & Bailly, 1995). We describe an analysis-by-synthesis scheme that identifies these prototypical contours and separates their respective contributions to the prosodic contours of the training data. The scheme is applied to databases designed to highlight various intonational functions. Experimental results show that the model generates appropriate prosodic contours with very few prototypical movements.
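
The two hypotheses above can be illustrated with a small linear sketch. It rests on several simplifying assumptions that are not taken from the article: one F0 value (in Hz) per syllable, a fixed number of control points `K` per prototype, linear interpolation to stretch a prototype over the unit it is coextensive with, and hypothetical function labels (`declaration`, `dependency`). `synthesize` adds all elementary contours, as in hypothesis (2); `estimate_prototypes` separates their contributions from observed contours using ordinary least squares. This least-squares fit is a stand-in chosen for brevity, not the authors' training procedure. The model described in the article is also multi-parametric rather than F0-only.

```python
import numpy as np

K = 4  # assumption: control points per prototype contour (illustrative choice)


def unit_weights(length, k=K):
    """(length, k) linear-interpolation matrix that stretches a k-point
    prototype over a unit of `length` syllables: contour = W @ prototype."""
    positions = np.linspace(0.0, k - 1, length)
    W = np.zeros((length, k))
    for i, p in enumerate(positions):
        lo = int(np.floor(p))
        hi = min(lo + 1, k - 1)
        frac = p - lo
        W[i, lo] += 1.0 - frac
        W[i, hi] += frac
    return W


def synthesize(n_syll, instances, prototypes):
    """Superpose elementary contours: each (function, start, end) instance adds
    its prototype, stretched over syllables start..end-1, to the F0 vector."""
    f0 = np.zeros(n_syll)
    for func, start, end in instances:
        f0[start:end] += unit_weights(end - start) @ prototypes[func]
    return f0


def estimate_prototypes(corpus, functions):
    """Linear stand-in for analysis-by-synthesis: find the prototypes whose
    superposition best reproduces every observed contour (least squares)."""
    col = {f: j * K for j, f in enumerate(functions)}
    blocks, targets = [], []
    for n_syll, instances, observed in corpus:
        A = np.zeros((n_syll, K * len(functions)))
        for func, start, end in instances:
            A[start:end, col[func]:col[func] + K] += unit_weights(end - start)
        blocks.append(A)
        targets.append(observed)
    theta, *_ = np.linalg.lstsq(np.vstack(blocks), np.concatenate(targets), rcond=None)
    return {f: theta[col[f]:col[f] + K] for f in functions}


# Toy data: made-up prototypes (F0 in Hz), not data from the article.
true = {
    "declaration": np.array([120.0, 115.0, 110.0, 95.0]),  # hypothetical sentence-level contour
    "dependency": np.array([0.0, 6.0, 10.0, -3.0]),        # hypothetical constituent-level contour
}
layouts = [
    (6, [("declaration", 0, 6), ("dependency", 0, 3)]),
    (8, [("declaration", 0, 8), ("dependency", 2, 6), ("dependency", 6, 8)]),
    (5, [("declaration", 0, 5)]),
]
corpus = [(n, inst, synthesize(n, inst, true)) for n, inst in layouts]

est = estimate_prototypes(corpus, list(true))
for n, inst, observed in corpus:
    # largest absolute difference between resynthesized and "observed" F0
    print(np.abs(synthesize(n, inst, est) - observed).max())
```

The toy layouts deliberately vary unit lengths and include one sentence that carries only the global contour. If two overlapping contours always co-occurred with the same extent, a superposition model could not tell their contributions apart, and least squares would return just one of many equally good decompositions. In this example, the toy contours were generated by the same model, so the fit reproduces them essentially exactly.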

References

AUBERGÉ, V. (1992). Developing a structured lexicon for synthesis of prosody. In BAILLY, G. & BENOÎT, C. (eds.). Talking Machines: Theories, Models and Designs. Elsevier B.V. pp. 307-321.

AUBERGÉ, V. & BAILLY, G. (1995). Generation of intonation: a global approach. In Proceedings of the European Conference on Speech Communication and Technology. Madrid, Spain. pp. 2065-2068.

AUBERGÉ, V.; GRÉPILLAT, T. & RILLIARD, A. (1997). Can we perceive attitudes before the end of sentences? The gating paradigm for prosodic contours. In Proceedings of the European Conference on Speech Communication and Technology. Rhodes, Greece. pp. 871-874.

BAILLY, G. & AUBERGÉ, V. (1997). Phonetic and phonological representations for intonation. In VAN SANTEN, J.P.H.; SPROAT, R.W.; OLIVE, J.P.; HIRSCHBERG, J. (eds.). Progress in Speech Synthesis. New York: Springer-Verlag. pp. 435-441.

BAILLY, G.; BARBE, T. & WANG, H. (1992). Automatic labelling of large prosodic databases: tools, methodology and links with a text-to-speech system. In BAILLY, G. & BENOÎT, C. (eds.). Talking Machines: Theories, Models and Designs. Elsevier B.V. pp. 323-333.

BARBOSA, P. & BAILLY, G. (1994a). Characterisation of rhythmic patterns for text-to-speech synthesis. Speech Communication. 15: 127-137.

BARBOSA, P. (1994b). Generating pauses within the z-score model. In Proceedings of the ETRW on Speech Synthesis. New Paltz, USA. pp. 101-104.

BARBOSA, P. (1997). Generation of pauses within the z-score model. In VAN SANTEN, J.P.H.; SPROAT, R.W.; OLIVE, J.P.; HIRSCHBERG, J. (eds.). Progress in Speech Synthesis. New York: Springer-Verlag. pp. 365-381.

CAMPBELL, W. N. (1992). Multi-level timing in speech. Unpublished PhD thesis. Brighton, UK: University of Sussex.

CAMPBELL, W. N. (1993). Automatic detection of prosodic boundaries in speech. Speech Communication. 13: 343-354.

CHEN, S.-H.; HWANG, S.-H. & WANG, Y.-R. (1998). An RNN-based prosodic information synthesizer for Mandarin text-to-speech. IEEE Transactions on Speech and Audio Processing. 6 (3): 226-239.

DE TOURNEMIRE, S. (1994). Recherche d’une stylisation extrême des contours de F0 en vue de leur apprentissage automatique. In Journées d’Etudes sur la Parole. Trégastel, France. pp. 75-80.

FÓNAGY, I.; BÉRARD, E. & FÓNAGY, J. (1984). Clichés mélodiques. Folia Linguistica. 17: 153-185.

FUJISAKI, H. & SUDO, H. (1971). A generative model for the prosody of connected speech in Japanese. Annual Report of Engineering Research Institute. 30: 75-80.

GÅRDING, E. (1991). Intonation parameters in production and perception. In Proceedings of the International Congress of Phonetic Sciences. Aix-en-Provence, France. pp. 300-304.

GEE, J.-P. & GROSJEAN, F. (1983). Performance structures: a psycholinguistic and linguistic appraisal. Cognitive Psychology. 15: 411-458.

GRØNNUM, N. (1992). The ground-works of Danish intonation. Copenhagen: Museum Tusculanum Press - Univ. Copenhagen.

GROSJEAN, F.; GROSJEAN, L. & LANE, H. (1979). The patterns of silence: performance structures in sentence production. Cognitive Psychology. 11: 58-81.

GUSSENHOVEN, C. (1999). Discreteness and gradience in intonational contrasts. Language and Speech. 42: 283-305.

HIRST, D.; NICOLAS, P. & ESPESSER, R. (1991). Coding the F0 of a continuous text in French: an experimental approach. In Proceedings of the International Congress of Phonetic Sciences. Aix-en-Provence, France. pp. 234-237.

HIRST, D.J.; DI CRISTO, A. & ESPESSER, R. (2000). Levels of representation and levels of analysis for the description of intonation systems. In Prosody: Theory and Experiment, M. HORNE (ed.). Dordrecht, The Netherlands: Kluwer Academic Publishers. pp. 51-87.

HOLM, B. & BAILLY, G. (2000). Generating prosody by superposing multi-parametric overlapping contours. In Proceedings of the International Conference on Spoken Language Processing. Beijing, China. pp. 203-206.

HOLM, B.; BAILLY, G. & LABORDE, C. (1999). Performance structures of mathematical formulae. In Proceedings of the International Congress of Phonetic Sciences. San Francisco, USA. pp. 1297-1300.

LADD, D.R. (1986). Intonational phrasing: the case for recursive prosodic structure. Phonology Yearbook. 3: 311-340.

LADD, D.R. (1988). Declination “reset” and the hierarchical organization of utterances. Journal of the Acoustical Society of America. 84 (2): 530-544.

MARSI, E.C.; COPPEN, P.-A. J.M.; GUSSENHOVEN, C.H.M. & RIETVELD, T.C.M. (1997). Prosodic and intonational domains in speech synthesis. In VAN SANTEN, J.P.H.; SPROAT, R.W.; OLIVE, J.P.; HIRSCHBERG, J. (eds.). Progress in Speech Synthesis. New York: Springer-Verlag. pp. 477-493.

MIXDORFF, H. (2000). A novel approach to the fully automatic extraction of Fujisaki model parameters. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing. Istanbul, Turkey. pp. 1281-1284.

MONNIN, P. & GROSJEAN, F. (1993). Les structures de performance en français: caractérisation et prédiction. L’Année Psychologique. 93: 9-30.

MORLEC, Y.; AUBERGÉ, V. & BAILLY, G. (1995). Evaluation of automatic generation of prosody with a superposition model. In Proceedings of the International Congress of Phonetic Sciences. Stockholm, Sweden. pp. 224-227.

MORLEC, Y.; BAILLY, G. & AUBERGÉ, V. (1995). Synthesis and evaluation of intonation with a superposition model. In Proceedings of the European Conference on Speech Communication and Technology. Madrid, Spain. pp. 2043-2046.

MORLEC, Y. (1996). Generating intonation by superposing gestures. In Proceedings of the International Conference on Spoken Language Processing. Philadelphia, USA. pp. 283-286.

MORLEC, Y. (1997). Synthesising attitudes with global rhythmic and intonation contours. In Proceedings of the European Conference on Speech Communication and Technology. Rhodes, Greece. pp. 219-222.

MORLEC, Y. (2001). Generating prosodic attitudes in French: data, model and evaluation. Speech Communication. 33 (4): 357-371.

PIERREHUMBERT, J. (1981). Synthesizing intonation. Journal of the Acoustical Society of America. 70 (4): 985-995.

PIERREHUMBERT, J. & HIRSCHBERG, J. (1990). The meaning of intonational contours in the interpretation of discourse. In COHEN, P.R.; MORGAN, J. & POLLACK, M.E. (eds.). Intentions in Communication. Cambridge, MA: MIT Press. pp. 271-311.

SELKIRK, E.O. (1984). Phonology and Syntax. Cambridge, MA: MIT Press.

SILVERMAN, K.; BECKMAN, M.; PITRELLI, J.; OSTENDORF, M.; WIGHTMAN, C.; PRICE, P.; PIERREHUMBERT, J. & HIRSCHBERG, J. (1992). ToBI: a standard for labeling English prosody. In Proceedings of the International Conference on Spoken Language Processing. v. 2, pp. 867-870.

’T HART, J.; COLLIER, R. & COHEN, A. (1990). A perceptual study of intonation: an experimental-phonetic approach to speech melody. Cambridge: Cambridge University Press.

TAYLOR, P. (2000). Analysis and synthesis of intonation using the tilt model. Journal of the Acoustical Society of America. 107 (3): 1697-1714.

TRABER, C. (1992). F0 generation with a database of natural F0 patterns and with a neural network. In BAILLY, G. & BENOÎT, C. (eds.). Talking Machines: Theories, Models and Designs. Elsevier B.V. pp. 287-304.

Published

2011-08-08

How to Cite

BAILLY, G.; HOLM, B. Learning the hidden structure of speech: from communicative functions to prosody. Cadernos de Estudos Linguísticos, Campinas, SP, v. 43, p. 37–54, 2011. DOI: 10.20396/cel.v43i0.8637148. Available at: https://periodicos.sbu.unicamp.br/ojs/index.php/cel/article/view/8637148. Accessed: 1 Jul. 2022.