How well can ASR technology understand foreign-accented speech?

Keywords

Intelligibility
L2 Pronunciation Development
Automatic speech recognition
Autonomous learning

How to Cite

SOUZA, Hanna Kivistö de; GOTTARDI, William. How well can ASR technology understand foreign-accented speech? Trabalhos em Linguística Aplicada, Campinas, SP, v. 61, n. 3, p. 764–781, 2022. Available at: https://periodicos.sbu.unicamp.br/ojs/index.php/tla/article/view/8668782. Accessed: 17 Aug. 2024.

Abstract

Following the Covid-19 pandemic, digital technology is more present in classrooms than ever. Automatic Speech Recognition (ASR) offers interesting possibilities for language learners to produce more output in a foreign language (FL). ASR is especially suited to autonomous pronunciation learning when used as a dictation tool that transcribes the learner's speech (McCROCKLIN, 2016). However, ASR tools are trained with monolingual native speakers in mind and do not reflect the global reality of English speakers. The present study therefore examined how well two ASR-based dictation tools understand foreign-accented speech and which FL speech features cause intelligibility breakdowns. English speech samples from 15 Brazilian Portuguese and 15 Spanish speakers were obtained from an online database (WEINBERGER, 2015) and submitted to two ASR dictation tools: Microsoft Word and VoiceNotebook. The resulting transcriptions were manually inspected, coded and categorized. The results show that overall intelligibility was high for both tools. However, many features of normal FL speech, such as vowel and consonant substitution, caused the ASR dictation tools to misinterpret the message, leading to communication breakdowns. The results are discussed from a pedagogical viewpoint.


References

ASHWELL, T.; ELAM, J. R. (2017). How accurately can the Google Web Speech API recognize and transcribe Japanese L2 English learners’ oral production? JALT CALL Journal, v. 13, n. 1, p. 59-76.

BASHORI, M. et al. (2020). Web-based language learning and speaking anxiety. Computer Assisted Language Learning, advance online publication, p. 1-32.

BASHORI, M. et al. (2021). Effects of ASR-based websites on EFL learners’ vocabulary, speaking anxiety, and language enjoyment. System, v. 99, p. 102496.

CARLET, A.; KIVISTÖ DE SOUZA, H. (2018). Improving L2 pronunciation inside and outside the classroom: Perception, production and autonomous learning of L2 vowels. Ilha do Desterro, v. 71, n. 3, p. 99-123.

CHEN, H. H. J. (2011). Developing and evaluating an oral skills training website supported by automatic speech recognition technology. ReCALL, v. 23, n. 1, p. 59-78.

CUCCHIARINI, C.; NERI, A.; STRIK, H. (2009). Oral proficiency training in Dutch L2: The contribution of ASR-based corrective feedback. Speech Communication, v. 51, n. 10, p. 853-863.

CUCCHIARINI, C.; STRIK, H. (2018). Automatic Speech Recognition for second language pronunciation training. In: The Routledge handbook of contemporary English pronunciation. Routledge. p. 556-569.

DERWING, T. (2010). Utopian goals for pronunciation teaching. In: LEVIS, J.; LeVELLE, K. (Eds.). 1st Pronunciation in Second Language Learning and Teaching Conference. Proceedings... Ames, IA: Iowa State University.

DERWING, T.; MUNRO, M. (1997). Accent, intelligibility, and comprehensibility: Evidence from Four L1s. Studies in Second Language Acquisition, v. 19, n. 1, p. 1-16.

DIZON, G.; TANG, D. (2020). Intelligent personal assistants for autonomous second language learning: An investigation of Alexa. JALT CALL Journal, v. 16, n. 2, p. 107-120.

GASS, S.; VARONIS, E. M. (1984). The effect of familiarity on the comprehensibility of nonnative speech. Language Learning, v. 34, n. 1, p. 65-87.

GOLONKA, E. M. et al. (2014). Technologies for foreign language learning: A review of technology types and their effectiveness. Computer Assisted Language Learning, v. 27, n. 1, p. 70-105.

HENRICHSEN, L. E. (2020). An Illustrated Taxonomy of Online CAPT Resources. RELC Journal, v. 52, n. 1, p. 179-188.

INCEOGLU, S.; LIM, H.; CHEN, W. H. (2020). ASR for EFL pronunciation practice: Segmental development and learners’ beliefs. Journal of Asia TEFL, v. 17, n. 3, p. 824-840.

JENKINS, J. (2002). A sociolinguistically based, empirically researched pronunciation syllabus for English as an international language. Applied Linguistics, v. 23, n. 1, p. 83-103.

JENKINS, J.; COGO, A.; DEWEY, M. (2011). Review of developments in research into English as a lingua franca. Language Teaching, v. 44, n. 3, p. 281-315.

JOHNSON, E.; JUSCZYK, P. (2001). Word segmentation by 8-month-olds: When speech cues count more than statistics. Journal of Memory and Language, v. 44, p. 548-567.

JURAFSKY, D.; MARTIN, J. H. (2021). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (3rd ed., Unpublished Manuscript). Available at: <https://web.stanford.edu/~jurafsky/slp3/ed3book_sep212021.pdf>. Accessed: 1 Nov. 2021.

KENNEDY, S.; TROFIMOVICH, P. (2008). Intelligibility, comprehensibility, and accentedness of L2 speech: The role of listener experience and semantic context. Canadian Modern Language Review, v. 64, n. 3, p. 459-489.

KIM, I. S. (2006). Automatic speech recognition: Reliability and pedagogical implications for teaching pronunciation. Educational Technology and Society, v. 9, n. 1, p. 322-334.

KIVISTÖ DE SOUZA, H.; MORA, J. C. (2012). Speech rate effects on L2 vowel production and perception. In: CELSUL, 2012, Cascavel, Paraná. Anais do X Encontro do CELSUL - Círculo de Estudos Linguísticos do Sul. Cascavel, 2012.

KNILL, K. M. et al. (2018). Impact of ASR performance on free speaking language assessment. In: Annual Conference of the International Speech Communication Association, INTERSPEECH 2018. Proceedings… p. 1641-1645.

LEVIS, J. M. (2005). Changing contexts and shifting paradigms in pronunciation teaching. TESOL Quarterly, v. 39, n. 3, p. 369-377.

LEVIS, J.; SUVOROV, R. (2013). Automatic Speech Recognition. In: CHAPELLE, C. A. (Ed.). The encyclopedia of applied linguistics. New York: Wiley-Blackwell. p. 316-323.

LIAKIN, D.; CARDOSO, W.; LIAKINA, N. (2015). Learning L2 pronunciation with a mobile speech recognizer: French /y/. CALICO Journal, v. 32, n. 1, p. 1-25.

LIAKIN, D.; CARDOSO, W.; LIAKINA, N. (2017). Mobilizing Instruction in a Second-Language Context: Learners’ Perceptions of Two Speech Technologies. Languages, v. 2, n. 3, p. 11.

LYSTER, R.; SAITO, K. (2010). Oral feedback in classroom SLA: A Meta-Analysis. Studies in Second Language Acquisition, v. 32, n. 2, p. 265-302.

McCROCKLIN, S. M. (2014). Dictation programs for pronunciation learner empowerment. In: 5th Pronunciation in Second Language Learning and Teaching Conference. Proceedings… p. 30-39.

McCROCKLIN, S. M. (2016). Pronunciation learner autonomy: The potential of Automatic Speech Recognition. System, v. 57, p. 25-42.

McCROCKLIN, S. (2019a). Dictation programs for second language pronunciation learning: Perceptions of the transcript, strategy use and improvement. v. 7, n. 2, p. 137-157.

McCROCKLIN, S.; EDALATISHAMS, I. (2020). Revisiting Popular Speech Recognition Software for ESL Speech. TESOL Quarterly, v. 54, n. 4, p. 1086-1097.

MICROSOFT. (2021). Dictate Your Documents in Word. Available at: <https://support.microsoft.com/en-us/office/dictate-your-documents-in-word-3876e05f-3fcc-418f-b8ab-db7ce0d11d3c#Tab=Web>. Accessed: 1 Nov. 2021.

MORA, J. C. (2005). Lexical knowledge effects on the discrimination of non-native phonemic contrasts in words and nonwords by Spanish/Catalan bilingual learners of English. In: ISCA Workshop on Plasticity in Speech Perception.

MROZ, A. (2018). Seeing how people hear you: French learners experiencing intelligibility through automatic speech recognition. Foreign Language Annals, v. 51, n. 3, p. 617-637.

MUNRO, M. (1998). The effects of noise on the intelligibility of foreign-accented speech. Studies in Second Language Acquisition, v. 20, n. 2, p. 139-154.

MUNRO, M. J.; DERWING, T. M. (1995). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning, v. 45, n. 1, p. 73-97.

MUNRO, M. J.; DERWING, T. M. (2015). Intelligibility in research and practice: Teaching priorities. In: REED, M.; LEVIS, J. M. (Eds.). The Handbook of English Pronunciation. Wiley Online Library. p. 375-396.

NAGLE, C. L.; HUENSCH, A. (2020). Expanding the scope of L2 intelligibility research: Intelligibility, comprehensibility, and accentedness in L2 Spanish. Journal of Second Language Pronunciation, v. 6.

NERI, A.; CUCCHIARINI, C.; STRIK, H. (2006). ASR-based corrective feedback on pronunciation: Does it really work? In: INTERSPEECH 2006 - ICSLP, 9th International Conference on Spoken Language Processing. Proceedings… v. 4, p. 1982-1985.

NERI, A.; CUCCHIARINI, C.; STRIK, H. (2008). The effectiveness of computer-based speech corrective feedback for improving segmental quality in L2 Dutch. ReCALL, v. 20, n. 2, p. 225-243.

ROGERSON-REVELL, P. M. (2021). Computer-Assisted Pronunciation Training (CAPT): Current Issues and Future Directions. RELC Journal, v. 52, n. 1, p. 189-205.

VOICENOTEBOOK. (2021). Voice Notebook Homepage. Available at: <https://voicenotebook.com>. Accessed: 1 Nov. 2021.

WEINBERGER, S. (2015). Speech Accent Archive. George Mason University. Available at: <http://accent.gmu.edu>. Accessed: 1 Nov. 2021.

YOSHIDA, M. T. (2018). Choosing technology tools to meet pronunciation teaching and learning goals. The CATESOL Journal, v. 30, n. 1, p. 195-212.

YU, D.; DENG, L. (2015). Automatic Speech Recognition: A Deep Learning Approach. London: Springer.

ZIELINSKI, B. W. (2008). The listener: No longer the silent partner in reduced intelligibility. System, v. 36, n. 1, p. 69-84.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2022 Trabalhos em Linguística Aplicada
