Talk

LangAI Seminar #3: "Scaling Multilingual Speech Recognition: From a Handful to Thousands of Languages", Dr. Shinji Watanabe, Carnegie Mellon University.

We are pleased to invite Dr. Shinji Watanabe (Carnegie Mellon University) to join us for a seminar. If you would like to attend, please register using the Google Form below. Registration is limited to Tohoku University members (please apply from a tohoku.ac.jp campus address).

開催日時/Date: July 15, 2025 (Tue), 16:30–17:30
講演題目/Title: Scaling Multilingual Speech Recognition: From a Handful to Thousands of Languages
現地場所/Location: Lecture Room M206, 2F, Multimedia Education and Research Complex (Building A05), Kawauchi Campus, Tohoku University
[Multimedia Education and Research Complex]
https://www.tohoku.ac.jp/japanese/profile/campus/01/kawauchi/areaa.html
Building A05 on the map; ▲ marks the building entrance.
[Multimedia Hall floor guide]
https://www2.he.tohoku.ac.jp/center/mm_intro/mm_intro.html
対象者/Target Audience: Researchers, students, and affiliated members on campus (Tohoku University only)
備考/Note: On-site registration is also available on the day of the event.

Title:

Scaling Multilingual Speech Recognition: From a Handful to Thousands of Languages

Abstract:

This presentation outlines our research journey in advancing multilingual speech recognition. Our first end-to-end multilingual ASR system, developed in 2017, supported just 10 languages. By leveraging paired speech and transcription data, we later scaled the approach to cover approximately 100 languages. To facilitate broader research, we introduced Multilingual SUPERB, a benchmark built on these languages. However, scaling ASR to encompass all 7,000+ languages worldwide remains a major challenge due to the lack of such paired data for most languages. To address this gap, the ASR2K project proposed a universal phone-based ASR model, integrating lexicons and language models—marking the first step toward recognizing speech across thousands of languages. More recently, self-supervised learning (SSL) approaches have made it possible to incorporate additional languages, at least in the pre-training phase. Despite these advances, data imbalance and bias remain persistent challenges. In this talk, we present our latest work on scaling model sizes—up to 18 billion parameters—as a strategy to mitigate such biases. Although full coverage of thousands of languages is still out of reach, we hope this talk will spark further efforts in the community toward addressing this critical and long-standing problem.

Speaker:

Dr. Shinji Watanabe, Associate Professor at Carnegie Mellon University

Short bio:

Shinji Watanabe is an Associate Professor at Carnegie Mellon University, Pittsburgh, PA. He received his B.S., M.S., and Ph.D. (Dr. Eng.) degrees from Waseda University, Tokyo, Japan. He was a research scientist at NTT Communication Science Laboratories, Kyoto, Japan, from 2001 to 2011, a visiting scholar at the Georgia Institute of Technology, Atlanta, GA, in 2009, and a senior principal research scientist at Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA, from 2012 to 2017. Before joining Carnegie Mellon University, he was an associate research professor at Johns Hopkins University, Baltimore, MD, USA, from 2017 to 2020. His research interests include automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and language processing. He has published over 500 papers in peer-reviewed journals and conferences and has received several awards, including the Best Paper Award at ISCA Interspeech 2024. He is a Senior Area Editor of the IEEE Transactions on Audio, Speech, and Language Processing. He has served on several technical committees, including the APSIPA Speech, Language, and Audio Technical Committee (SLA), the IEEE Signal Processing Society Speech and Language Technical Committee (SLTC), and the Machine Learning for Signal Processing Technical Committee (MLSP). He is an IEEE and ISCA Fellow.
