Scalable Multilingual Keyword Spotting Models for Efficient Speech Recognition

This presentation describes how production-grade keyword-spotting (KWS) systems can be scaled to recognize keywords in multiple languages with a single model. It covers the conditioning methods used (universal model, Concat, and FiLM locale encoding), the experimental setup, and the results achieved by sharing information across locales during training.

  • Speech Recognition
  • Multilingual Models
  • Keyword Spotting
  • Scalable Systems
  • Language Processing




Presentation Transcript


  1. Locale Encoding for Scalable Multilingual Keyword Spotting Models. Pai Zhu, Hyun Jin Park, Alex Park, Angelo Scorza Scarpati, Ignacio Lopez Moreno. Google LLC, Mountain View, CA, U.S.A.

  2. Introduction. Production-grade keyword-spotting (KWS) systems are trained to recognize keywords from a continuous stream of speech. Prior work has focused on noise robustness, reducing dependency on data volume and label quality, minimizing computing cost, and improving detection accuracy, typically for a single specific language: a monolingual KWS model is developed, and the same process is repeated for each other language. The resulting set of N locale-specific models for N locales serves as a simple baseline. This work explores three approaches for sharing information between locales: a fully universal model (a single model trained on the union of data from all locales), Concat, and FiLM.
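Of the locale-conditioning approaches above, Concat is the simplest to illustrate: the locale identity is encoded as a vector and concatenated to the model's input features. The sketch below is a minimal, hypothetical numpy illustration, assuming a one-hot locale encoding appended to each acoustic feature frame; the feature dimensions are illustrative, and the locale list is the one given in the experimental setup.

```python
import numpy as np

# Hypothetical sketch of the "Concat" approach: a one-hot locale vector
# is appended to each acoustic feature frame before the frame enters
# the shared multilingual KWS network.

LOCALES = ["da-DK", "de-DE", "es-ES", "fr-FR", "it-IT",
           "ko-KR", "nl-NL", "pt-BR", "sv-SE", "th-TH"]

def locale_one_hot(locale: str) -> np.ndarray:
    """Encode a locale string as a one-hot vector of length len(LOCALES)."""
    vec = np.zeros(len(LOCALES), dtype=np.float32)
    vec[LOCALES.index(locale)] = 1.0
    return vec

def concat_locale(features: np.ndarray, locale: str) -> np.ndarray:
    """Append the locale encoding to every frame: (T, D) -> (T, D + N)."""
    one_hot = locale_one_hot(locale)
    tiled = np.tile(one_hot, (features.shape[0], 1))  # repeat per frame
    return np.concatenate([features, tiled], axis=1)

frames = np.random.randn(100, 40).astype(np.float32)  # 100 frames, 40-dim features
conditioned = concat_locale(frames, "fr-FR")
print(conditioned.shape)  # (100, 50)
```

Because the locale vector is identical for every frame of an utterance, this conditioning adds only N extra input channels to the shared model.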

  3. Method

  4. Method. FiLM(x, l) = γ(l) ⊙ x + β(l), where γ(l) and β(l) are a locale-dependent scale and shift applied to the intermediate features x.
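FiLM (feature-wise linear modulation) in its standard form multiplies intermediate features by a conditioning-dependent scale and adds a conditioning-dependent shift. Below is a minimal numpy sketch, assuming the locale is a one-hot vector mapped to (γ, β) by a learned linear layer; the weight names, shapes, and random initialization are illustrative assumptions, not details from the presentation.

```python
import numpy as np

# Hypothetical sketch of FiLM-style locale conditioning: a one-hot locale
# vector produces a per-channel scale (gamma) and shift (beta) that
# modulate an intermediate feature map of the shared KWS network.

rng = np.random.default_rng(0)

def film(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """FiLM(x) = gamma * x + beta, broadcast over the time axis."""
    return gamma * x + beta

# Toy generator mapping a one-hot locale vector to (gamma, beta);
# in practice these projections would be trained jointly with the model.
n_locales, n_channels = 10, 16
W_gamma = rng.standard_normal((n_locales, n_channels)).astype(np.float32)
W_beta = rng.standard_normal((n_locales, n_channels)).astype(np.float32)

locale = np.zeros(n_locales, dtype=np.float32)
locale[3] = 1.0  # one-hot locale selector, e.g. fr-FR

gamma = locale @ W_gamma  # (n_channels,)
beta = locale @ W_beta    # (n_channels,)

x = rng.standard_normal((50, n_channels)).astype(np.float32)  # (time, channels)
y = film(x, gamma, beta)
print(y.shape)  # (50, 16)
```

Unlike Concat, FiLM leaves the input layer unchanged and instead conditions the network at an intermediate layer, at the cost of a small per-locale parameter generator.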

  5. Experimental Setup
  • Data: 1.2 billion anonymized utterances
  • Positive data: utterances containing the keyword phrase "Ok Google" or "Hey Google"
  • Negative data: tactile (push-button) queries
  • Train: Positive (838M), Negative (435M)
  • Evaluation: Positive (6M), Negative (5k hours)
  • Locales: DA-DK (Danish), DE-DE (German), ES-ES (Spanish), FR-FR (French), IT-IT (Italian), KO-KR (Korean), NL-NL (Dutch), PT-BR (Brazilian Portuguese), SV-SE (Swedish), TH-TH (Thai)
  • Target keywords: localized versions of "Ok Google" and "Hey Google"
  • Loss: cross-entropy
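The cross-entropy loss named on this slide, applied to binary keyword detection over the positive and negative utterances, can be sketched as follows. This is a minimal illustration with made-up scores and labels, not the presentation's actual training code.

```python
import numpy as np

# Minimal sketch of the binary cross-entropy objective: the KWS model
# emits a keyword probability per utterance, and positive ("Ok/Hey
# Google") vs. negative clips supply the 1/0 labels.

def cross_entropy(p: np.ndarray, y: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and labels."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

probs = np.array([0.9, 0.2, 0.8, 0.1])   # illustrative model keyword scores
labels = np.array([1.0, 0.0, 1.0, 0.0])  # 1 = keyword present
loss = cross_entropy(probs, labels)
print(round(loss, 4))
```

Lower loss means the model assigns high probability to positive clips and low probability to negative ones; confident mistakes are penalized heavily by the log terms.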

  6. Results
