
Prosody Normalization Techniques and Analysis Overview
Explore the normalization methods and analysis of prosodic features in speech signals, including pitch histograms, frame-level features, and perceptual correlates for improved understanding and modeling. Learn about z-normalization, octaving correction, and more for effective prosody processing.
Prosody Lecture 16: Normalization
Nigel G. Ward, University of Texas at El Paso
Gina-Anne Levow, University of Washington
Tutorial presented at ACL 2021
Because people differ!
Overview: speech signal → frame-level features → normalized features → mid-level features → insight, model, etc.
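Is the Pitch Here High?
[Figure: pitch track with a frame marked at 190 Hz.]
Whether 190 Hz counts as high depends on the speaker. It may be near the top of one speaker's range and near the bottom of another's.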
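To make the question concrete, the sketch below asks where 190 Hz falls within each of two speakers' pitch distributions, expressed as a percentile rank. The speakers and their F0 values are made up for illustration, and `percentile_rank` is a hypothetical helper. It assumes unvoiced frames are coded as 0 Hz, which is a common pitch-tracker convention.

```python
import numpy as np

def percentile_rank(value_hz, speaker_f0_hz):
    """Fraction of a speaker's voiced frames whose F0 is below value_hz."""
    f0 = np.asarray(speaker_f0_hz, dtype=float)
    f0 = f0[f0 > 0]  # assumption: unvoiced frames are coded as 0 Hz, so drop them
    return float(np.mean(f0 < value_hz))

# Made-up F0 samples (Hz) for two hypothetical speakers
speaker_a = [95, 100, 105, 110, 120, 130, 140, 0, 0]
speaker_b = [180, 190, 200, 210, 220, 230, 250, 0]

print(percentile_rank(190, speaker_a))  # 1.0: above every voiced frame
print(percentile_rank(190, speaker_b))  # about 0.14: only the 180 Hz frame is lower
```

The same absolute value is at the very top of speaker A's range but near the bottom of speaker B's. That difference is what speaker normalization is meant to capture.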
Pitch Histograms
[Figures: a series of pitch histograms; x-axis: pitch (log Hz), y-axis: count.]
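A pitch histogram like those on the slides can be computed by taking the log of each voiced frame's F0 and binning the results. This sketch makes several assumptions: NumPy, unvoiced frames coded as 0 Hz, and natural-log units. The bin count and the synthetic track are arbitrary choices for illustration. The track includes a small cluster of simulated octave-halving errors one octave below the main peak.

```python
import numpy as np

def log_pitch_histogram(f0_hz, n_bins=40):
    """Histogram of voiced-frame pitch on a log-Hz axis (natural log)."""
    f0 = np.asarray(f0_hz, dtype=float)
    log_f0 = np.log(f0[f0 > 0])  # keep voiced frames only; unvoiced are assumed to be 0
    counts, edges = np.histogram(log_f0, bins=n_bins)
    return counts, edges

# Synthetic track: 1000 frames around 120 Hz, 50 frames with simulated
# octave-halving errors around 60 Hz, and 100 unvoiced (0 Hz) frames.
rng = np.random.default_rng(0)
normal = np.exp(rng.normal(np.log(120.0), 0.1, 1000))
halved = np.exp(rng.normal(np.log(60.0), 0.1, 50))
track = np.concatenate([normal, halved, np.zeros(100)])

counts, edges = log_pitch_histogram(track)
peak_hz = float(np.exp(edges[np.argmax(counts)]))  # lower edge of the most populated bin
print(counts.sum(), round(peak_hz))
```

On a log axis, octave errors appear as separate bumps at a fixed distance (log 2) from the main peak. On a linear-Hz axis, the distance would vary with the speaker's pitch.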
Common Normalization Methods
- z-normalize
- Identify the tied Gaussians, then z-normalize and correct octave errors*
- Just use percentiles, since they are robust
* Sonmez et al., Eurospeech 1997
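The second method can be sketched in the spirit of the lognormal tied mixture of Sonmez et al. The steps are:
- Model log F0 as three Gaussians whose means are tied at mu - log 2, mu, and mu + log 2 (halved, correct, and doubled frames), with one shared standard deviation.
- Fit the mixture by EM.
- Move each frame assigned to a side component back by one octave.
- z-normalize using mu and sigma.

This is a minimal sketch under those assumptions, not the paper's exact procedure. The function names, initial weights, and iteration count are hypothetical choices, and unvoiced frames are assumed to be coded as 0 Hz.

```python
import numpy as np

LOG2 = np.log(2.0)
# Offsets of the three tied components in log Hz: halved, correct, doubled
OFFSETS = np.array([-LOG2, 0.0, LOG2])

def fit_tied_gaussians(log_f0, n_iter=100):
    """EM for three Gaussians with means mu + OFFSETS and a single shared sigma."""
    x = np.asarray(log_f0, dtype=float)
    mu, sigma = np.median(x), np.std(x)
    weights = np.array([0.05, 0.90, 0.05])  # assumption: most frames are tracked correctly
    for _ in range(n_iter):
        # E-step: sigma is shared, so the Gaussian normalizer cancels across components
        shifted = x[:, None] - OFFSETS[None, :]  # each frame with each offset undone
        log_p = np.log(weights) - 0.5 * ((shifted - mu) / sigma) ** 2
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: tied mean, shared variance, mixture weights (floored to avoid log(0))
        mu = np.sum(resp * shifted) / len(x)
        sigma = np.sqrt(np.sum(resp * (shifted - mu) ** 2) / len(x))
        weights = np.maximum(resp.mean(axis=0), 1e-12)
        weights /= weights.sum()
    return mu, sigma, resp

def octave_correct_and_znormalize(f0_hz):
    """Per-frame z-scores of octave-corrected log F0 (NaN where unvoiced)."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    log_f0 = np.log(f0[voiced])
    mu, sigma, resp = fit_tied_gaussians(log_f0)
    component = resp.argmax(axis=1)          # 0 = halved, 1 = correct, 2 = doubled
    corrected = log_f0 - OFFSETS[component]  # halved frames move up, doubled frames down
    z = np.full(f0.shape, np.nan)
    z[voiced] = (corrected - mu) / sigma
    return z, component

# Synthetic speaker around 120 Hz with simulated halving and doubling errors
rng = np.random.default_rng(0)
f0 = np.exp(rng.normal(np.log(120.0), 0.1, 500))
f0[:25] /= 2.0     # halving errors
f0[25:50] *= 2.0   # doubling errors
f0 = np.concatenate([f0, np.zeros(20)])  # unvoiced frames

z, component = octave_correct_and_znormalize(f0)
```

Because the side components share the main component's mean and spread, the octave errors are absorbed by those components. Otherwise they would inflate sigma and shift mu, as they would under plain z-normalization.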
Frame-level Features
Percept      | Acoustic Correlate          | Normalization
Pitch        | log F0 (log Hz), semitones  | z-normalization, percentiles
Loudness     | Intensity (dB)              | ?
Voicing      | Periodicity                 | ?
Breathiness  | Low HNR, low CPPS           | ?
Reduction    | Cepstral distance           | ?
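The two pitch correlates in the table are closely related. Semitones are 12 log2(F0 / ref), an affine function of log F0. As a result, z-normalizing semitones gives the same result as z-normalizing log F0, and the choice of reference frequency drops out. The sketch below illustrates this with NumPy. The helper names are hypothetical, and unvoiced frames are assumed to be coded as 0 Hz and mapped to NaN so that the statistics ignore them.

```python
import numpy as np

def hz_to_semitones(f0_hz, ref_hz):
    """Semitones relative to ref_hz: 12 * log2(f0 / ref_hz); unvoiced (<= 0) frames become NaN."""
    f0 = np.asarray(f0_hz, dtype=float)
    st = np.full(f0.shape, np.nan)
    voiced = f0 > 0
    st[voiced] = 12.0 * np.log2(f0[voiced] / ref_hz)
    return st

def z_normalize(values):
    """Per-speaker z-scores, ignoring NaN (unvoiced) frames."""
    v = np.asarray(values, dtype=float)
    return (v - np.nanmean(v)) / np.nanstd(v)

f0 = [100.0, 200.0, 0.0, 400.0]
st = hz_to_semitones(f0, ref_hz=100.0)   # [0, 12, nan, 24]: each doubling adds 12 semitones
z = z_normalize(st)                      # about [-1.22, 0, nan, 1.22]

# A different reference only shifts the semitone values, so the z-scores are unchanged
z_other_ref = z_normalize(hz_to_semitones(f0, ref_hz=55.0))
print(np.allclose(z, z_other_ref, equal_nan=True))  # True
```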
Intensity Histogram
Does a given speech frame count as loud?
[Figure: histogram of frame intensities; x-axis: intensity, y-axis: count.]
Normalize energy so that the centers of the two Gaussians fall at 0 (silence) and 1 (speech).
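One way to implement this recipe is to fit a two-component Gaussian mixture to the frame intensities. Then rescale linearly so that the lower (silence) center maps to 0 and the higher (speech) center maps to 1. This is a minimal sketch assuming intensities in dB and a clearly bimodal histogram. The plain-NumPy EM loop, its percentile initialization, and the variance floor are illustrative choices, not a prescribed implementation.

```python
import numpy as np

def two_gaussian_centers(x, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture by EM; return both means, lowest first."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [10, 90])  # start one center low (silence), one high (speech)
    var = np.full(2, np.var(x))
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step in the log domain for numerical stability
        log_p = np.log(w) - 0.5 * np.log(2 * np.pi * var) - 0.5 * (x[:, None] - mu) ** 2 / var
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weights, means, and variances from the responsibilities
        nk = r.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6  # floor keeps a component from collapsing
    return np.sort(mu)

def normalize_intensity(db_values):
    """Rescale so the silence center lands at 0 and the speech center at 1."""
    db = np.asarray(db_values, dtype=float)
    silence_center, speech_center = two_gaussian_centers(db)
    return (db - silence_center) / (speech_center - silence_center)

# Synthetic intensities: 400 silence frames near 30 dB, 600 speech frames near 70 dB
rng = np.random.default_rng(1)
db = np.concatenate([rng.normal(30.0, 3.0, 400), rng.normal(70.0, 5.0, 600)])
norm = normalize_intensity(db)
```

With this scaling, a frame at 0.5 lies halfway between the typical silence level and the typical speech level. Values above 1 mark frames louder than the speaker's typical speech in that recording.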
Summary
- Normalization is tricky.
- Normalization is imperfect.
- Normalize carefully!
Elizabeth E. Shriberg (p.c.)
Contents
- Introduction
- Production, Perception
- Classic Linguistic Prosody
- Technology and Techniques: 14. Intro to Features; 15. Using Pitch Trackers; 16. Normalization; 17. Aggregation; 18. Machine Learning; 19. Speech Recognition
- Para. & Prag. Functions
- Speech Synthesis and Dialog
- Perspectives
Popular Software: Praat, SRI's set, OpenSmile, Midlevel, CoVarep, Surfboard
Pitch Scales
[Table comparing pitch scales (linear F0, log F0, Mel*, percentiles, 5 pitch levels, {H, L}) against four criteria: matches perception, robust to outliers, supports averaging, handles speaker range differences.]
*Mermelstein (1976)
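To make the candidate scales concrete, the sketch below maps the same F0 values onto several of them. It assumes NumPy and voiced frames only. The mel conversion uses the widely used 2595 log10(1 + f/700) formula, which may not be the exact variant cited on the slide. The 5-level and {H, L} mappings are hypothetical choices for illustration: equal-width percentile bins, and a split at the speaker's median.

```python
import numpy as np

def hz_to_mel(f0_hz):
    """Mel scale via the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(f0_hz, dtype=float) / 700.0)

def to_percentiles(f0_hz):
    """Percentile rank (0 to 100) of each frame within the speaker's own voiced F0 values."""
    f0 = np.asarray(f0_hz, dtype=float)
    ranks = np.argsort(np.argsort(f0))  # 0 = lowest frame; ties broken arbitrarily
    return 100.0 * ranks / (len(f0) - 1)

def to_levels(pct, n_levels=5):
    """Quantize percentile ranks into n_levels equal-width bins: 0 (lowest) to n_levels - 1."""
    bins = (np.asarray(pct, dtype=float) / 100.0 * n_levels).astype(int)
    return np.minimum(bins, n_levels - 1)  # the 100th percentile goes in the top bin

def to_high_low(pct):
    """Binary {H, L} labels: frames at or above the speaker's median are H."""
    return np.where(np.asarray(pct) >= 50.0, "H", "L")

f0 = np.array([100.0, 120.0, 150.0, 200.0, 300.0])  # voiced frames of one speaker
pct = to_percentiles(f0)
print(np.log(f0))        # log scale
print(hz_to_mel(f0))     # mel scale
print(pct)               # 0, 25, 50, 75, 100
print(to_levels(pct))    # [0 1 2 3 4]
print(to_high_low(pct))  # ['L' 'L' 'H' 'H' 'H']
```

Rank-based percentiles depend only on the order of values, not on how extreme an outlier is. The coarser 5-level and {H, L} scales trade away more detail for the same kind of robustness.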