Prosody Normalization Techniques and Analysis Overview

Explore the normalization methods and analysis of prosodic features in speech signals, including pitch histograms, frame-level features, and perceptual correlates for improved understanding and modeling. Learn about z-normalization, octaving correction, and more for effective prosody processing.

  • Prosody
  • Normalization
  • Analysis
  • Speech
  • Features

Presentation Transcript


  1. Prosody Lecture 16: Normalization. Nigel G. Ward, University of Texas at El Paso; Gina-Anne Levow, University of Washington. Tutorial presented at ACL 2021.

  2. Prosody Lecture 16: Normalization. Because people differ!

  3. Overview: speech signal → frame-level features → normalized features → mid-level features → insight, model, etc.

  4. Is the Pitch Here High? [figure labels: pitch; 190 Hz]

  5. Pitch Histograms [figure: count vs. pitch (log Hz)]

  6-10. Pitch Histograms [figures: two histograms per slide, each count vs. pitch (log Hz)]

  11. Pitch Histograms [figure: count vs. pitch (log Hz)]. Common Normalization Methods:
      • z-normalize
      • Identify the tied Gaussians, then z-normalize and correct octaving*
      • Just use percentiles, since they are robust
      * Sonmez et al., Eurospeech 1997
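As a minimal sketch of the first and third methods listed on slide 11, the following assumes per-speaker F0 values in Hz with unvoiced frames already removed. The function names `z_normalize_log_f0` and `percentile_normalize` are illustrative, not taken from the tutorial. The tied-Gaussian octave correction of Sonmez et al. is not attempted here.

```python
import math
from bisect import bisect_left, bisect_right
from statistics import mean, pstdev


def z_normalize_log_f0(f0_hz):
    """Z-normalize one speaker's pitch in the log domain.

    f0_hz: F0 values in Hz for voiced frames of a single speaker
    (assumption: unvoiced frames were dropped beforehand).
    """
    log_f0 = [math.log(f) for f in f0_hz]
    mu = mean(log_f0)
    sigma = pstdev(log_f0)
    if sigma == 0:
        raise ValueError("need at least two distinct pitch values")
    return [(x - mu) / sigma for x in log_f0]


def percentile_normalize(f0_hz):
    """Map each value to its percentile rank (0 to 100) within the speaker's data."""
    ranked = sorted(f0_hz)
    n = len(ranked)
    out = []
    for f in f0_hz:
        below = bisect_left(ranked, f)
        at_or_below = bisect_right(ranked, f)
        # Mid-rank convention, so tied values share one percentile.
        out.append(100.0 * (below + at_or_below) / (2 * n))
    return out


# Two hypothetical speakers whose ranges differ by exactly one octave.
speaker_a = [100, 110, 120, 130, 140]
speaker_b = [2 * f for f in speaker_a]
z_a = z_normalize_log_f0(speaker_a)
z_b = z_normalize_log_f0(speaker_b)
```

Because the two ranges differ only by a constant factor, their log-domain z-scores agree up to rounding, which is the sense in which normalization removes speaker differences. Percentile ranks depend only on order, so a single octave-doubled outlier shifts them far less than it shifts a mean and standard deviation.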

  12. Frame-level Features (Percept | Acoustic Correlate | Normalization)
      • Pitch | log F0 (log Hz), semitones | z-normalization, percentiles
      • Loudness | Intensity (dB) | (blank)
      • Voicing | Periodicity | ?
      • Breathiness | Low HNR, low CPPS | ?
      • Reduction | Cepstral distance | ?

  13. Frame-level Features (Percept | Acoustic Correlate | Normalization)
      • Pitch | log F0 (log Hz), semitones | z-normalization, percentiles
      • Loudness | Intensity (dB) | ?
      • Voicing | Periodicity | ?
      • Breathiness | Low HNR, low CPPS | ?
      • Reduction | Cepstral distance | ?
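The semitone scale listed for pitch can be sketched as follows, using the standard definition of 12 semitones per octave. Using the speaker's median F0 as the reference is an assumption made here for illustration; the slides do not specify a reference, and the helper names are hypothetical.

```python
import math
from statistics import median


def hz_to_semitones(f0_hz, ref_hz):
    """Convert a frequency to semitones relative to a reference frequency."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def semitones_re_speaker_median(f0_hz, speaker_f0_hz):
    """Express f0_hz in semitones relative to one speaker's median F0.

    Assumption: the median is a reasonable per-speaker reference point.
    """
    return hz_to_semitones(f0_hz, median(speaker_f0_hz))


# The 190 Hz value from slide 4, judged against two hypothetical speakers.
low_voice = [95, 105, 110, 118, 130]     # median 110 Hz
high_voice = [190, 205, 220, 240, 260]   # median 220 Hz
high_for_low_voice = semitones_re_speaker_median(190, low_voice)
low_for_high_voice = semitones_re_speaker_median(190, high_voice)
```

For the first speaker 190 Hz lies roughly nine and a half semitones above the median, while for the second it lies about two and a half semitones below, so the same raw value counts as high for one voice and slightly low for the other.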

  14. Intensity Histogram: Does a certain speech frame count as loud? [figure: count vs. intensity] Normalize energy so that the Gaussians' centers are at 0 (silence) and 1 (speech).
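One way to realize the rescaling on slide 14 is sketched below, assuming frame intensities in dB whose histogram is roughly bimodal (silence and speech). The small EM fit, its initialization from the lower and upper halves of the data, and the function names are illustrative choices, not the authors' implementation.

```python
import math
from statistics import mean, pvariance


def fit_two_gaussians(x, iters=50):
    """Fit a two-component 1-D Gaussian mixture by EM; return (low_mean, high_mean)."""
    if len(x) < 2:
        raise ValueError("need at least two frames")
    xs = sorted(x)
    half = len(xs) // 2
    # Initialise the two means from the lower and upper halves of the data.
    mu = [mean(xs[:half]), mean(xs[half:])]
    var = [max(pvariance(xs), 1e-6)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of the second (speech) component for each frame.
        resp = []
        for v in x:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(v - mu[k]) ** 2 / (2 * var[k])) for k in (0, 1)]
            total = p[0] + p[1]
            resp.append(p[1] / total if total > 0 else 0.5)
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in (0, 1):
            r = [rv if k == 1 else 1.0 - rv for rv in resp]
            nk = sum(r)
            if nk < 1e-9:
                continue
            mu[k] = sum(ri * v for ri, v in zip(r, x)) / nk
            var[k] = max(sum(ri * (v - mu[k]) ** 2 for ri, v in zip(r, x)) / nk, 1e-6)
            w[k] = nk / len(x)
    return min(mu), max(mu)


def normalize_intensity(db):
    """Affinely rescale so the silence center maps to 0 and the speech center to 1."""
    lo, hi = fit_two_gaussians(db)
    if hi == lo:
        raise ValueError("could not separate silence from speech")
    return [(v - lo) / (hi - lo) for v in db]


# Hypothetical frames: silence near 30 dB, speech near 65 dB.
frames_db = [29, 30, 31, 30, 29, 31, 30, 64, 65, 66, 65, 64, 66, 65]
normalized = normalize_intensity(frames_db)
```

After this mapping, a frame near 1 sits at a typical speech level for that recording and a frame well above 1 is loud relative to it, independent of microphone gain. On real recordings the two modes overlap more than in this toy data, so the fitted centers should be inspected before use.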

  15. Summary: Normalization is tricky. Normalization is imperfect. Normalize carefully! (Elizabeth E. Shriberg, p.c.)

  16. Contents. Sections: Introduction; Production, Perception; Classic Linguistic Prosody; Technology and Techniques; Para. & Prag. Functions; Speech Synthesis and Dialog; Perspectives. Lectures: 14. Intro to Features; 15. Using Pitch Trackers; 16. Normalization; 17. Aggregation; 18. Machine Learning; 19. Speech Recognition.

  17. Contents. Sections: Introduction; Production, Perception; Classic Linguistic Prosody; Technology and Techniques; Para. & Prag. Functions; Speech Synthesis and Dialog; Perspectives. Lectures: 14. Intro to Features; 15. Using Pitch Trackers; 16. Normalization; 17. Aggregation; 18. Machine Learning; 19. Speech Recognition.

  18. Popular Software: Praat, SRI's set, OpenSmile, Midlevel, CoVarep, Surfboard

  19. Pitch Scales (comparison table; cell entries not recoverable)
      • Scales for pitch (F0): linear, log, Mel*, percentiles, 5 levels, {H, L}
      • Criteria: matches perception; robust to outliers; supports averaging; handles speaker range differences
      * Mermelstein (1976)
