
Voice Biometry Standardization and Implementation Guide
Explore the comprehensive Voice Biometry Standardization and Implementation initiative, offering a detailed manual and user guide for a speaker recognition demo package. Learn about i-vectors, speaker identity and speech content analysis, standardization objectives, feature extraction methods, universal background modeling, and i-vector extractor training.
Presentation Transcript
VBS Documentation and Implementation
- The full standard initiative is located at www.voicebiometry.org
- Documentation: a quick description, and a standard manual with a detailed description and a quick user guide
- The reference demo package: contains a full speaker-recognition (demo) pipeline
i-vectors
- Information-rich, low-dimensional, fixed-length vector of real numbers
- Based on a statistical model
- Easy to compare
- Easy to store
- Not recoverable to speech
Reference: Dehak, N., et al., "Support Vector Machines versus Fast Scoring in the Low-Dimensional Total Variability Space for Speaker Verification," in Proc. Interspeech 2009, Brighton, UK, September 2009.
[Diagram] Audio carries SPEAKER IDENTITY AND SPEECH CONTENT; the i-vector carries SPEAKER IDENTITY, NO SPEECH CONTENT.
AUDIO 1 -> I-VECTOR EXTRACTOR (SITE 1) -> i-vector -> COMPARISON -> SCORE
AUDIO 2 -> I-VECTOR EXTRACTOR (SITE 2) -> i-vector -> COMPARISON
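The comparison step in the diagram can be illustrated with cosine scoring, the fast scoring method discussed in the Dehak et al. paper cited above. This is a minimal sketch, not the reference pipeline's scorer. The function name `cosine_score` and the random 600-dimensional vectors are made up for illustration.

```python
import numpy as np

def cosine_score(ivec_a, ivec_b):
    """Cosine similarity between two i-vectors (higher = more similar)."""
    a = np.asarray(ivec_a, dtype=float)
    b = np.asarray(ivec_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy data only: random vectors standing in for real i-vectors.
rng = np.random.default_rng(0)
enroll = rng.standard_normal(600)
test_same = enroll + 0.1 * rng.standard_normal(600)  # small perturbation of enroll
test_diff = rng.standard_normal(600)                 # unrelated vector

print(cosine_score(enroll, test_same))
print(cosine_score(enroll, test_diff))
```

Because both sites only exchange fixed-length vectors, the comparison needs no access to the original audio.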
Standardization objectives
[Diagram] Pipeline stages: AUDIO, VOICE ACTIVITY DETECTION, FEATURE EXTRACTION, COLLECTION OF STATISTICS, I-VECTOR EXTRACTION, POST-PROCESSING, COMPARISON
[Diagram] Labels: ALGORITHMS; PARAMETERS (GMM UBM, T MATRIX); STANDARDIZED; NOT STANDARDIZED
Standardized:
- Acoustic feature extraction
- i-vector extraction algorithm
- i-vector extraction parameters (GMM parameters, i-vector extractor parameters)
- The data exchange formats (tuned for telephone speech)
Feature Extraction
- Pre-emphasis
- 25 ms windowing with 10 ms shift
- Hamming window
- 24 Mel filter banks in the range of 125-3800 Hz
- 19-dimensional MFCC coefficients + C0
- Delta + double-delta
- Short-time cepstral mean and variance normalization over a 3-second window
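The front-end listed above can be sketched with NumPy alone. This is an illustrative sketch, not the reference implementation. It makes several assumptions the slide does not state: an 8 kHz sample rate, a pre-emphasis coefficient of 0.97, a 256-point FFT, a delta regression width of 2, and a CMVN window centered on each frame. All function names are hypothetical.

```python
import numpy as np

def frame_signal(signal, sample_rate=8000, win_ms=25, shift_ms=10, preemph=0.97):
    """Pre-emphasize, cut into overlapping frames, apply a Hamming window."""
    signal = np.asarray(signal, dtype=float)
    # y[n] = x[n] - preemph * x[n-1]; the coefficient 0.97 is an assumption
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    win_len = int(round(sample_rate * win_ms / 1000))
    shift = int(round(sample_rate * shift_ms / 1000))
    n_frames = 1 + (len(emphasized) - win_len) // shift
    idx = np.arange(win_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return emphasized[idx] * np.hamming(win_len)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters=24, n_fft=256, sample_rate=8000, f_lo=125.0, f_hi=3800.0):
    """Triangular filters equally spaced on the mel scale between f_lo and f_hi."""
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_filters + 2))
    fft_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    bank = np.zeros((n_filters, len(fft_freqs)))
    for i in range(n_filters):
        left, center, right = edges_hz[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(signal, sample_rate=8000, n_ceps=20, n_fft=256):
    """C0 plus 19 cepstral coefficients per frame."""
    frames = frame_signal(signal, sample_rate)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    bank = mel_filterbank(n_fft=n_fft, sample_rate=sample_rate)
    log_mel = np.log(power @ bank.T + 1e-10)
    n_filters = log_mel.shape[1]
    # Unnormalized DCT-II basis: row 0 yields C0, rows 1..19 the other coefficients
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return log_mel @ dct_basis.T

def deltas(feats, width=2):
    """Regression-based time derivative with edge padding."""
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    n = len(feats)
    num = sum(k * (padded[width + k:width + k + n] - padded[width - k:width - k + n])
              for k in range(1, width + 1))
    return num / (2 * sum(k * k for k in range(1, width + 1)))

def short_time_cmvn(feats, win_frames=300):
    """Mean/variance normalization over a sliding window (300 frames = 3 s at 10 ms shift)."""
    out = np.empty_like(feats)
    half = win_frames // 2
    for t in range(len(feats)):
        chunk = feats[max(0, t - half):min(len(feats), t + half + 1)]
        out[t] = (feats[t] - chunk.mean(axis=0)) / (chunk.std(axis=0) + 1e-10)
    return out

def extract_features(signal, sample_rate=8000):
    ceps = mfcc(signal, sample_rate)
    d1 = deltas(ceps)
    d2 = deltas(d1)
    return short_time_cmvn(np.hstack([ceps, d1, d2]))

# Toy input: one second of noise at the assumed 8 kHz rate
rng = np.random.default_rng(0)
audio = rng.standard_normal(8000)
feats = extract_features(audio)
print(feats.shape)  # (98, 60): 98 frames, 20 static + 20 delta + 20 double-delta
```

The 60-dimensional output follows directly from the slide: 19 MFCCs plus C0, each with delta and double-delta.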
Universal Background Model
- 2048 Gaussian mixture components
- Diagonal covariances
- Trained on 1156 hours of the NIST SRE 2004-2008 data (gender independent)
- Trained using gradual Gaussian splitting with 10 EM steps in each split
- The UBM is used to extract the sufficient statistics for the i-vector extractor and to normalize (whiten) these statistics
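The last point, collecting sufficient statistics with the UBM and whitening them, can be sketched as follows. This is a minimal sketch on a toy diagonal-covariance GMM (4 components, 3 dimensions) rather than the 2048-component UBM. The function names and the random toy parameters are hypothetical, and the exact normalization used by the standard may differ in detail.

```python
import numpy as np

def gmm_posteriors(feats, weights, means, variances):
    """Per-frame component posteriors for a diagonal-covariance GMM."""
    diff = feats[:, None, :] - means[None, :, :]            # (frames, comps, dims)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_joint = np.log(weights) + log_gauss
    log_joint -= log_joint.max(axis=1, keepdims=True)       # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def normalized_stats(feats, weights, means, variances):
    """Zero-order stats and centered, whitened first-order stats."""
    post = gmm_posteriors(feats, weights, means, variances)
    n = post.sum(axis=0)                                     # zero-order, shape (comps,)
    f = post.T @ feats                                       # first-order, shape (comps, dims)
    f_centered = f - n[:, None] * means                      # remove UBM means
    f_white = f_centered / np.sqrt(variances)                # whiten with UBM std devs
    return n, f_white

# Toy UBM and features, for illustration only
rng = np.random.default_rng(0)
n_comp, feat_dim, n_frames = 4, 3, 50
weights = np.full(n_comp, 1.0 / n_comp)
means = rng.standard_normal((n_comp, feat_dim))
variances = rng.uniform(0.5, 1.5, size=(n_comp, feat_dim))
feats = rng.standard_normal((n_frames, feat_dim))

n, f_white = normalized_stats(feats, weights, means, variances)
print(n.sum())  # equals the number of frames, since posteriors sum to 1 per frame
```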
i-vector extractor
- Trained on the same data as the UBM + Switchboard 2 (phases 2 and 3) + Fisher English (phases 1 and 2)
- 600-dimensional
- 10 iterations of EM and MD (minimum divergence) steps
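Once the T matrix is trained, extracting an i-vector from whitened statistics reduces to solving one linear system: w = (I + sum_c N_c T_c^T T_c)^(-1) T^T f, where N_c are the zero-order statistics and f the stacked, whitened first-order statistics. The sketch below assumes the T matrix is already expressed in the whitened space. The toy sizes (8 components, 5 feature dimensions, rank 4 instead of 600) and all names are hypothetical.

```python
import numpy as np

def extract_ivector(n, f_white, t_matrix):
    """Point estimate of the i-vector from whitened sufficient statistics.

    n        : zero-order stats, shape (comps,)
    f_white  : centered and whitened first-order stats, shape (comps, dims)
    t_matrix : total variability matrix, shape (comps * dims, rank),
               rows ordered component by component
    """
    n_comp, feat_dim = f_white.shape
    rank = t_matrix.shape[1]
    t_blocks = t_matrix.reshape(n_comp, feat_dim, rank)
    # Posterior precision: I + sum_c N_c * T_c^T T_c
    precision = np.eye(rank) + np.einsum("c,cdi,cdj->ij", n, t_blocks, t_blocks)
    # Right-hand side: T^T f
    rhs = t_matrix.T @ f_white.reshape(-1)
    return np.linalg.solve(precision, rhs)

# Toy statistics and a random stand-in for a trained T matrix
rng = np.random.default_rng(0)
n_comp, feat_dim, rank = 8, 5, 4
n = rng.uniform(0.0, 20.0, size=n_comp)
f_white = rng.standard_normal((n_comp, feat_dim))
t_matrix = 0.1 * rng.standard_normal((n_comp * feat_dim, rank))

ivector = extract_ivector(n, f_white, t_matrix)
print(ivector.shape)  # (4,)
```

Solving the system with `np.linalg.solve` avoids forming an explicit matrix inverse.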
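Reference implementation
[Diagram] Both the enrollment and the test audio pass through the same chain before PLDA scoring:
ENROLL AUDIO -> VAD and feature extraction -> i-vector extraction -> Dimension reduction -> Mean/var norm -> Length norm -> PLDA scoring -> SCORE
TEST AUDIO -> VAD and feature extraction -> i-vector extraction -> Dimension reduction -> Mean/var norm -> Length norm -> PLDA scoring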
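The three post-processing steps between i-vector extraction and PLDA scoring can be sketched as below. The slide does not say how the dimension reduction is trained, so the projection matrix here is a random placeholder, and the mean and standard deviation would normally be estimated on training i-vectors. The names and toy sizes are hypothetical, and PLDA scoring itself is not shown.

```python
import numpy as np

def postprocess_ivector(ivec, projection, mean, std):
    """Dimension reduction, mean/variance normalization, then length normalization."""
    reduced = projection @ ivec              # dimension reduction (matrix assumed given)
    normed = (reduced - mean) / std          # mean/variance normalization
    return normed / np.linalg.norm(normed)   # length normalization to unit norm

# Toy stand-ins: 8-dimensional "i-vector" reduced to 4 dimensions
rng = np.random.default_rng(0)
ivec = rng.standard_normal(8)
projection = rng.standard_normal((4, 8))
mean = rng.standard_normal(4)
std = rng.uniform(0.5, 2.0, size=4)

processed = postprocess_ivector(ivec, projection, mean, std)
print(np.linalg.norm(processed))
```

The same function would be applied to both the enrollment and the test i-vector so that the scorer sees vectors in the same space.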
Reference implementation
- Python code
- Readability and ease of understanding
- Extensibility
- Standard Python packages: Numpy + Scipy
System performance
- NIST SRE 2010, cond 5, female
- New DCF: 0.3877
- Old DCF: 0.1142
- EER: 2.26
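For readers unfamiliar with the EER figure, the sketch below shows one simple way to estimate an equal error rate from lists of target and non-target trial scores. It is not the NIST scoring tool: it picks the threshold where miss and false-alarm rates are closest and averages the two. The function name and the toy scores are hypothetical.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: operating point where miss rate and false-alarm rate meet."""
    tar = np.sort(np.asarray(target_scores, dtype=float))
    non = np.sort(np.asarray(nontarget_scores, dtype=float))
    thresholds = np.sort(np.concatenate([tar, non]))
    # Miss rate: fraction of target trials scoring below the threshold
    p_miss = np.searchsorted(tar, thresholds, side="left") / len(tar)
    # False-alarm rate: fraction of non-target trials scoring at or above the threshold
    p_fa = 1.0 - np.searchsorted(non, thresholds, side="left") / len(non)
    i = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[i] + p_fa[i])

# Toy scores: one target trial scores below one non-target trial
eer = equal_error_rate([1.0, 2.0, 3.0, 4.0], [-2.0, -1.0, 0.0, 1.5])
print(eer)  # 0.25
```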
i-vector compatibility
- i-vectors produced by one system are incompatible with those generated by a different system
- We ran experiments on training an i-vector transformation to migrate i-vectors from one system to another
- Take it as an invitation to tomorrow's talk: "Migrating i-vectors Between Speaker Recognition Systems Using Regression Neural Networks"
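To make the idea of an i-vector transformation concrete, here is a deliberately simplified sketch that fits a linear least-squares map between paired vectors from two hypothetical systems. This is a swapped-in linear baseline, not the regression neural network of the announced talk. The data are synthetic and all names are made up.

```python
import numpy as np

def fit_linear_map(src, dst):
    """Least-squares affine map from source-system vectors to target-system vectors."""
    src_aug = np.hstack([src, np.ones((len(src), 1))])   # append a bias column
    coeffs, *_ = np.linalg.lstsq(src_aug, dst, rcond=None)
    return coeffs

def apply_linear_map(coeffs, src):
    src_aug = np.hstack([src, np.ones((len(src), 1))])
    return src_aug @ coeffs

# Synthetic paired "i-vectors": the same recordings seen by two systems
rng = np.random.default_rng(0)
src = rng.standard_normal((200, 6))
true_map = rng.standard_normal((6, 6))
dst = src @ true_map + 0.01 * rng.standard_normal((200, 6))

coeffs = fit_linear_map(src, dst)
migrated = apply_linear_map(coeffs, src)
print(np.abs(migrated - dst).max())
```

Training such a map requires the same recordings to be processed by both systems, so that source and target vectors are paired.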