Machine Learning for Emotion Detection in Speech Audios



This project uses machine learning to detect the emotion of a speaker from speech audio, with noise reduction applied to make the models more robust. Because content and emotional intensity can be misleading, the goal is to detect the speaker's emotion efficiently. The literature survey covers noise reduction methods (LMS adaptive filtering, Kalman filtering, spectral subtraction) and classifiers previously applied to the task (Decision Tree, KNN, CNN). The dataset is the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): 1,440 audio clips with two emotional intensities and eight emotion labels. Gaussian noise is added to the recordings and then reduced to study the effect of noise, MFCC features are extracted, and KNN, Decision Tree, CNN, CNN-LSTM, and pruned CNN models are presented.

Uploaded by mbaco on Jun 05, 2025


Presentation Transcript

ECE 228 - Machine Learning for Physical Applications

1. Background
Goal: analyze and predict the emotion in speech audio, using noise reduction.
Content and emotional intensity are sometimes misleading.
We want to detect the emotion of speakers efficiently.

2. Literature Survey
Methods used for noise reduction:
1) LMS algorithms: an adaptive filter; uses limited hardware resources.
2) Kalman filtering: an efficient recursive filter; the most widely used.
3) Spectral subtraction: deals with additive noise; simple algorithm with small computational cost.
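To illustrate the first method in the list, here is a minimal sketch of an LMS adaptive noise canceller in NumPy. It assumes a separate noise reference signal is available; the function name `lms_filter`, the tap count, and the step size `mu` are illustrative choices and do not come from the slides.

```python
import numpy as np

def lms_filter(desired, reference, n_taps=8, mu=0.01):
    """Adaptive noise cancellation with the LMS update rule.

    desired:   noisy signal (clean signal plus noise correlated with `reference`)
    reference: noise reference input (an assumption of this sketch)
    Returns the error signal, which serves as the cleaned output.
    """
    desired = np.asarray(desired, dtype=float)
    reference = np.asarray(reference, dtype=float)
    weights = np.zeros(n_taps)
    error = np.zeros(len(desired))
    for n in range(n_taps - 1, len(desired)):
        # Most recent n_taps reference samples, newest first.
        x = reference[n - n_taps + 1:n + 1][::-1]
        noise_estimate = weights @ x
        error[n] = desired[n] - noise_estimate
        # LMS update: step along the instantaneous gradient estimate.
        weights += 2 * mu * error[n] * x
    return error
```

The step size trades convergence speed against steady-state error, so `mu` usually needs tuning to the power of the reference signal.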

2. Literature Survey
Methods used to solve this problem: Decision Tree, KNN, CNN.

3. ML/DL Related
Traditional approaches: text mining, emotional intensity, human ways.
ML/DL approaches: learn every pattern, pre-trained, more efficient.
In our project, we add noise reduction to make our model more robust.

4. Dataset
"The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)" by Livingstone & Russo.
60 trials per actor x (12 female + 12 male actors) = 1,440 audio clips.
2 emotional intensities: Normal and Strong.
8 labels (emotions): Neutral, Calm, Happy, Sad, Angry, Fearful, Disgust, Surprised.
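The labels can be read from the file names. The sketch below assumes the seven-field naming scheme described in the RAVDESS documentation (modality, vocal channel, emotion, intensity, statement, repetition, actor, with even actor numbers denoting female actors). The slides do not state this scheme, and `parse_ravdess_name` is a hypothetical helper.

```python
from pathlib import Path

# Assumed code tables, following the RAVDESS file naming convention.
EMOTIONS = {1: "neutral", 2: "calm", 3: "happy", 4: "sad",
            5: "angry", 6: "fearful", 7: "disgust", 8: "surprised"}
INTENSITIES = {1: "normal", 2: "strong"}

def parse_ravdess_name(filename):
    """Extract emotion, intensity, and actor from a RAVDESS-style file name."""
    parts = Path(filename).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dash-separated fields, got {len(parts)}")
    _modality, _channel, emotion, intensity, _statement, _repetition, actor = map(int, parts)
    return {
        "emotion": EMOTIONS[emotion],
        "intensity": INTENSITIES[intensity],
        "actor": actor,
        "actor_sex": "female" if actor % 2 == 0 else "male",
    }

# Dataset size from the slide: 60 trials for each of 24 actors.
total_clips = 60 * (12 + 12)
```

For example, `parse_ravdess_name("03-01-06-01-02-01-12.wav")` yields the label `fearful` at `normal` intensity for actor 12.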

5. Feature Extraction--Noise addition
Original signal: the voice contains little noise.
Signal after processing: the voice contains additive Gaussian noise.
[Figure: original signal; signal with additive Gaussian noise]
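A minimal sketch of the noise-addition step, assuming the noise level is set by a target signal-to-noise ratio. The slides do not say how the noise level was chosen, so `snr_db` and the function name are illustrative.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db=10.0, seed=None):
    """Return `signal` plus white Gaussian noise at the requested SNR (in dB)."""
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

Fixing `seed` makes the augmented dataset reproducible between runs.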

5. Feature Extraction--Noise reduction
Reasons to do noise reduction:
1) Noise has a significant impact on the overall performance of a machine learning model.
2) Noise increases the complexity of the model, which can lead to overfitting.

5. Feature Extraction--Noise reduction
In our project, we apply the following steps for noise reduction:
1) Design a low-pass filter to eliminate high-frequency noise.
2) Calculate the mean frequency spectrum of the first 10 frames of the data and take it as the noise spectrum estimate.
3) Compare each frame with the noise estimate, then update both the frame and the noise estimate.
4) Obtain the data after noise reduction.
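The four steps can be sketched as a simple spectral-subtraction routine. This is one possible reading of the slide, not the project's code. The frame length, the 4 kHz cutoff, the noise-frame test (`1.5 *` the mean noise level), the update rate, and the spectral floor are all assumptions, and the low-pass filter is applied as a mask on the FFT bins rather than as a designed filter.

```python
import numpy as np

def reduce_noise(signal, sample_rate, frame_len=512, noise_frames=10,
                 cutoff_hz=4000.0, update_rate=0.05, floor=0.02):
    """Spectral-subtraction sketch following the four steps above."""
    signal = np.asarray(signal, dtype=float)
    hop = frame_len // 2
    # Reflect-pad so every original sample is covered by two overlapping frames.
    padded = np.pad(signal, (hop, frame_len), mode="reflect")
    window = np.hanning(frame_len + 1)[:-1]  # periodic Hann: 50% overlap sums to 1
    n_frames = 1 + (len(padded) - frame_len) // hop

    frames = np.stack([padded[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    magnitude, phase = np.abs(spectra), np.angle(spectra)

    # Step 1: low-pass, applied here as a mask on the frequency bins.
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    lowpass = (freqs <= cutoff_hz).astype(float)

    # Step 2: mean magnitude spectrum of the first frames is the noise estimate.
    noise = magnitude[:noise_frames].mean(axis=0)

    output = np.zeros(len(padded))
    for i in range(n_frames):
        # Step 3: compare the frame with the noise estimate; frames that
        # look like noise refine the estimate.
        if magnitude[i].mean() < 1.5 * noise.mean():
            noise = (1 - update_rate) * noise + update_rate * magnitude[i]
        # Subtract the noise magnitude, keeping a small spectral floor.
        cleaned = np.maximum(magnitude[i] - noise, floor * magnitude[i]) * lowpass
        frame = np.fft.irfft(cleaned * np.exp(1j * phase[i]), n=frame_len)
        output[i * hop:i * hop + frame_len] += frame  # overlap-add

    # Step 4: drop the padding and return the denoised signal.
    return output[hop:hop + len(signal)]
```

The sketch relies on the recording starting with a short stretch that contains no speech, since the first 10 frames define the noise estimate.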

5. Feature Extraction--Noise reduction
Result of noise reduction:
[Figure: signal with additive Gaussian noise; signal after noise reduction]

5. Feature Extraction
In our project, we extract MFCCs (Mel-Frequency Cepstral Coefficients) as our features. To calculate the MFCCs, we:
1) Frame the signal into short frames;
2) Calculate the FFT (Fast Fourier Transform) of each frame to get its spectrum;
3) Apply the Mel filter bank to the spectrum to get the Mel spectrum;
4) Take the logarithm and calculate the DCT (Discrete Cosine Transform) of the result to get the MFCCs.
With MFCCs, we can explore the emotions within an audio signal and apply classification methods to them.
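The MFCC steps can be sketched in plain NumPy as follows. The frame length, hop size, number of filters, and number of coefficients are common defaults rather than values from the slides, and the mel mapping is the widely used formula m = 2595 log10(1 + f/700). A library such as librosa would normally be used instead. This version only makes each step explicit.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, frame_len, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, len(freqs)))
    for m in range(n_filters):
        lo, mid, hi = edges_hz[m:m + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(signal, sample_rate, frame_len=512, hop=256, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs)."""
    signal = np.asarray(signal, dtype=float)
    # 1) Frame the signal into short, windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # 2) FFT of each frame, converted to a power spectrum.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len
    # 3) Mel filter bank applied to the spectrum gives the mel spectrum.
    mel_spectrum = power @ mel_filterbank(n_filters, frame_len, sample_rate).T
    # 4) Logarithm, then an unnormalized DCT-II along the filter axis.
    log_mel = np.log(mel_spectrum + 1e-10)
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ dct_basis.T
```

The resulting frame-by-coefficient matrix is a 2-D array, which makes it a natural input for 2-D convolutional layers.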

6. Models - Baseline
Model            Setting               Test accuracy   Notes
KNN              K = 3                 ~35%            Lazy learner
Decision Tree    Criterion = entropy   ~30%            Unstable and overfitting
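To make the two baseline settings concrete, here is a small sketch of a K = 3 nearest-neighbour vote and of the entropy measure that an entropy split criterion is based on. It is a toy illustration with hypothetical function names, not the project's baseline code, which most likely used a library implementation.

```python
from collections import Counter
import numpy as np

def knn_predict(train_x, train_y, query_x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    train_x = np.asarray(train_x, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    predictions = []
    for q in query_x:
        distances = np.linalg.norm(train_x - q, axis=1)
        nearest = np.argsort(distances)[:k]
        # On a tie, the label of the closest tied neighbour wins.
        votes = Counter(train_y[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions

def entropy(labels):
    """Shannon entropy of a label collection, in bits."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

def information_gain(labels, left_mask):
    """Entropy reduction obtained by splitting `labels` with a boolean mask."""
    labels = np.asarray(labels)
    left_mask = np.asarray(left_mask, dtype=bool)
    left, right = labels[left_mask], labels[~left_mask]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted
```

KNN stores the whole training set and defers all work to prediction time, which is what "lazy learner" refers to. A decision tree greedily picks the split with the highest information gain at each node.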

6. Models - CNN & LSTM
CNN: the four Conv2d layers below, followed by two Dense layers.
CNN-LSTM: the same four Conv2d layers, followed by an LSTM layer and a Dense layer.
Layer     Filter   Kernel_size   Padding   Activation
Conv2d    32       (4,10)        same      relu
Conv2d    32       (4,10)        same      relu
Conv2d    32       (4,10)        same      relu
Conv2d    32       (4,10)        same      relu
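As a worked example, the number of trainable parameters in the convolutional stack can be computed directly from the table. The sketch assumes a single-channel input (one MFCC matrix per clip) and a bias term per filter. Neither is stated in the slides, and the Dense and LSTM layers are left out because their sizes are not given.

```python
def conv2d_params(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters of one 2-D convolution layer."""
    kernel_h, kernel_w = kernel_size
    weights_per_filter = kernel_h * kernel_w * in_channels
    return (weights_per_filter + (1 if use_bias else 0)) * filters

def conv_stack_params(n_layers=4, kernel_size=(4, 10), filters=32, in_channels=1):
    """Per-layer parameter counts for a stack of identical Conv2d layers."""
    counts, channels = [], in_channels
    for _ in range(n_layers):
        counts.append(conv2d_params(kernel_size, channels, filters))
        channels = filters  # the next layer sees 32 feature maps
    return counts

per_layer = conv_stack_params()
total = sum(per_layer)
```

Under these assumptions the first layer has (4 x 10 x 1 + 1) x 32 = 1,312 parameters and each later layer has (4 x 10 x 32 + 1) x 32 = 40,992, for 124,288 in total. With "same" padding and unit stride the layers do not change the spatial size of the feature maps.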

7. Models - CNN & LSTM
[Results for CNN and CNN_LSTM on: original dataset, Noise_reduction, add_noise, add_noise and reduce it]

7. Models - Pruned CNN
Customize the Dense and Conv2D layers to support neuron pruning.
More efficient in practice.
Reduce the number of parameters by dropping the weights with the lowest absolute values.
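The weight-dropping idea can be sketched as magnitude-based pruning of a single weight array. The `sparsity` parameter and the function name are illustrative, and the retraining of the surviving weights that normally follows pruning is omitted.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with the smallest |value|.

    Returns the pruned weights and the boolean mask of kept weights.
    Weights tied with the threshold are pruned as well.
    """
    weights = np.asarray(weights, dtype=float)
    n_prune = int(sparsity * weights.size)
    if n_prune == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = np.sort(np.abs(weights), axis=None)[n_prune - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask
```

In a custom layer, the mask would be stored alongside the weights and reapplied after every update so that pruned connections stay at zero.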

8. Future
We'd like to try vision models such as VGG-16 or MobileNets on our data to see whether they are good choices for predicting from MFCC data.

9. References
Pruned neurons in CNN: Song Han, Jeff Pool, John Tran, William J. Dally. Learning both Weights and Connections for Efficient Neural Networks. arXiv:1506.02626 [cs.NE].

Thank You
