
BERT and Deep Learning for Persian Text Sentiment Analysis
Explore how BERT and deep learning techniques are leveraged for sentiment analysis on Persian texts. The project covers the problem statement, input and output samples, challenges such as Named Entity Recognition, and past solutions such as Skip-gram models and LSTMs. The proposed solution uses Bidirectional Encoder Representations from Transformers (BERT) to pre-train contextual NLP representations.
Presentation Transcript
Sentiment analysis using BERT (pre-training language representations) and Deep Learning on Persian texts
Soroush Karimi, Fatemeh Sadat Shahrabadi
1 Problem statement
Sentiment analysis and natural language processing. Applications: brand monitoring, competitive research, flame detection and customer service prioritization, product analysis, market research and insights into industry trends, and workforce analytics / employee engagement monitoring.
Input Sample
Three Persian review sentences (the Persian text is not preserved in this transcript); the third sentence mentions an "a7 2018" model.
Output Sample
Each input sentence is returned as a tuple of (sentence, class scores, predicted label):
(sentence 1, array([-3.9536180e-03, -5.5351005e+00], dtype=float32), 'Negative'),
(sentence 2, array([-2.4673278e-03, -6.0058746e+00], dtype=float32), 'Negative'),
(sentence 3, array([-2.2144477, -0.11565089], dtype=float32), 'Positive')
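The two numbers attached to each sentence behave like per-class scores in the order (negative, positive), with the printed label matching the arg-max class. A minimal sketch of that reading, which is an assumption about the output format rather than code from the project:

```python
import numpy as np

LABELS = ["Negative", "Positive"]

def label_from_scores(scores):
    # Pick the class with the highest score; index 0 = Negative, index 1 = Positive
    # (assumed ordering, consistent with the output sample above).
    return LABELS[int(np.argmax(scores))]

print(label_from_scores(np.array([-3.9536180e-03, -5.5351005e+00])))  # Negative
print(label_from_scores(np.array([-2.2144477, -0.11565089])))         # Positive
```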
2 Challenges
Challenges
Named Entity Recognition; Anaphora Resolution; Parsing; Sarcasm; poor spelling, poor punctuation, and poor grammar.
3 Past solutions
Skip-gram model: an unsupervised method for learning vector representations of words, trained here on 150,726 unlabeled sentences.
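The deck does not say which toolkit produced these vectors; a minimal skip-gram sketch using gensim's Word2Vec (sg=1) on a placeholder corpus, standing in for the 150,726 unlabeled Persian sentences:

```python
from gensim.models import Word2Vec

# Placeholder corpus: in the project this would be the ~150,726 unlabeled
# Persian sentences, each pre-tokenized into a list of words.
sentences = [
    ["this", "phone", "has", "a", "great", "battery"],
    ["the", "battery", "drains", "very", "fast"],
]

model = Word2Vec(
    sentences,
    sg=1,             # 1 = skip-gram (0 would be CBOW)
    vector_size=100,  # dimensionality of the learned word vectors
    window=5,
    min_count=1,
    epochs=5,
)
print(model.wv["battery"][:5])  # first few components of one learned vector
```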
4 Our solution!
Bidirectional Encoder Representations from Transformers (BERT): the first unsupervised, deeply bidirectional system for pre-training contextual NLP representations, used here as a pre-trained representation.
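The slides do not show code or name a specific checkpoint; the usual way to reuse such a pre-trained representation for a two-class sentiment task looks roughly like this sketch, which assumes the HuggingFace transformers library and the multilingual bert-base-multilingual-cased checkpoint:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-multilingual-cased"  # assumption: any BERT checkpoint covering Persian
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

inputs = tokenizer("An example review sentence.", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits   # shape (1, 2): scores for the two sentiment classes
print(logits.softmax(dim=-1))         # class probabilities (before any fine-tuning)
```

Fine-tuning then trains this classification head (and usually the encoder) on the labeled Persian reviews.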
BERT Approach
Mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words.
Input: the man went to the [MASK1] . he bought a [MASK2] of milk.
Labels: [MASK1] = store; [MASK2] = gallon
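A toy sketch of the masking step described above, covering only the token selection; the real BERT recipe works on WordPiece units and sometimes keeps or randomly replaces a selected token instead of masking it:

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace roughly 15% of the tokens with [MASK] and remember the originals."""
    rng = random.Random(seed)
    masked, labels = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels[i] = tok  # the encoder is trained to predict these original tokens
        else:
            masked.append(tok)
    return masked, labels

tokens = "the man went to the store . he bought a gallon of milk".split()
masked, labels = mask_tokens(tokens)
print(" ".join(masked))
print(labels)
```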
5 Past Results
Computing results
         | Predicted negative | Predicted positive
Negative | TN                 | FP
Positive | FN                 | TP

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-score = 2 * Precision * Recall / (Precision + Recall)
Computing results
Confusion matrix for NBSVM-bi:
         | Predicted negative | Predicted positive
Negative | 123                | 262
Positive | 51                 | 4568

Confusion matrix for CNN:
         | Predicted negative | Predicted positive
Negative | 201                | 184
Positive | 139                | 4480

Confusion matrix for Bidirectional-LSTM:
         | Predicted negative | Predicted positive
Negative | 201                | 184
Positive | 170                | 4449
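The metric rows in the final results below appear to be computed for the negative (minority) class; a minimal sketch that reproduces the NBSVM-bi row under that assumption:

```python
# Precision / recall / F-score from a 2x2 confusion matrix, following the
# "Computing results" slide. Rows are true classes, columns are predictions.
def prf(cm, cls):
    """Return (precision, recall, F-score) for class index cls (0 = negative, 1 = positive)."""
    tp = cm[cls][cls]
    fp = cm[1 - cls][cls]   # other class predicted as cls
    fn = cm[cls][1 - cls]   # cls predicted as the other class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# NBSVM-bi matrix from the slide; scoring the negative class reproduces
# the 70.7 / 31.9 / 44.0 row of the "Final results" table.
nbsvm_bi = [[123, 262], [51, 4568]]
print([round(100 * v, 1) for v in prf(nbsvm_bi, cls=0)])  # [70.7, 31.9, 44.0]
```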
Final results
Approach           | Precision | Recall | F-score
NBSVM-bi           | 70.7      | 31.9   | 44.0
Bidirectional-LSTM | 54.2      | 35.2   | 53.2
CNN                | 59.1      | 52.2   | 55.4
6 Our Results
Computing results
Confusion matrix for BERT (unbalanced data for fine-tuning and testing):
         | Predicted negative | Predicted positive
Negative | 188                | 235
Positive | 109                | 4472

Confusion matrix for BERT (balanced data for fine-tuning and unbalanced data for testing):
         | Predicted negative | Predicted positive
Negative | 415                | 8
Positive | 849                | 3732
Computing results
Confusion matrix for BERT (positive data twice the negative data for fine-tuning and unbalanced data for testing):
         | Predicted negative | Predicted positive
Negative | 378                | 45
Positive | 390                | 4191

Confusion matrix for BERT (balanced data for fine-tuning by increasing negative data and unbalanced data for testing):
         | Predicted negative | Predicted positive
Negative | 267                | 205
Positive | 480                | 5032
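The fine-tuning variants above differ only in how the training data is balanced; a hedged sketch of the oversampling idea (the exact resampling procedure used in the project is not described on the slides):

```python
import random

def oversample(minority_docs, majority_docs, ratio=1.0, seed=0):
    """Build a training set where the minority class is resampled to ratio * len(majority)."""
    rng = random.Random(seed)
    target = int(ratio * len(majority_docs))
    boosted = [rng.choice(minority_docs) for _ in range(target)]
    data = [(doc, 0) for doc in boosted] + [(doc, 1) for doc in majority_docs]
    rng.shuffle(data)
    return data

negatives = ["bad battery", "slow phone"]        # placeholder minority (negative) documents
positives = ["great screen", "fast", "love it"]  # placeholder majority (positive) documents

balanced = oversample(negatives, positives, ratio=1.0)    # "balanced" fine-tuning data
two_to_one = oversample(negatives, positives, ratio=0.5)  # positives twice the negatives
```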
Final results
Approach                                                                                               | Precision | Recall | F-score
BERT (unbalanced data for fine-tuning and testing)                                                     | 0.44      | 0.63   | 0.51
BERT (balanced data for fine-tuning and unbalanced data for testing)                                   | 0.32      | 0.98   | 0.48
BERT (positive data twice the negative data for fine-tuning and unbalanced data for testing)           | 0.49      | 0.89   | 0.63
BERT (balanced data for fine-tuning by increasing negative documents and unbalanced data for testing)  | 0.35      | 0.56   | 0.43
7 Compare Results
Compare results
[Bar chart comparing our result with the past result across three groups; y-axis from 0 to 70.]
Thanks for your attention