
Enhancing Sentiment Analysis Algorithms: Project Results Revealed
Dive into the project's mid-term report outlining the learning experiences, methodologies, issues, and detailed results of machine learning and sentiment analysis algorithms. Explore the process of improving algorithm accuracy and comparing different strategies for feature extraction, all aimed at achieving better results. Uncover the challenges faced and the outcomes of implementing various algorithms.
Presentation Transcript
Mid-Term Report
Juweek Adolphe, Zhaoyu Li, Ressi Miranda, Dr. Shang
Outline (Edited)
Learning Experience
o Machine Learning
o Sentiment Analysis
Project Results
Learning Experience: Machine Learning Algorithms
o Naive Bayes (probability)
o Support Vector Machine (SVM)
o Stochastic Gradient Descent
Learning Experience: Sentiment Analysis
Sentiment analysis classifies text into polarity categories. Techniques studied:
o Naive Bayes: Bernoulli
o Naive Bayes: Multinomial
o Stochastic Gradient Descent
o TF-IDF (term frequency - inverse document frequency)
o Chi-Square Test
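To illustrate the TF-IDF weighting listed above, here is a minimal pure-Python sketch of one common unsmoothed variant (library implementations such as scikit-learn's TfidfVectorizer add smoothing and normalization, so exact values differ; the toy corpus is illustrative only):

```python
import math

def tf_idf(corpus):
    """TF-IDF weights for every term in every document.

    One common unsmoothed formulation:
      tf(t, d) = count of t in d / number of terms in d
      idf(t)   = log(N / df(t)), where df(t) = number of docs containing t
    """
    n_docs = len(corpus)
    tokenized = [doc.split() for doc in corpus]

    # Document frequency: in how many documents does each term appear?
    df = {}
    for tokens in tokenized:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1

    # Per-document weight dictionaries.
    weights = []
    for tokens in tokenized:
        total = len(tokens)
        w = {}
        for term in set(tokens):
            tf = tokens.count(term) / total
            idf = math.log(n_docs / df[term])
            w[term] = tf * idf
        weights.append(w)
    return weights

docs = ["good good movie", "bad movie"]
w = tf_idf(docs)
# "movie" appears in every document, so idf = log(2/2) = 0 and its weight is 0
```

A term that occurs in every document carries no discriminating signal, which is exactly why its IDF, and hence its weight, collapses to zero.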
Why?
Improve the accuracy of the algorithms, even by a little bit, in the hope of getting better results.
Scheme/Project
Make a comparison between the different algorithms:
o Comparing the algorithms' accuracies
o Changing up feature extraction
Methodology
o Extract features
o Build a feature vector
o Select features / remove features
o Train the algorithm
o Test the algorithm
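The steps above map naturally onto a scikit-learn Pipeline (the report mentions cross-validating Pipelines). The sketch below is an assumed reconstruction of that workflow; the toy texts, labels, and k value are illustrative, not the project's actual data or settings:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training data (illustrative only).
texts = ["great movie loved it", "terrible film hated it",
         "wonderful acting great plot", "awful boring terrible plot"]
labels = [1, 0, 1, 0]

pipe = Pipeline([
    ("extract", CountVectorizer(ngram_range=(1, 2))),  # extract features, build vectors (uni+bi)
    ("select", SelectKBest(chi2, k=8)),                # select features, remove the rest
    ("clf", MultinomialNB()),                          # classifier to train and test
])
pipe.fit(texts, labels)                                # train
accuracy = pipe.score(texts, labels)                   # test (on held-out data in practice)
```

Swapping `CountVectorizer` for `TfidfVectorizer` or `HashingVectorizer`, and `MultinomialNB` for `BernoulliNB` or `SGDClassifier`, yields the grid of configurations compared in the results tables below.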
Issues
o Long time to train and cross-validate different Pipelines
o Formatting of code prevented inclusion of alternative classifiers (KNearestNeighbors, DecisionTree)
o Data set format might not be reliable (already processed)
o Accuracy rates lower than expected
Results

No Chi-Squared

               Tfidf/Bi     Tfidf/Uni    Count/Bi     Count/Uni    Hash/Bi      Hash/Uni
MultinomialNB  0.550637716  0.550101526  0.55132977   0.550564977  0.548096016  0.549712898
BernoulliNB    0.550633557  0.550633557  0.550633557  0.550633557  0.548104329  0.548104329
SVM            0.51090564   0.51090564   0.51090564   0.51090564   0.51090564   0.51090564

Chi-Squared Implemented

               Tfidf/Bi     Tfidf/Uni    Count/Bi     Count/Uni    Hash/Bi      Hash/Uni
MultinomialNB  0.541179586  0.540986305  0.542239491  0.541505867  0.548867048  0.549660941
BernoulliNB    0.541210758  0.541210758  0.541809294  0.541809294  0.550138938  0.550138938
SVM            0.51090564   0.51090564   0.51090564   0.51090564   0.51090564   0.51090564
Findings
o MultinomialNB and BernoulliNB dramatically outperformed SGD
o Chi-squared feature selection generally reduced accuracy (30%)
o Highest overall accuracy: Count/MultinomialNB (Uni+Bi)
o No consistent correlation between accuracy differences and the use of unigrams vs. bigrams
What does this mean?
o We do not know yet
o The classifier could stand to be more accurate
o Experiments with additional datasets/algorithms must be completed first
o The overall goal is to scale to the Big Data level
Future Work
o Figure out what makes our classifier less accurate than the standard (no improvement so far)
o Move away from the previous project
  o Previous projects were reinventing the wheel
o Implement Naive Bayes in MapReduce
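Since the report closes with a plan to implement Naive Bayes in MapReduce, a minimal single-machine sketch of the idea may help. The function names and toy corpus below are hypothetical; a real job would distribute the same map and reduce logic across a cluster (e.g. Hadoop Streaming), but the counting structure is identical:

```python
import math
from collections import defaultdict

# Map step: each mapper emits ((label, term), 1) for every token in its shard
# of documents, plus a ("__docs__", label) marker for per-class document counts.
def map_counts(docs):
    for label, text in docs:
        yield (("__docs__", label), 1)
        for term in text.split():
            yield ((label, term), 1)

# Reduce step: sum the counts for each key, as a reducer would after shuffling.
def reduce_counts(pairs):
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts

def classify(counts, text, labels, vocab_size):
    """Multinomial Naive Bayes with Laplace smoothing over the reduced counts."""
    total_docs = sum(counts[("__docs__", l)] for l in labels)
    best_label, best_score = None, float("-inf")
    for label in labels:
        # Log prior from per-class document counts.
        score = math.log(counts[("__docs__", label)] / total_docs)
        class_total = sum(v for (l, _), v in counts.items() if l == label)
        for term in text.split():
            term_count = counts.get((label, term), 0)
            score += math.log((term_count + 1) / (class_total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train = [("pos", "great great film"), ("neg", "awful film")]
counts = reduce_counts(map_counts(train))
vocab = {t for (l, t) in counts if l != "__docs__"}
label = classify(counts, "great film", ["pos", "neg"], len(vocab))
```

Because training reduces to summing (key, 1) pairs, Naive Bayes parallelizes cleanly, which is what makes it a natural first algorithm to port toward the Big Data scale mentioned above.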