
TECHBLOG.AI.BCU.AMAL


Twitter Sentiment Analysis using NLP and Machine Learning

amalpappalilramesh
May 16, 2022
3 min read

1. Introduction

The main objective of this blog is to train a machine learning model and predict the sentiment (positive/negative) of a series of tweets using the most accurate model. The high-level steps are shown below:

Modules Used:

To develop this, we need to import some libraries. For this blog, I am using a Google Colab notebook, with Python as the programming language.

Hence, step 1 is to import the necessary libraries:
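The original import cell is not reproduced in this post, so the following is only a plausible sketch. It assumes pandas, NLTK and scikit-learn are the libraries in use; `TfidfVectorizer` and `LinearSVC` in particular are my assumptions and are not confirmed by the text.

```python
import re  # regular expressions for text cleaning

import pandas as pd  # loading the CSV into a DataFrame
from nltk.stem import WordNetLemmatizer  # lemmatization
from nltk.tokenize import TweetTokenizer  # tweet-aware tokenization
from sklearn.feature_extraction.text import TfidfVectorizer  # text features (assumed)
from sklearn.model_selection import train_test_split  # train/test split
from sklearn.svm import LinearSVC  # one possible SVM implementation (assumed)
```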

2. Load the Dataset:

Now let us have a look at the dataset in use. The following figure gives a description of the dataset. Here, my dataset is in a CSV file, and the following code loads it into a dataframe in Python. The dataset is a collection of real tweets.

The dataset has the following columns:

Here is some sample data from the dataset:
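As a rough sketch of the loading step, the snippet below reads a small inline CSV with pandas and lists its columns and first rows. The inline data and the column names `Text` and `Sentiment` are placeholders I am assuming for illustration; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Tiny inline CSV standing in for the real file (hypothetical rows).
csv_text = """Text,Sentiment
I love this phone,1
Worst flight ever,0
Such a lovely morning,1
"""

# With a real file this would be pd.read_csv("<path to your CSV>").
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())  # column names
print(df.head())            # first few rows
```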

Now let us have a quick look at the distribution of tweets labelled as negative and positive:
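One simple way to inspect the class balance is `value_counts` on the label column. This is a sketch on made-up rows; the column name `Sentiment` and the encoding 0 = negative, 1 = positive are assumptions.

```python
import pandas as pd

# Hypothetical labelled tweets: 0 = negative, 1 = positive (assumed encoding).
df = pd.DataFrame({
    "Text": ["bad day", "awful service", "so tired", "great news"],
    "Sentiment": [0, 0, 0, 1],
})

# Number of tweets per label.
counts = df["Sentiment"].value_counts()
print(counts)
```

In a notebook with matplotlib available, `counts.plot(kind="bar")` would turn the same counts into a bar chart.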

3. Data Preprocessing:

This section deals with preprocessing the data. A few methods are used to preprocess the data before we pass it to the NLP engines.

1. Check NaN - Check for null values in the dataset and drop them all. The following code does it:
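A minimal sketch of the null check, assuming the data sits in a pandas DataFrame with hypothetical `Text` and `Sentiment` columns:

```python
import pandas as pd

# Toy frame with one missing text and one missing label.
df = pd.DataFrame({
    "Text": ["good day", None, "bad day"],
    "Sentiment": [1, 0, None],
})

# Count the null values per column.
print(df.isnull().sum())

# Drop every row that contains a null value and renumber the rows.
df = df.dropna().reset_index(drop=True)
```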

2. Lowercasing - Convert all tweets to lowercase

3. Removal of repeating characters - Several runs of repeating characters were identified in the dataset. We need to remove them all

4. Removal of URLs/links - The tweets also contain several irrelevant links and URLs, which we won't need for the prediction, so we remove them too

5. Cleaning the Numbers
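Steps 2 to 5 can be sketched as one helper built on Python's `re` module. The patterns and the name `clean_tweet` are my own illustration, not the original code. I am also assuming that "repeating characters" means runs of three or more identical characters, collapsed to two, and that cleaning the numbers means removing digits. URLs are stripped before the repeated-character step so that the "www" prefix is not altered first.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")  # links and bare www addresses
NUMBER_PATTERN = re.compile(r"\d+")                 # runs of digits
REPEAT_PATTERN = re.compile(r"(.)\1{2,}")           # same character 3+ times in a row


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, digits and elongated character runs."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = NUMBER_PATTERN.sub(" ", text)
    text = REPEAT_PATTERN.sub(r"\1\1", text)  # "loooove" becomes "loove"
    return " ".join(text.split())             # normalise whitespace


print(clean_tweet("I LOOOOVE this!!! 100 times http://t.co/abc www.example.com"))
```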

6. Perform Tokenization - with TweetTokenizer

One of the important steps here is to tokenize the tweets. Here we will use the TweetTokenizer method as shown below:

Insert the tokenized text into the Tokenized_Text column
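A sketch of this step with NLTK's `TweetTokenizer`, storing the result in the `Tokenized_Text` column. The example tweets and the options chosen (`preserve_case`, `reduce_len`, `strip_handles`) are assumptions for illustration; the original notebook may have used the defaults.

```python
import pandas as pd
from nltk.tokenize import TweetTokenizer

df = pd.DataFrame({
    "Text": ["@user I loooooove this :) #happy", "worst service ever"],
})

# Tweet-aware tokenizer: keeps emoticons and hashtags as single tokens.
tokenizer = TweetTokenizer(preserve_case=False, reduce_len=True, strip_handles=True)

# Each cell of the new column holds a list of tokens.
df["Tokenized_Text"] = df["Text"].apply(tokenizer.tokenize)
print(df["Tokenized_Text"].tolist())
```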

7. Lemmatization

After tokenization, another major tweet preprocessing step is lemmatization. The following block of code performs lemmatization:
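The original block is not shown here, so the following is a sketch that assumes NLTK's `WordNetLemmatizer`. The helper names `lemmatize_tokens` and `build_wordnet_lemmatizer` are hypothetical. The WordNet corpus has to be downloaded once before the lemmatizer can be used.

```python
import nltk
from nltk.stem import WordNetLemmatizer


def build_wordnet_lemmatizer() -> WordNetLemmatizer:
    """Download the WordNet data if needed and return a lemmatizer."""
    nltk.download("wordnet", quiet=True)
    return WordNetLemmatizer()


def lemmatize_tokens(tokens, lemmatizer):
    """Return the lemma of every token, using any object with a lemmatize() method."""
    return [lemmatizer.lemmatize(token) for token in tokens]
```

In a notebook, this could be applied per row with something like `df["Tokenized_Text"].apply(lambda toks: lemmatize_tokens(toks, lemmatizer))`, where `lemmatizer = build_wordnet_lemmatizer()`.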

Now let us plot the words in the negative statements
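The plot itself is not reproduced here. As a plain-text stand-in, the sketch below counts the most frequent tokens in negative tweets with `collections.Counter`; the sample rows and the label 0 for negative are assumptions.

```python
from collections import Counter

# Hypothetical (tokens, label) pairs; 0 = negative, 1 = positive (assumed).
rows = [
    (["bad", "day", "today"], 0),
    (["really", "bad", "service"], 0),
    (["great", "day"], 1),
]

# Count tokens that occur in negative tweets only.
negative_counts = Counter(
    token for tokens, label in rows if label == 0 for token in tokens
)
top_negative = negative_counts.most_common(3)
print(top_negative)
```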

8. Text feature extraction and model Generation

Once we are done with data cleaning, we can go ahead with model generation. The following steps need to be performed:

1. Split the data to training-test

We create 3 train and test set pairs using the columns 'Text', 'Tokenised_Text' and 'Lemmatised_Text'.
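One way to build the three pairs is to call scikit-learn's `train_test_split` once per column with the same `random_state`, so that all three splits contain the same rows. The toy data, the 80/20 ratio, the label column name `Sentiment` and the stratification are assumptions, not taken from the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with the three text variants and a label column (placeholder data).
texts = [f"tweet number {i}" for i in range(10)]
df = pd.DataFrame({
    "Text": texts,
    "Tokenised_Text": texts,
    "Lemmatised_Text": texts,
    "Sentiment": [0, 1] * 5,
})

splits = {}
for column in ["Text", "Tokenised_Text", "Lemmatised_Text"]:
    # Same random_state for every column, so the rows line up across splits.
    X_train, X_test, y_train, y_test = train_test_split(
        df[column],
        df["Sentiment"],
        test_size=0.2,
        random_state=42,
        stratify=df["Sentiment"],
    )
    splits[column] = (X_train, X_test, y_train, y_test)
```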

4. MODEL SUMMARY

· A total of 11 models were tried out

· Different combinations of preprocessing were performed

· The highest accuracy was found for model M8, an SVM model with 87% accuracy

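Summary:

· Out of the 11 different models, the SVM with the following combination achieved the highest accuracy

· Unlike probabilistic classifiers such as Naive Bayes, an SVM works on a geometric interpretation of the problem: it looks for the hyperplane that separates the classes with the widest margin.

· SVMs cope well with high-dimensional feature spaces such as text, because the decision boundary depends on the margin rather than on the number of features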
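To make the SVM approach concrete, here is a minimal sketch of a text classifier built with scikit-learn. The exact configuration of model M8 is not given in this post, so the TF-IDF features, the `LinearSVC` estimator and the toy data are all assumptions. The accuracy printed on six training tweets says nothing about the 87% reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data; 0 = negative, 1 = positive (assumed encoding).
train_texts = [
    "i love this phone", "what a great day", "such a happy morning",
    "worst flight ever", "i hate this weather", "terrible customer service",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["great phone", "terrible flight"]
test_labels = [1, 0]

# TF-IDF turns each tweet into a sparse vector; LinearSVC fits a linear SVM on it.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)

predictions = model.predict(test_texts)
accuracy = accuracy_score(test_labels, predictions)
print(predictions, accuracy)
```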

Test Predictions on the Test Dataset:

Now that we have identified the best model, we will apply it to predict the sentiments for the test dataset.

The results suggest that:

· SVMs achieve good performance on text categorization tasks,

· the SVM outperformed the other models tried here,

· SVMs reduce the need for explicit feature selection,

· SVMs are more robust compared to the other methods.

5. Final Prediction:

Copy the predictions to the Sentiment column of the dataset and export it as a CSV file.

The Sentiment column provides the predicted sentiment by the model.
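A sketch of this final step with pandas. The `predictions` list below is a placeholder for the output of the trained model, and the file name and temporary output directory are hypothetical choices for the example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical test tweets and placeholder predictions (stand-in for model output).
test_df = pd.DataFrame({"Text": ["great phone", "terrible flight"]})
predictions = [1, 0]

# Store the predicted label next to each tweet.
test_df["Sentiment"] = predictions

# Write the result to a CSV file without the DataFrame index.
out_path = os.path.join(tempfile.gettempdir(), "sentiment_predictions.csv")
test_df.to_csv(out_path, index=False)
```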

6. REFERENCE:

https://realpython.com/sentiment-analysis-python/

https://medium.com/analytics-vidhya/na%C3%AFve-bayes-vs-svm-for-text-classification-c63478229c33

https://github.com/HHansi/Applied-AI-Course/tree/main/NLP
