GitXplorerGitXplorer
g

Covid_tweets_analysis

public
0 stars
0 forks
0 issues

Commits

List of commits on branch main.
Unverified
7758d7879b35b38801513e1d058eb21c8ca9433a

EDA

ggaetanlop committed 4 years ago
Verified
94c30aa921f0e7ca271fd1baa6d5aadb6019e1d2

Initial commit

ggaetanlop committed 4 years ago

README

The README file for this repository.

Covid_tweets_analysis

In this project, I wanted to explore how people feel at the "beginning" of the pandemic. I want to do that, in order to compare the feeling of people at the beginning of the pandemic and now after more than 6 months. I think it could be a very interesting study. Therefore, in this repo, I decided to analyze tweets about Covid between the end of February and March 2020. I will do the same thing with tweets between September and October 2020 soon.

  • Text preprocessing: removed https link, removed punctuations, lowered the text, removed stopwords.
  • Performed a sentiment analysis of the different tweets.
  • Looked at the most common words (N_grams).
  • Checked the most popular hashtags.
  • Used sentiment polarity in order to find out how people feel regarding the pandemic.
  • Tools used: pandas, numpy, nltk, scikit-learn, matplotlib, seaborn

Results

Three things stricked me the most:

  • The most impressive thing is that there are more positive tweets than negative ones. In my opinion, this is a really good thing, and that explains the general positive good vibe during the quarantine period. It will be interesting to check if in September, there are still more positive tweets. I do not think it will be the case, since people are more and more annoyed by the situation.
  • Secondly, if I had to summarize the most popular topics in only two categories, I would say: stores & supermarkets (food ...), people (employers, workers). This is cool to see that many people are worrying about people that still has top go to work outside during quarantine.
  • Finally, the most popular hashtags are: Donald Trump, Tesco, Amazon, Boris Johnson and Youtube.

About the Data

Tweets have been pulled from Twitter. It contains approximately 3700 tweets that have been classified manually. Apart from the Original tweets, the dataset contains information about the Location and the tweet date. One import thing to note is that the tweets are between February to March 2020. I decided to use this old dataset because I want to compare at the "beginning" of the pandemic and now after more than 6 months. I will do this second analysis soon.

Data visualization

alt text alt text alt text

Code and Resources Used

Python Version: 3.7

Link of the dataset: https://www.kaggle.com/datatattle/covid-19-nlp-text-classification

Interesting notebook: https://www.kaggle.com/satanizer/covid-19-tweets-analysis