GitXplorerGitXplorer
g

Covid_tweets_analysis

public
0 stars
0 forks
0 issues

Commits

List of commits on branch main.
Verified
daf0f788197664bed0fa10b267c2317ea86c2713

Update README.md

ggaetanlop committed 4 years ago
Verified
3fdc78c97e2f2ac647bfdee5eefab34c6e3192e4

Update README.md

ggaetanlop committed 4 years ago
Verified
1f0080ff4f53911a59eacf2f4828c8afa9ea0bfe

Update README.md

ggaetanlop committed 4 years ago
Verified
f0310f5270b82f02fac59ea69368127955e2c88d

Update README.md

ggaetanlop committed 4 years ago
Verified
a16764fbf23348f401d8129bfef6cca83d029c50

Update README.md

ggaetanlop committed 4 years ago
Verified
26e7ed9c53c2b5e2a51b583b434e096e35e20341

Update README.md

ggaetanlop committed 4 years ago

README

The README file for this repository.

Covid_tweets_analysis

In this project, I wanted to explore how people feel at the "beginning" of the pandemic. I want to do that, in order to compare the feeling of people at the beginning of the pandemic and now after more than 6 months. I think it could be a very interesting study. Therefore, in this repo, I decided to analyze tweets about Covid between the end of February and March 2020. I will do the same thing with tweets between September and October 2020 soon.

  • Text preprocessing: removed https link, removed punctuations, lowered the text, removed stopwords.
  • Performed a sentiment analysis of the different tweets.
  • Looked at the most common words (N_grams).
  • Checked the most popular hashtags.
  • Used sentiment polarity in order to find out how people feel regarding the pandemic.
  • Tools used: pandas, numpy, nltk, scikit-learn, matplotlib, seaborn

Results

Three things stricked me the most:

  • The most impressive thing is that there are more positive tweets than negative ones. In my opinion, this is a really good thing, and that explains the general positive good vibe during the quarantine period. It will be interesting to check if in September, there are still more positive tweets. I do not think it will be the case, since people are more and more annoyed by the situation.
  • Secondly, if I had to summarize the most popular topics in only two categories, I would say: stores & supermarkets (food ...), people (employers, workers). This is cool to see that many people are worrying about people that still has top go to work outside during quarantine.
  • Finally, the most popular hashtags are: Donald Trump, Tesco, Amazon, Boris Johnson and Youtube.

About the Data

Tweets have been pulled from Twitter. It contains approximately 3700 tweets that have been classified manually. Apart from the Original tweets, the dataset contains information about the Location and the tweet date. One import thing to note is that the tweets are between February to March 2020. I decided to use this old dataset because I want to compare at the "beginning" of the pandemic and now after more than 6 months. I will do this second analysis soon.

Data visualization

alt text alt text alt text

Code and Resources Used

Python Version: 3.7

Link of the dataset: https://www.kaggle.com/datatattle/covid-19-nlp-text-classification

Interesting notebook: https://www.kaggle.com/satanizer/covid-19-tweets-analysis