This repo contains all the resources I used for my experiment "When do I listen to Bad Bunny?"
Bad Bunny is a Puerto Rican urban music singer who has become a global star in the last few years. As the title indicates, this project uses the Spotify data I've been collecting since late 2017 to find out when I listen to his music. To solve this mystery, I did a data analysis using Python and R that includes data exploration, visualizations, time series modeling, and anomaly detection.
The repo has the following structure:
- `bqWriter/`: the script I used to read my Spotify account's recently played tracks and write the results to BigQuery.
- `data/`: the datasets I used for the experiment.
  - `df.csv`: the dataset containing all the Bad Bunny data. Each row represents a song I listened to.
  - `results.csv`: the output of the anomaly detection algorithm (isolation forest). The first column, `predictions`, is the prediction (`-1` = anomaly, `1` = non-anomalous), and `scores` is the prediction score (the lower the score, the more anomalous).
  - `timeseries_data.csv`: the data I used to train the time series model. The first column, `ds`, is the date, and the second column, `y`, is the number of Bad Bunny songs I listened to on that day.
  - `weekdays_hours.csv`: the data I used to train the anomaly detection model. It has two features: `hour`, the hour (in 24-hour format) when I listened to a song, and `weekday`, the day of the week. The training script one-hot encodes the `weekday` column into seven columns (one per day).
- `notebooks/`: the notebooks I wrote during the analysis.
  - `analysis.Rmd`: the data analysis (written in R). It contains the code that summarizes, explores, and visualizes the dataset.
  - `analysis_ES.Rmd`: a copy of the previous script with the visualizations translated to Spanish.
  - `isolation_forest.ipynb`: the script that trains the isolation forest model.
  - `timeseries.ipynb`: the script that trains the time series model.
- `plots/`: all the plots I created as part of the analysis. Each chart comes in an English and a Spanish version (the Spanish files are suffixed with "_ES").
- `service/`: an incomplete skeleton of a service I want to build with FastAPI to serve the model's predictions. It includes an exported copy (`model.joblib`) of the isolation forest model I trained.
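
To make the `weekdays_hours.csv` and `results.csv` descriptions concrete, here is a minimal sketch of one-hot encoding `weekday` and fitting an isolation forest with scikit-learn. It is not the code from `isolation_forest.ipynb`: the sample rows, the weekday labels, the default model settings, and the use of `score_samples` for the `scores` column are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical sample rows; the real data lives in data/weekdays_hours.csv.
plays = pd.DataFrame(
    {
        "hour": [9, 10, 9, 11, 10, 9, 10, 11, 9, 3],
        "weekday": [
            "Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Monday", "Tuesday", "Wednesday", "Thursday", "Sunday",
        ],
    }
)

# Assumed weekday labels. Declaring all seven categories guarantees seven
# one-hot columns even if some day is missing from the data.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
plays["weekday"] = pd.Categorical(plays["weekday"], categories=DAYS)

# One-hot encode `weekday`: one column per day, next to the `hour` column.
features = pd.get_dummies(plays, columns=["weekday"], dtype=int)

# Default settings plus a fixed seed; the notebook's real settings may differ.
model = IsolationForest(random_state=0)
model.fit(features)

results = pd.DataFrame(
    {
        "predictions": model.predict(features),   # -1 = anomaly, 1 = non-anomalous
        "scores": model.score_samples(features),  # lower = more anomalous
    }
)
print(results)
```

Declaring the category list up front keeps the seven weekday columns stable even when a day never appears in the data.
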

The analysis relies on the following libraries:

- scikit-learn: to train the isolation forest model.
- Prophet: to train the time series model.
- ggplot2: to create the visualizations in R.
- skimr: an R package that provides summary statistics.
- tidyr and dplyr: to manipulate my data.
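
Prophet expects its training data in two columns named `ds` (the date) and `y` (the value to forecast), which is why `timeseries_data.csv` uses those names. As a rough illustration of how per-song rows can be collapsed into that shape, the sketch below counts plays per day using only the standard library. The timestamps are made up, and the real preprocessing may differ, for example in how days with no plays are handled.

```python
from collections import Counter
from datetime import datetime

# Hypothetical play timestamps (ISO 8601), one per song listened to.
played_at = [
    "2021-03-01T08:15:00",
    "2021-03-01T21:40:00",
    "2021-03-02T09:05:00",
    "2021-03-04T18:30:00",
    "2021-03-04T18:34:00",
    "2021-03-04T23:59:00",
]

# Count plays per calendar day.
daily_counts = Counter(datetime.fromisoformat(ts).date() for ts in played_at)

# One row per day that has at least one play, sorted by date,
# using the `ds` and `y` column names.
rows = [
    {"ds": day.isoformat(), "y": count}
    for day, count in sorted(daily_counts.items())
]

for row in rows:
    print(row)
```
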
I presented this work (in Spanish) at PyCon Latam 2022. For details, see: https://pylatam.org/#schedule.