For this project, I used the MovieLens dataset from the GroupLens website.
MovieLens dataset link: https://grouplens.org/datasets/movielens/
Steps to execute:
- Download the files from the GitHub repository.
- Extract the movies.csv, links.csv, ratings.csv and tags.csv files from their respective .rar archives.
- Place the CSV files in a folder named datasets, and place the datasets folder inside the notebooks folder. The notebooks folder should also contain the .ipynb notebook file.
- Open a terminal and type "jupyter notebook".
- In the Jupyter file browser, navigate to the folder where the notebook is placed.
- From the Cell menu, click Run All to run the whole notebook from the first cell, then verify the results.
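Before running everything, a first notebook cell can confirm that the files ended up where the steps above expect them. This is a minimal sketch, assuming the notebook's working directory is the notebooks folder (so the relative path datasets points at the datasets folder); load_movielens and MOVIELENS_FILES are hypothetical names, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# File names as listed in the setup steps (hypothetical constant name).
MOVIELENS_FILES = ("movies.csv", "links.csv", "ratings.csv", "tags.csv")


def load_movielens(data_dir="datasets"):
    """Load the MovieLens CSV files from data_dir into a dict of DataFrames.

    Assumes the notebook runs from the notebooks folder, so the default
    relative path "datasets" points at notebooks/datasets.
    """
    data_path = Path(data_dir)
    missing = [name for name in MOVIELENS_FILES if not (data_path / name).is_file()]
    if missing:
        # Fail early with a readable message instead of a pandas traceback.
        raise FileNotFoundError(f"Missing in {data_path.resolve()}: {', '.join(missing)}")
    # Key each DataFrame by file stem, e.g. "ratings" for ratings.csv.
    return {Path(name).stem: pd.read_csv(data_path / name) for name in MOVIELENS_FILES}
```

For example, frames = load_movielens() followed by frames["ratings"].head() previews the ratings table; a FileNotFoundError naming the missing files usually means a .rar archive was not extracted into the datasets folder.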
The project builds a recommendation system that recommends movies of the same genre, using the other columns as features.
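In movies.csv, MovieLens stores each movie's genres as a single pipe-separated string (for example "Comedy|Drama"), so "same genre" first needs the genres split into indicator columns. This is a minimal sketch, assuming the standard movieId and genres columns; genre_indicators and same_genre_movies are hypothetical helper names, and "same genre" is taken here to mean sharing at least one genre.

```python
import pandas as pd


def genre_indicators(movies):
    """One-hot encode the pipe-separated genres column of movies.csv."""
    # str.get_dummies splits on "|" and returns one 0/1 column per genre.
    dummies = movies["genres"].str.get_dummies(sep="|")
    return dummies.set_index(movies["movieId"])


def same_genre_movies(movies, movie_id):
    """Return the movieIds that share at least one genre with movie_id."""
    genres = genre_indicators(movies)
    target = genres.loc[movie_id]
    # Two movies share a genre when the dot product of their indicator rows is positive.
    overlap = genres.to_numpy() @ target.to_numpy()
    shared = genres.index[overlap > 0]
    return [int(m) for m in shared if m != movie_id]
```

The indicator columns can also serve as features themselves, or as a check on whether clusters built from ratings and tags line up with genres.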
Steps to follow:
- Set up a data science project structure in a new Git repository in your GitHub account
- Download one of the MovieLens datasets from https://grouplens.org/datasets/movielens/
- Load the dataset into pandas DataFrames
- Use exploratory data analysis to formulate one or two ideas on how combining users' ratings and tags adds value to the dataset
- Build one or more clustering models that find similar movies to recommend, using other users' ratings and tags as features
- Document your process and results
- Commit your notebook, source code, visualizations and other supporting files to the Git repository on GitHub
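One way to act on the analysis and clustering steps is to combine ratings and tags into one feature row per movie and cluster those rows, recommending other movies from the same cluster. This is a minimal sketch, assuming the standard MovieLens columns (movieId, rating, tag); the feature set (mean rating, rating count, counts of the most frequent tags), the helper names movie_features, kmeans, cluster_movies and recommend, and the default k are illustrative choices, not the notebook's actual method.

```python
import numpy as np
import pandas as pd


def movie_features(ratings, tags, top_n_tags=20):
    """One row per rated movie: rating summary plus counts of the most common tags."""
    # Rating summary per movie: average rating and number of ratings.
    stats = ratings.groupby("movieId")["rating"].agg(mean_rating="mean", n_ratings="count")
    # Normalise tag text so "Dark " and "dark" count as the same tag.
    tag_text = tags["tag"].astype(str).str.lower().str.strip()
    top_tags = tag_text.value_counts().index[:top_n_tags]
    kept = tags.assign(tag=tag_text)[tag_text.isin(top_tags)]
    # Count how often each kept tag was applied to each movie.
    tag_counts = pd.crosstab(kept["movieId"], kept["tag"])
    # Rated movies without any kept tag get zero tag counts.
    return stats.join(tag_counts, how="left").fillna(0)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every row to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its rows; an empty cluster keeps its centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def cluster_movies(features, k=8, seed=0):
    """Standardise the feature columns, then cluster the movies with k-means."""
    X = features.to_numpy(dtype=float)
    # z-score each column so large rating counts do not dominate the distance.
    std = X.std(axis=0)
    X = (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)
    return pd.Series(kmeans(X, k, seed=seed), index=features.index, name="cluster")


def recommend(clusters, movie_id):
    """Recommend the other movies that fall in the same cluster as movie_id."""
    same = clusters[clusters == clusters.loc[movie_id]].index
    return [int(m) for m in same if m != movie_id]
```

With ratings and tags DataFrames loaded from ratings.csv and tags.csv, recommend(cluster_movies(movie_features(ratings, tags)), movie_id) returns the candidate movies for one title. scikit-learn's sklearn.cluster.KMeans would be the usual choice in a notebook; plain NumPy is used here only to keep the sketch dependency-light. Standardising the columns first matters because raw rating counts can be orders of magnitude larger than mean ratings and would otherwise dominate the distance.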