This project uses two datasets from Kaggle: house sales data for King County (which includes Seattle) and Seattle weather data.
House Sales: https://www.kaggle.com/harlfoxem/housesalesprediction
Seattle Weather: https://www.kaggle.com/rtatman/did-it-rain-in-seattle-19482017
Steps to execute:
- Download the files from the GitHub repository.
- Unzip Data_Science_Project1.zip to get kc_house_data.csv and the Seattle weather data file.
- Place the CSV files in a folder named data, and place the data folder inside the notebooks folder. The notebooks folder should also contain the .ipynb file.
- Open a terminal and run "jupyter notebook".
- In the Jupyter file browser that opens, navigate to the folder where the notebook is placed and open it.
- From the Cell menu, click Run All to run the whole notebook from the first cell, then verify the results.
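Before launching the notebook, it can help to confirm that the CSV files ended up in the expected place. The following is a minimal sketch, assuming the notebooks/data layout described in the steps above; the helper name find_csv_files is hypothetical and not part of this repository, and since the weather file name is not stated here, the check only lists whatever CSV files it finds.

```python
from pathlib import Path


def find_csv_files(notebooks_dir="notebooks"):
    """Return the sorted names of CSV files found in <notebooks_dir>/data.

    Hypothetical helper for illustration; it is not part of the project.
    Returns an empty list when the data folder does not exist.
    """
    data_dir = Path(notebooks_dir) / "data"
    if not data_dir.is_dir():
        return []
    return sorted(p.name for p in data_dir.glob("*.csv"))


# Run from the repository root (assumed to contain the notebooks folder).
found = find_csv_files()
print("CSV files found:", found)
if "kc_house_data.csv" not in found:
    print("kc_house_data.csv is missing from notebooks/data")
```

If the list is empty or kc_house_data.csv is reported missing, recheck the unzip and folder placement steps before running the notebook.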
This project explores how weather affects house sales in Seattle.
Travelling the world on a mission to discover new data
- Set up a data science project structure in a new git repository in your GitHub account
- Install Jupyter notebook prerequisites (Anaconda, Python, etc.)
- Select an industry
- Select two to three public data sets from that industry
- Load the data sets into pandas DataFrames following the "10 minutes to pandas" guide
- Formulate one or two ideas on how the data sets could be combined to establish additional value using exploratory data analysis
- Transform the data sets into a single data set while following data preparation processes to clean and transform features (use pandas documentation for help)
- Document your process and results
- Commit your notebook, source code, visualizations and other supporting files to the git repository in GitHub
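The load, combine, and explore steps in the list above can be sketched roughly as follows. This is an illustration, not the notebook's actual code: it assumes the house sales file has a date column with values like "20141013T000000" and a price column, and that the weather file has DATE and RAIN columns. The function names and the three sample rows are made up for the example; in practice the frames would come from pd.read_csv on the files in the data folder.

```python
import pandas as pd


def combine_sales_and_weather(sales, weather):
    """Join each house sale to the weather recorded on its sale date.

    Assumed columns: sales["date"] as strings like "20141013T000000",
    weather["DATE"] as ISO dates such as "2014-10-13".
    """
    sales = sales.copy()
    weather = weather.copy()
    # Keep only the YYYYMMDD part of the sale timestamp string.
    sales["sale_date"] = pd.to_datetime(sales["date"].str[:8], format="%Y%m%d")
    weather["sale_date"] = pd.to_datetime(weather["DATE"])
    # Left join so sales without a matching weather row are kept.
    return sales.merge(weather.drop(columns="DATE"), on="sale_date", how="left")


def summarize_by_rain(merged):
    """Count sales and average the price for rainy and dry days."""
    return merged.groupby("RAIN")["price"].agg(["count", "mean"])


# Made-up placeholder rows, used only to show the shape of the inputs.
sample_sales = pd.DataFrame({
    "date": ["20141013T000000", "20141013T000000", "20141014T000000"],
    "price": [200000.0, 300000.0, 500000.0],
})
sample_weather = pd.DataFrame({
    "DATE": ["2014-10-13", "2014-10-14"],
    "RAIN": [True, False],
})

merged = combine_sales_and_weather(sample_sales, sample_weather)
print(summarize_by_rain(merged))
```

A left join is used here so that no sale is silently dropped when the weather data has a gap; whether that is the right choice depends on how the analysis treats missing weather values.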