BiciMAD Data Analysis

Madrid's public bike system data analysis.

Javi Ramírez @rameerez

Introduction

BiciMAD is the public bike system in Madrid, Spain. I'm a big fan of commuting by bike in Madrid (even though car drivers are still complete assholes to cyclists, and the city is not fully adapted to bike traffic).

I tend to use my own bike, but I still find myself riding BiciMAD bikes quite often (it's pretty convenient: they have an electric motor that assists with pedaling, and you can just pick one up and drop it off at the nearest station without having to worry about your own bike getting stolen). However, every time I've used them, I've noticed a number of issues (broken bikes, out-of-order docks, empty or completely full stations...).

In April 2017 I contacted EMT Madrid (the public company that now runs BiciMAD) and asked their OpenData department for BiciMAD data to analyze. They answered immediately and provided me with a huge dataset and helpful documentation. I want to thank EMT's OpenData department for their kindness and contribution.

My goal with this data analysis is to discover hidden patterns that can reveal underlying problems, and to give BiciMAD powerful, data-based suggestions that can help improve the service for all of us in Madrid.

Data source: EMT OpenData

Data bias / restrictions

Note that BiciMAD filters out rides longer than 6 hours, which prevents us from analyzing the behavior of stolen, lost, or missing bikes.
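If the cap holds, no ride in a dataset should last more than 6 hours (21,600 seconds). A minimal sketch to check this, assuming each ride has been loaded as a dict with a `travel_time` field in seconds (the field name is an assumption; check it against the EMT documentation shipped with the data). `rides_over_cap` and `duration_summary` are illustrative helpers, not part of the notebooks:

```python
# Sketch: check that no ride in a loaded dataset exceeds the 6-hour cap.
# ASSUMPTION: each ride is a dict with a "travel_time" field in seconds;
# adjust the key if your dataset names it differently.

MAX_RIDE_SECONDS = 6 * 60 * 60  # 6 hours = 21600 seconds


def rides_over_cap(rides, key="travel_time", cap=MAX_RIDE_SECONDS):
    """Return the rides whose duration is strictly longer than `cap` seconds."""
    return [ride for ride in rides if ride.get(key, 0) > cap]


def duration_summary(rides, key="travel_time"):
    """Return (min, max) ride duration in seconds, or None if no ride has `key`."""
    durations = [ride[key] for ride in rides if key in ride]
    return (min(durations), max(durations)) if durations else None


if __name__ == "__main__":
    # Made-up sample rides, for illustration only.
    sample = [{"travel_time": 300}, {"travel_time": 21600}, {"travel_time": 1200}]
    print(rides_over_cap(sample))    # → []
    print(duration_summary(sample))  # → (300, 21600)
```

An empty result from `rides_over_cap` is consistent with the filter; a maximum sitting just under 21,600 seconds would suggest the cut-off is applied exactly at 6 hours.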

BiciMAD also does not provide unique bike IDs, so we can't track individual bikes and therefore can't analyze bike failure rates and the like.

Installation

Using Python 3.5.3.

Please make sure you have the following libraries installed: ipython, jupyter, pandas, numpy and bokeh.

If not, install them with either pip or Anaconda (conda):

pip install -U ipython jupyter pandas numpy bokeh

conda install ipython jupyter pandas numpy bokeh
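To see which of these libraries are actually missing before launching the notebooks, you can look each one up with the standard library's `importlib.util.find_spec`. A minimal sketch; the `REQUIRED_MODULES` mapping and `missing_packages` helper are illustrative names, and `jupyter` is checked through `jupyter_core`, which gets installed along with it:

```python
# Sketch: report which of the README's required libraries are missing.
# The mapping goes from pip/conda package name to the module name you import.
import importlib.util

REQUIRED_MODULES = {
    "ipython": "IPython",
    "jupyter": "jupyter_core",  # installed as a dependency of jupyter
    "pandas": "pandas",
    "numpy": "numpy",
    "bokeh": "bokeh",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return (sorted) the package names whose module cannot be found."""
    return sorted(
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Plain str.format keeps this runnable on Python 3.5 (no f-strings).
        print("Missing: {}. Install with: pip install -U {}".format(
            ", ".join(missing), " ".join(missing)))
    else:
        print("All required libraries are installed.")
```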

Then just launch the Jupyter notebook with:

jupyter notebook

Instructions

Extract the .rar archive in /data and make sure the two .json files it contains end up directly in /data.

To use a different BiciMAD dataset, change the global variable in each notebook that configures which datasets are loaded.
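A configuration variable like that, plus a loader for it, might look roughly like the sketch below. Everything here is hypothetical: the `DATASETS` name and file path are placeholders rather than the notebooks' actual variable, and the loader assumes the .json files hold one JSON object per line (JSON Lines); if the EMT files are formatted differently, the parsing step needs to change.

```python
# Sketch: a notebook-level setting listing which datasets to load.
# HYPOTHETICAL: the DATASETS name and file path are placeholders, and the
# loader ASSUMES one JSON object per line (JSON Lines).
import json
from pathlib import Path

DATASETS = [
    "data/rides_example.json",  # placeholder: point this at your .json files
]


def load_json_lines(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    # str() because open() only accepts Path objects from Python 3.6 onwards.
    with open(str(path), encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_datasets(paths=DATASETS):
    """Load every configured dataset into a single list of records."""
    missing = [p for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(
            "Dataset(s) not found: {}. Did you extract the .rar into /data?".format(
                ", ".join(missing)))
    records = []
    for path in paths:
        records.extend(load_json_lines(path))
    return records
```

With pandas installed, `pandas.DataFrame(load_datasets())` turns the records into a DataFrame; for JSON Lines files, `pandas.read_json(path, lines=True)` is a one-step alternative.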

The Markdown tables describing the datasets in the notebooks were created with TablesGenerator, which lets you save and load tables in .tgn format; those files live in the /doc folder.

To-do (& ideas & hypotheses to test)

  • [ ] A lot of to-dos within the notebooks
  • [ ] Maybe people under 25 aren't using BiciMAD because of the 20€/month Metro + train + bus pass (Abono Joven, for under-25s). Can we test that?
  • [ ] Maybe it's difficult for older people to use BiciMAD (e.g. to set up an account)
  • [ ] There might be stations with a higher ratio of defective bikes than others
  • [ ] Are there clusters of users? (Maybe people using the bikes for commuting and others for pleasure / sightseeing)
  • [ ] Heatmap of GPS routes - most active GPS points
  • [ ] Are there demand peaks caused by events going on in the city? (e.g. sports events: there might be a shortage of bikes around the Bernabéu before and after a Real Madrid match)
  • [ ] Can we model & predict demand peaks? (weather, events going on in the city...)
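Two of these to-dos (demand peaks and a heatmap of the most active GPS points) start with simple counting. A minimal sketch using only the standard library, assuming unplug timestamps are available as `datetime` objects and GPS points as (latitude, longitude) pairs. The mean + 2 standard deviations threshold and the 0.001° cell size (roughly 111 m north-south and 85 m east-west at Madrid's latitude) are arbitrary illustrative choices, and all function names are hypothetical:

```python
# Sketch for two to-do items: hourly demand peaks and a GPS "heatmap" grid.
# ASSUMPTIONS: unplug timestamps are datetime objects and GPS points are
# (latitude, longitude) pairs; threshold and cell size are arbitrary.
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev


def hourly_demand(unplug_times):
    """Count rides per calendar hour (each timestamp truncated to the hour)."""
    return Counter(t.replace(minute=0, second=0, microsecond=0) for t in unplug_times)


def demand_peaks(counts, z=2.0):
    """Return hours whose ride count exceeds mean + z * std of the hourly counts.

    Hours with no rides are absent from `counts`, which raises the baseline;
    add them with a count of 0 for a stricter comparison.
    """
    values = list(counts.values())
    if len(values) < 2:
        return []
    threshold = mean(values) + z * pstdev(values)
    return sorted(hour for hour, n in counts.items() if n > threshold)


def gps_grid(points, cell=0.001):
    """Bin (lat, lon) points into square cells of `cell` degrees and count them."""
    return Counter((int(lat // cell), int(lon // cell)) for lat, lon in points)


if __name__ == "__main__":
    # Made-up timestamps: one ride per hour from 00:00 to 09:00, 20 rides at 18:00.
    times = [datetime(2017, 4, 1, h, 15) for h in range(10)]
    times += [datetime(2017, 4, 1, 18, m) for m in range(20)]
    print(demand_peaks(hourly_demand(times)))  # → [datetime.datetime(2017, 4, 1, 18, 0)]
```

Flagging unusual hours this way only describes past demand; predicting peaks would still need explanatory features such as the weather and event data mentioned above. Feeding the `gps_grid` counts into a bokeh plot would give the heatmap.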

Made with ♥ from Madrid