fall19-big_data_lu
Commits

Commits on branch master:

- f427b0d4e901c00cfab8c1ee12c2234d30cd8f71: updated the batch process, which now also sends data to ES (oolivergrasl, 5 years ago)
- e47f20d9d45fe0640d506e22ded7b11f191b5b5d: removed contents of csv and models directory from version control. Small changes to big_data and mixed_view notebooks (oolivergrasl, 5 years ago)
- 2edb271f59c2e0c06a6ee79135036b66e6fb0bdb: Catch empty route (5 years ago)
- 01f145fd695bdaa4b11e4c0df6ae4976f909b296: reactivated ES Data Collector (5 years ago)
- 5a314ab27865bb5f57c947c3eaabc9c8954f2571: Removed visualisation (5 years ago)
- 13645d90cfabe4972ecedc0ec42b0ad5789566c0: Finished removal of visualizations (5 years ago)

README

Showcase Big Data

  1. Install Docker (see the [docker installation instructions](https://docs.docker.com/install/))

  2. Run Docker Compose: `docker-compose up`
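
    If you want to confirm the stack is up before moving on, you can poll the services from Python. A minimal sketch, assuming the default ports 9200 (Elasticsearch) and 5601 (Kibana); check docker-compose.yml for the actual mappings:

    ```python
    import time
    import requests

    # Hedged sketch: wait until Elasticsearch and Kibana answer on their
    # assumed default ports (verify against docker-compose.yml).
    SERVICES = {
        "Elasticsearch": "http://localhost:9200",
        "Kibana": "http://localhost:5601/api/status",
    }

    for name, url in SERVICES.items():
        for _ in range(30):  # roughly one minute per service
            try:
                if requests.get(url, timeout=2).ok:
                    print(f"{name} is up")
                    break
            except requests.ConnectionError:
                pass
            time.sleep(2)
        else:
            print(f"{name} did not come up in time")
    ```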

  3. In JupyterLab: Open the notebook `notebooks/showcase_big_data.ipynb`

  4. Start simulation: run the notebook and then hit the "Start" button

  5. Access Kibana and go to the Discover page

  6. Set up the index pattern `*_car`, using `timestamp` for the time dimension
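
    Before creating the pattern, you can check that the simulation is actually writing matching indices. A minimal sketch, assuming the elasticsearch-py client and Elasticsearch on localhost:9200:

    ```python
    from elasticsearch import Elasticsearch

    # Hedged sketch: list the indices the pattern *_car would match and
    # peek at one document to confirm it carries a timestamp field.
    es = Elasticsearch("http://localhost:9200")

    print(es.cat.indices(index="*_car", v=True))  # one line per matching index

    sample = es.search(index="*_car", body={"size": 1})
    if sample["hits"]["hits"]:
        print(sorted(sample["hits"]["hits"][0]["_source"]))  # expect "timestamp" among the fields
    ```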

  7. On the "Visualize" page: create a line visualisation that shows revenue over time (you need to set up an X-axis bucket of type Histogram on the time field, with interval 1) and save it.

  8. Create another line visualisation showing total revenue over time

  9. Create a heatmap visualisation with a 20x20 grid for the taxi positions (`pos_x`, `pos_y`)

  10. Create a dashboard that shows the visualisations (time range: last 15 minutes). Set it to refresh automatically every 10 seconds. Save the dashboard.

  11. Go to the management page and choose "Export". This creates an `export.ndjson` file, which you can import via the "Import" button the next time you start the showcase.

    HINT: you can find predefined dashboards in `kibana/big_data.ndjson`.

  12. Turn on Kibana stack monitoring - you should see two nodes with 9 primary shards and 9 replica shards
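
    You can cross-check what stack monitoring shows by asking Elasticsearch for its shard table directly. A minimal sketch, again assuming elasticsearch-py and localhost:9200:

    ```python
    from elasticsearch import Elasticsearch

    # Hedged sketch: print every shard with its state and the node it
    # lives on, mirroring the Kibana stack monitoring view.
    es = Elasticsearch("http://localhost:9200")

    # columns: index shard prirep state docs store ip node ("p" = primary, "r" = replica)
    print(es.cat.shards(v=True))
    print(es.cluster.health())  # status should be green while both nodes are up
    ```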

  13. Run some experiments with Elasticsearch:

    • From the stack monitoring page, go into the nodes page - this shows you how the shards are distributed over the nodes

    • Add a third ES node:

      ```bash
      docker run \
        -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
        -e "node.name=es_node3" \
        -e "node.master=false" \
        -e "node.data=true" \
        -e "discovery.seed_hosts=es_node1,es_node2" \
        -e "cluster.name=showcase_cluster" \
        --network showcase_big_data_bdn \
        --name es_node3 \
        -p 9203:9200 \
        elasticsearch:7.4.1
      ```

    • Stop that node with `docker stop es_node3` and see what happens to the replica shards (you can then remove the container with `docker rm es_node3`)

    • Reset all containers (`docker-compose down` followed by `docker-compose up`) and now start the simulation with 10 cars: before starting the simulation, change the config in JupyterLab under `/scenarios/abm.json`, setting `CARMODEL.scenarios.scenario.agents[car].count` to 10 (see the sketch below). Then see how many shards there are now.
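
    The config change in the last step can also be scripted. A minimal sketch, assuming abm.json nests its keys the way the path `CARMODEL.scenarios.scenario.agents[car].count` suggests and identifies the car agent by a name field; adjust to the file's actual layout:

    ```python
    import json

    # Hedged sketch: set the car count in scenarios/abm.json to 10.
    # The JSON layout is an assumption derived from the path notation
    # CARMODEL.scenarios.scenario.agents[car].count in the step above.
    with open("scenarios/abm.json") as f:
        config = json.load(f)

    for agent in config["CARMODEL"]["scenarios"]["scenario"]["agents"]:
        if agent.get("name") == "car":  # hypothetical key for the agent type
            agent["count"] = 10

    with open("scenarios/abm.json", "w") as f:
        json.dump(config, f, indent=2)
    ```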

  14. Reset all containers and then restart the simulation in the big_data notebook.

  15. In Kibana, import the saved objects from `kibana/big_data.ndjson`
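
    If you prefer not to click through the UI, Kibana's saved objects import API can load the file. A minimal sketch, assuming Kibana on localhost:5601 and its 7.x `/api/saved_objects/_import` endpoint, which requires the `kbn-xsrf` header:

    ```python
    import requests

    # Hedged sketch: import kibana/big_data.ndjson via Kibana's saved
    # objects API instead of the Import button. Assumes localhost:5601.
    with open("kibana/big_data.ndjson", "rb") as f:
        resp = requests.post(
            "http://localhost:5601/api/saved_objects/_import",
            headers={"kbn-xsrf": "true"},  # required by the Kibana API
            params={"overwrite": "true"},  # replace objects from earlier runs
            files={"file": ("big_data.ndjson", f, "application/ndjson")},
        )
    resp.raise_for_status()
    print(resp.json())
    ```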

  16. In Jupyter, work through the exercise in `notebooks/exercise_batch_processing.ipynb`
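
    For orientation, the kind of batch aggregation this exercise is about can be expressed as a single Elasticsearch query. A minimal sketch, not the notebook's actual code; the field names "timestep" and "revenue" are assumptions based on the dashboard in the next step:

    ```python
    from elasticsearch import Elasticsearch

    # Hedged sketch: average revenue per timestep across the *_car indices.
    # Field names are assumptions; check the actual mapping first.
    es = Elasticsearch("http://localhost:9200")

    body = {
        "size": 0,
        "aggs": {
            "per_timestep": {
                "histogram": {"field": "timestep", "interval": 1},
                "aggs": {"avg_revenue": {"avg": {"field": "revenue"}}},
            }
        },
    }
    result = es.search(index="*_car", body=body)
    for bucket in result["aggregations"]["per_timestep"]["buckets"]:
        print(bucket["key"], bucket["avg_revenue"]["value"])
    ```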

  17. Check the Kibana dashboard - you should see how the average revenue per timestep develops

  18. In Jupyter, work through the exercise in notebooks/exercise_