
# Showcase Big Data

  1. Install Docker (see the [docker installation instructions](https://docs.docker.com/install/))

  2. Run Docker Compose: `docker-compose up`

  3. In JupyterLab: Open the notebook `notebooks/showcase_big_data.ipynb`

  4. Start the simulation: run the notebook, then hit the "Start" button

  5. Access Kibana and go to the Discover page

  6. Set up the index pattern `*_car`, using `timestamp` for the time dimension
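
    If you want to sanity-check that the simulation is already writing `*_car` indices before creating the pattern, here is a minimal sketch using the official Python client. The `localhost:9200` port and the client version are assumptions, not taken from this repo's compose file:

    ```python
    # Sketch: list the per-car indices the simulation writes to.
    # Assumes Elasticsearch is published on localhost:9200 and an
    # elasticsearch-py 7.x client (both assumptions, not verified here).
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    # _cat/indices with a wildcard mirrors the *_car index pattern in Kibana.
    print(es.cat.indices(index="*_car", v=True))
    ```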

  7. On the "Visualize" page: Create a line visualisation that shows revenue over time (you need to set up a bucket for the X-axis/Histogram/time with interval 1) and save it.

  8. Create another line visualisation showing total revenue over time

  9. Create a heatmap visualisation with a 20x20 grid for the taxi positions (`pos_x`, `pos_y`)

  10. Create a dashboard that shows the visualisations (last 15 minutes). Set it to refresh automatically every 10 seconds, then save the dashboard.

  11. Go to the management page and choose Export. This creates an `export.ndjson` file, which you can import via the Import button the next time you start the showcase (a scripted alternative is sketched below).

    HINT: you can find predefined dashboards in `kibana/big_data.ndjson`.
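
    If you prefer to script the export/import rather than click through the UI, Kibana 7.x exposes a saved-objects REST API; a hedged sketch follows (the `localhost:5601` port is an assumption):

    ```python
    # Sketch: export/import Kibana saved objects over the REST API
    # instead of the management UI. Assumes Kibana 7.x reachable on
    # localhost:5601 (port not verified against this repo).
    import requests

    KIBANA = "http://localhost:5601"
    HEADERS = {"kbn-xsrf": "true"}  # required by Kibana's API on writes

    # Export dashboards, visualisations and index patterns to ndjson.
    resp = requests.post(
        f"{KIBANA}/api/saved_objects/_export",
        headers={**HEADERS, "Content-Type": "application/json"},
        json={"type": ["dashboard", "visualization", "index-pattern"]},
    )
    with open("export.ndjson", "wb") as f:
        f.write(resp.content)

    # Re-import a file on the next run (e.g. kibana/big_data.ndjson).
    with open("kibana/big_data.ndjson", "rb") as f:
        requests.post(
            f"{KIBANA}/api/saved_objects/_import",
            headers=HEADERS,
            params={"overwrite": "true"},
            files={"file": ("big_data.ndjson", f, "application/ndjson")},
        )
    ```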

  12. Turn on Kibana stack monitoring; you should see two nodes with 9 primary shards and 9 replica shards
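
    The same node and shard information is available from the Elasticsearch APIs; a small sketch (again assuming `localhost:9200`):

    ```python
    # Sketch: inspect cluster health and shard allocation without the
    # monitoring UI. Assumes Elasticsearch on localhost:9200.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    health = es.cluster.health()
    print(health["number_of_nodes"], "nodes, status:", health["status"])

    # One line per shard: index, shard number, prirep (p = primary,
    # r = replica), state, and the node it is allocated to.
    print(es.cat.shards(index="*_car", v=True))
    ```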

  13. Run some experiments with Elasticsearch:

    • From the stack monitoring page, go to the nodes page; this shows you how the shards are distributed across the nodes

    • Add a third ES node: `docker run -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" -e "node.name=es_node3" -e "node.master=false" -e "node.data=true" -e "discovery.seed_hosts=es_node1,es_node2" -e "cluster.name=showcase_cluster" --network showcase_big_data_bdn --name es_node3 -p 9203:9200 elasticsearch:7.4.1`

    • Stop that node: `docker stop es_node3` and see what happens to the replica shards (you can now remove the container with `docker rm es_node3`)

    • Reset all containers (`docker-compose down` followed by `docker-compose up`) and now start the simulation with 10 cars: before starting it, change the config in JupyterLab under `/scenarios/abm.json`, setting `CARMODEL.scenarios.scenario.agents[car].count` to 10 (see the sketch after this list). Then see how many shards there are now.
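
    The sketch below shows one way to make that config change from Python; the exact JSON layout of `abm.json` is an assumption inferred from the path `CARMODEL.scenarios.scenario.agents[car].count`, so adjust it to the real file:

    ```python
    # Sketch: set the car count to 10 in scenarios/abm.json before
    # starting the simulation. The structure below is inferred from the
    # path CARMODEL.scenarios.scenario.agents[car].count given above and
    # may not match the actual file exactly.
    import json

    path = "scenarios/abm.json"
    with open(path) as f:
        config = json.load(f)

    config["CARMODEL"]["scenarios"]["scenario"]["agents"]["car"]["count"] = 10

    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    ```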

  14. Reset all containers and then restart the simulation in the `showcase_big_data` notebook.

  15. In Kibana, import the saved objects from `kibana/big_data.ndjson`

  16. In Jupyter, work through the exercise in `notebooks/exercise_batch_processing.ipynb`

  17. Check the Kibana dashboard - you should see how the average revenue per timestep develops
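
    To cross-check the dashboard figures outside Kibana, here is a hedged aggregation sketch; the `revenue` and `timestamp` field names are taken from the steps above, and the exact index mapping is an assumption:

    ```python
    # Sketch: average revenue per timestep, computed directly against
    # Elasticsearch. Field names follow the earlier steps; adjust them
    # to the actual index mapping.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    result = es.search(
        index="*_car",
        body={
            "size": 0,
            "aggs": {
                "per_timestep": {
                    "histogram": {"field": "timestamp", "interval": 1},
                    "aggs": {"avg_revenue": {"avg": {"field": "revenue"}}},
                }
            },
        },
    )

    for bucket in result["aggregations"]["per_timestep"]["buckets"]:
        print(bucket["key"], bucket["avg_revenue"]["value"])
    ```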

  18. In Jupyter, work through the exercise in notebooks/exercise_