
fhwedel-ansible (public): 2 stars, 0 forks, 0 issues

Commits

List of commits on branch master (all unverified):

* 73899313697f7996ac886c7bd96cbae31851c6c7: Adds a simple README file (llfrancke, 6 years ago)
* 383a0217ffdbaa7e0f8135ff7f22c60a34864288: spark fix (llfrancke, 7 years ago)
* 702e5696f423449ded5c29d84062d3755efcdc4c: Fix Hive & MR & scripts (llfrancke, 7 years ago)
* a7e2bf52c36e285ab2d384dea3d17d1b15a6ec29: Demo data & init scripts (llfrancke, 7 years ago)
* 0c371146833cbcfabd145e3f169a76fee3cb07e4: Fix another issue (llfrancke, 7 years ago)
* 43ec911b5e82f6f376397eac1972326c8073f6b6: Fix var issue (llfrancke, 7 years ago)

README

The README file for this repository.

= Setup

== For real

Run the Ansible playbook ansible/site.yml like this (`-i` takes an inventory file; the `<inventory>` placeholder below is ours, substitute your own):

[source,bash]
----
ansible-playbook -f 30 -i <inventory> ansible/site.yml
----

When that's done you need to run the following commands once:

.Initial setup
[source,bash]
----
sudo su hadoop
source /opt/hadoop/setup_hadoop.sh
hdfs namenode -format

start-dfs.sh

hadoop fs -mkdir -p /user/spark/applicationHistory
hadoop fs -chmod 1777 /user/spark/applicationHistory

hadoop fs -mkdir -p /tmp
hadoop fs -chmod 1777 /tmp

hadoop fs -mkdir /data
hadoop fs -put * /data

hadoop fs -mkdir -p /user/history
hadoop fs -chmod -R 1777 /user/history

schematool -initSchema -dbType derby
hadoop fs -mkdir /user/hive/warehouse

stop-dfs.sh
----
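Several directories above are created with mode 1777. The leading 1 is the sticky bit: every user may create files in the directory, but only a file's owner may delete or rename it, which is why it is used for shared scratch space like /tmp and the Spark application-history directory. A minimal local illustration (plain chmod on a temp directory, not HDFS, but `hadoop fs -chmod` uses the same octal notation):

```shell
#!/usr/bin/env bash
# Illustrate mode 1777: sticky bit (1) plus rwx for owner, group, and others (777).
set -euo pipefail

dir="$(mktemp -d)"
chmod 1777 "$dir"

# stat -c '%a' prints the octal mode including the sticky bit (GNU coreutils).
echo "mode: $(stat -c '%a' "$dir")"   # prints: mode: 1777

# ls renders the sticky bit as a trailing 't' on a world-writable directory.
ls -ld "$dir" | cut -c1-10            # prints: drwxrwxrwt

rmdir "$dir"
```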

== Vagrant

Running a simple `vagrant up` will create a virtual machine and run the necessary steps (including Ansible) to get started. `vagrant ssh` can then be used to access the machine. Afterwards, run the following commands once inside the VM:

[source,bash]
----
sudo su hadoop
source /opt/hadoop/setup_hadoop.sh
hdfs namenode -format

hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode

hdfs dfs -mkdir -p /user/spark/applicationHistory
hadoop fs -chmod 1777 /user/spark/applicationHistory

hadoop fs -mkdir /tmp
hadoop fs -chmod 1777 /tmp

hadoop fs -mkdir /data
hadoop fs -put * /data

schematool -initSchema -dbType derby
hadoop fs -mkdir -p /user/hive/warehouse

hadoop-daemon.sh stop datanode
hadoop-daemon.sh stop namenode
----

= Usage

To get the necessary environment variables set up, you must execute the following command:

[source,bash]
----
source /opt/hadoop/setup_hadoop.sh
----
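The script must be sourced, not executed: sourcing runs it in the current shell, so the environment variables it exports persist, whereas executing it would run a child shell whose variables vanish when it exits. A quick demonstration with a stand-in script (the variable name DEMO_HOME is illustrative, not taken from setup_hadoop.sh):

```shell
#!/usr/bin/env bash
# Why `source` matters: a sourced script mutates the current shell's
# environment; an executed script only mutates a child shell that then exits.
set -u

cat > /tmp/demo_env.sh <<'EOF'
export DEMO_HOME=/opt/demo
EOF

bash /tmp/demo_env.sh          # runs in a subshell...
echo "${DEMO_HOME:-unset}"     # ...so this prints: unset

source /tmp/demo_env.sh        # runs in the current shell...
echo "${DEMO_HOME:-unset}"     # ...so this prints: /opt/demo

rm /tmp/demo_env.sh
```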

== Cluster mode

When the initial setup is done you need to run these commands to get the cluster up and running:

.Starting the cluster
[source,bash]
----
sudo su hadoop
source /opt/hadoop/setup_hadoop.sh
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
start-history-server.sh
hive --service metastore &> /opt/hadoop/logs/metastore.log &
----
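The metastore line uses the `&> logfile &` pattern: `&>` redirects both stdout and stderr of the process into the log file, and the trailing `&` puts it in the background so the shell stays usable. A self-contained sketch of the same pattern with a stand-in command (no `hive` required):

```shell
#!/usr/bin/env bash
# `cmd &> log &` = capture stdout AND stderr in `log`, run `cmd` in the background.
set -euo pipefail

log=/tmp/metastore_demo.log

# Stand-in for `hive --service metastore`: writes to both streams.
{ echo "service up"; echo "a warning" >&2; } &> "$log" &
pid=$!                 # $! holds the PID of the job just backgrounded

wait "$pid"            # in real use you would leave it running instead
cat "$log"             # prints: service up / a warning
rm "$log"
```

For the real metastore you would not `wait`; you would leave the process running and tail /opt/hadoop/logs/metastore.log to check on it.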

.Stopping the cluster

TODO

== Vagrant

.Starting the components
[source,bash]
----
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
mr-jobhistory-daemon.sh start historyserver
start-history-server.sh
hive --service metastore &> /opt/hadoop/logs/metastore.log &
----

.Stopping the components

TODO

== Example commands to run

.Spark Pi
[source,bash]
----
spark-submit --class org.apache.spark.examples.SparkPi /opt/hadoop/spark/examples/jars/spark-examples_2.11-2.2.1.jar 10
----

.Hadoop MR Pi
[source,bash]
----
hadoop jar /opt/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar pi 100 100
----

.Spark Java WordCount
[source,bash]
----
# JavaWordCount additionally expects the path of a text file to count,
# passed as an argument after the jar.
spark-submit --class org.apache.spark.examples.JavaWordCount /opt/hadoop/spark/examples/jars/spark-examples_2.11-2.2.1.jar
----
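For a sense of what JavaWordCount computes, the same aggregation can be written in plain shell: split the input into words, then count occurrences of each. This is an illustration only, not the Spark job; Spark does the same (word, count) aggregation, distributed across the cluster.

```shell
#!/usr/bin/env bash
# Word count in shell: one word per line, sort so duplicates are adjacent,
# then let uniq -c count each run.
set -euo pipefail

echo "to be or not to be" | tr ' ' '\n' | sort | uniq -c
# prints (counts left-padded by uniq):
#   2 be
#   1 not
#   1 or
#   2 to
```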