Instructions and Ansible playbooks for deploying a Spark stand-alone cluster on OpenStack, cloned and adapted from https://github.com/johandahlberg/ansible_spark_openstack
It will install Spark and HDFS, and start the required services on the nodes. Please note that this is a proof-of-concept implementation and is not ready for use in a production setting. Any pull requests that bring it closer to a production-ready state are very much appreciated.
The OpenStack dynamic inventory code presented here is adapted from: https://github.com/lukaspustina/dynamic-inventory-for-ansible-with-openstack
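For context, Ansible calls a dynamic inventory script with `--list` and expects a JSON mapping of groups to hosts on stdout. A rough sketch of that shape (the group names and addresses below are made up for illustration, not the actual output of `openstack_inventory.py`):

```
# Print the kind of JSON a dynamic inventory script emits on --list:
# top-level group names mapping to lists of hosts.
# Group names and addresses here are illustrative only.
cat <<'EOF'
{
  "spark_master": {"hosts": ["192.0.2.10"]},
  "spark_slaves": {"hosts": ["192.0.2.11", "192.0.2.12"]}
}
EOF
```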
`ssh` to the machine you just created.
```
sudo apt-get install python-pip python-dev git
sudo pip install ansible
sudo pip install python-novaclient
```
```
git clone https://github.com/johandahlberg/ansible_spark_openstack.git
```
Create a directory called `files` in the repo root dir and copy your ssh keys (these cannot have a password) there. This is used to enable password-less ssh access between the nodes.
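If you don't already have a password-less key pair at hand, one way to create one is to generate it directly into the `files` directory (the key file name here is just an example; any password-less key pair works):

```
# Create the files/ directory in the repo root and generate a
# password-less RSA key pair into it (file name is illustrative).
mkdir -p files
ssh-keygen -q -t rsa -N "" -f files/id_rsa
ls files
```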
Run `source <path to rc file>` and fill in your OpenStack password when prompted. This will load information about your OpenStack setup into your environment.
```
nova secgroup-create spark "internal security group for spark"
nova secgroup-add-group-rule spark spark tcp 1 65535
```
Set the name of your network:

```
export OS_NETWORK_NAME="<name of your network>"
```

If you like, you can add this to your OpenStack RC file, or set it in your `.bashrc`. (You can find the name of your network in the OpenStack dashboard.)
Edit the setup variables to fit your environment: open `vars/main.yml` and set the variables as explained there.
Create the cluster nodes:

```
ansible-playbook -i localhost_inventory --private-key=<your_ssh_key> create_spark_cloud_playbook.yml
```

Then deploy Spark and HDFS to them:

```
ansible-playbook -i openstack_inventory.py --private-key=<your_ssh_key> deploy_spark_playbook.yml
```
`ssh` into the spark-master node and try out your new Spark cluster by kicking off a shell. Now you're ready to enter the Spark world. Have fun!

```
spark-shell --master spark://spark-master:7077 --executor-memory 6G
```
If you don't want to open the web-facing ports, you can use ssh forwarding to reach the web interfaces, e.g.:

```
ssh -L 8080:spark-master:8080 -i <your key> ubuntu@<spark-master-ip>
```

Then point your browser at http://localhost:8080.