Instructions and Ansible playbooks for deploying a Spark stand-alone cluster on OpenStack, cloned and adapted from https://github.com/johandahlberg/ansible_spark_openstack
This describes how to start a standalone Spark cluster on OpenStack, using two Ansible playbooks. It has been tested on the Uppmax private cloud smog.
It will install Spark and HDFS, and start the required services on the nodes. Please note that this is a proof-of-concept implementation, and that it is not ready for use in a production setting. Any pull requests that bring it closer to a production-ready state are very much appreciated.
The OpenStack dynamic inventory code presented here is adapted from: https://github.com/lukaspustina/dynamic-inventory-for-ansible-with-openstack
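Ansible dynamic inventory scripts such as `openstack_inventory.py` follow a simple convention: when invoked with `--list` they print a JSON object mapping group names to lists of hosts, which Ansible then uses as its inventory. A minimal sketch of that shape (the group names and addresses below are made-up examples, not output from the actual script):

```shell
# Write a hypothetical example of the JSON a dynamic inventory script
# emits on --list, then check that it is valid JSON.
cat <<'EOF' > /tmp/inventory_example.json
{
  "spark-master": ["10.0.0.10"],
  "spark-slaves": ["10.0.0.11", "10.0.0.12"]
}
EOF
python3 -m json.tool /tmp/inventory_example.json > /dev/null && echo "valid JSON"
```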
`ssh` to the machine you just created and install the dependencies:

```
sudo apt-get install python-pip python-dev git
sudo pip install ansible
sudo pip install python-novaclient
```
Clone this repository:

```
git clone https://github.com/johandahlberg/ansible_spark_openstack.git
```
Create the directory `files` in the repo root dir and copy your ssh keys (these cannot have a password) there. This is used to enable password-less ssh access between the nodes.

Source your OpenStack RC file and fill in your OpenStack password when prompted. This will load information about your OpenStack setup into your environment:

```
source <path to rc file>
```

Create a security group for the Spark nodes and open the internal ports:

```
nova secgroup-create spark "internal security group for spark"
nova secgroup-add-group-rule spark spark tcp 1 65535
```
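The password-less ssh key that goes into the `files` directory can be generated with `ssh-keygen`; a sketch, where the key file name is just an example (check `vars/main.yml` for the names the playbooks actually expect):

```shell
# Generate a password-less RSA key pair (-N "" sets an empty passphrase)
# and copy both halves into the repo's files/ directory.
ssh-keygen -t rsa -N "" -f ./spark_cloud_key -q
mkdir -p files
cp spark_cloud_key spark_cloud_key.pub files/
```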
Set the name of your network (you can find it in your OpenStack dashboard):

```
export OS_NETWORK_NAME="<name of your network>"
```

If you like you can add this to your OpenStack RC file, or set it in your `.bashrc`.
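Before running the playbooks you can check that the variable is actually set, so a missing value fails early rather than midway through provisioning:

```shell
# Warn if OS_NETWORK_NAME is unset or empty, otherwise echo the value
if [ -z "${OS_NETWORK_NAME:-}" ]; then
    echo "OS_NETWORK_NAME is not set" >&2
else
    echo "Using network: ${OS_NETWORK_NAME}"
fi
```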
Edit the setup variables to fit your setup: open `vars/main.yml` and set the variables as explained there.
Create the cluster and deploy Spark by running the two playbooks:

```
ansible-playbook -i localhost_inventory --private-key=<your_ssh_key> create_spark_cloud_playbook.yml
ansible-playbook -i openstack_inventory.py --private-key=<your_ssh_key> deploy_spark_playbook.yml
```
`ssh` into the spark-master node and try out your new Spark cluster by kicking off a shell:

```
spark-shell --master spark://spark-master:7077 --executor-memory 6G
```

Now you're ready to enter the Spark world. Have fun!
If you don't want to open the web-facing ports you can use ssh forwarding to reach the web interfaces, e.g.:

```
ssh -L 8080:spark-master:8080 -i <your key> ubuntu@<spark-master-ip>
```