nigel.stanger / docker-analytics
Revamped in line with Mark’s approach

• switched to direct install of graphframes (closes #1)
Branches: master, spark3
commit ba71b8b37456c752dba7a3ef5a969e27b915ea11
1 parent 2868c44
Nigel Stanger authored on 20 Sep 2021
Showing 1 changed file: spark/Dockerfile
spark/Dockerfile
New version:

FROM python:3.6-alpine

ENV SPARK_VERSION="2.4.8" \
    HADOOP_VERSION="2.7" \
    GRAPHFRAMES_VERSION="0.8.1-spark2.4-s_2.11" \
    APACHE_MIRROR="https://dlcdn.apache.org" \
    SPARK_INSTALL="/usr/local"

RUN apk add --no-cache \
    bash \
    openjdk8-jre \
    tini

RUN apk add --no-cache --virtual .fetch-deps \
    wget \
    tar

# download, install, and symlink spark
RUN cd $SPARK_INSTALL && \
    wget -q --show-progress --progress=bar:force:noscroll $APACHE_MIRROR/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz 2>&1 && \
    tar xzf spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz && \
    ln -s spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION spark && \
    rm -f spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz

# download and install graphframes
RUN cd $SPARK_INSTALL/spark/jars && \
    wget -q --show-progress --progress=bar:force:noscroll https://repos.spark-packages.org/graphframes/graphframes/$GRAPHFRAMES_VERSION/graphframes-$GRAPHFRAMES_VERSION.jar

RUN apk del .fetch-deps && \
    rm -rf /tmp/* && \
    rm -rf /var/cache/* && \
    rm -rf /root/.cache

COPY start-master.sh start-worker.sh /usr/local/bin/

# these need to be separate because you can't reference prior environment
# variables in the same ENV block
ENV SPARK_HOME="$SPARK_INSTALL/spark" \
    SPARK_HOSTNAME="localhost" \
    SPARK_MASTER_PORT="7077" \
    SPARK_MASTER_WEBUI_PORT="8080"

COPY spark-defaults.conf $SPARK_HOME/conf

ENV SPARK_MASTER="spark://$SPARK_HOSTNAME:$SPARK_MASTER_PORT"

# Spark doesn't seem to respond directly to SIGTERM as the exit status is
# for SIGKILL (137), after a pause. Presumably docker-compose down times out.
# Using tini gives immediate exit with status 143 (SIGTERM).
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["/usr/local/bin/start-master.sh"]
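The two wget steps above build their download URLs from the values in the ENV block. A quick sketch of how those values expand (plain shell, outside Docker, using the same variable names):

```shell
# Values copied from the ENV block in the new Dockerfile
SPARK_VERSION="2.4.8"
HADOOP_VERSION="2.7"
GRAPHFRAMES_VERSION="0.8.1-spark2.4-s_2.11"
APACHE_MIRROR="https://dlcdn.apache.org"

# URL fetched by the "download, install, and symlink spark" step
SPARK_URL="$APACHE_MIRROR/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz"
echo "$SPARK_URL"
# https://dlcdn.apache.org/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz

# URL fetched by the "download and install graphframes" step
GRAPHFRAMES_URL="https://repos.spark-packages.org/graphframes/graphframes/$GRAPHFRAMES_VERSION/graphframes-$GRAPHFRAMES_VERSION.jar"
echo "$GRAPHFRAMES_URL"
# https://repos.spark-packages.org/graphframes/graphframes/0.8.1-spark2.4-s_2.11/graphframes-0.8.1-spark2.4-s_2.11.jar
```

Bumping SPARK_VERSION or GRAPHFRAMES_VERSION is therefore a one-line change, provided the mirror and spark-packages repo host the matching artifact.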
Previous version:

FROM python:3.6-alpine

ENV SPARK_VERSION="2.4.3" \
    HADOOP_VERSION="2.7" \
    SPARK_INSTALL="/usr/local"

RUN apk add --no-cache \
    bash \
    openjdk8 \
    tini \
    zeromq

RUN apk add --no-cache --virtual .fetch-deps \
    curl \
    tar

RUN curl -s https://www-us.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz | tar -xz -C $SPARK_INSTALL && \
    cd $SPARK_INSTALL && ln -s spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION spark

RUN apk del .fetch-deps

COPY start-master.sh start-worker.sh /usr/local/bin/

# these need to be separate because you can't reference prior environment
# variables in the same ENV block
ENV SPARK_HOME="$SPARK_INSTALL/spark" \
    SPARK_HOSTNAME="localhost" \
    SPARK_MASTER_PORT="7077" \
    SPARK_MASTER_WEBUI_PORT="8080"

COPY spark-defaults.conf $SPARK_HOME/conf

ENV SPARK_MASTER="spark://$SPARK_HOSTNAME:$SPARK_MASTER_PORT"

# Spark doesn't seem to respond directly to SIGTERM as the exit status is
# for SIGKILL (137), after a pause. Presumably docker-compose down times out.
# Using tini gives immediate exit with status 143 (SIGTERM).
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["/usr/local/bin/start-master.sh"]
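The closing comment in both Dockerfiles refers to the shell convention that a process killed by signal N exits with status 128 + N, which is why an unhandled SIGKILL reads as 137 and a clean SIGTERM (forwarded by tini) reads as 143. A quick check of that arithmetic:

```shell
# Exit status of a process killed by signal N is 128 + N
SIGKILL_STATUS=$((128 + 9))    # SIGKILL is signal 9
SIGTERM_STATUS=$((128 + 15))   # SIGTERM is signal 15
echo "SIGKILL -> $SIGKILL_STATUS"   # 137, what the container showed without tini
echo "SIGTERM -> $SIGTERM_STATUS"   # 143, what it shows with tini as PID 1
```

Seeing 143 from `docker-compose down` is therefore the desired outcome: it means the container shut down on SIGTERM rather than being force-killed after the timeout.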