# Examples

## “Hello world”

This is a basic “hello world” demonstration of Spark Structured Streaming using an Apache Kafka data source.

The script `` repeatedly sends the string “Hello *n*” to the `sample` topic in Kafka, where *n* is an incrementing sequence number.
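As a rough illustration of what such a producer does, here is a minimal sketch. It is not the script itself: the function names, the starting value of 1, and the `send` callable are all assumptions made for this example, and the Kafka client is injected so the logic can be tried without a running broker.

```python
import itertools
import time


def hello_messages(start=1):
    """Yield "Hello 1", "Hello 2", ... indefinitely (start value is an assumption)."""
    for n in itertools.count(start):
        yield f"Hello {n}"


def produce(send, topic="sample", limit=None, delay=0.0):
    """Pass successive messages to the `send(topic, payload)` callable.

    `send` is a hypothetical hook standing in for a Kafka client.
    `limit=None` means keep sending until interrupted.
    Returns the number of messages sent.
    """
    sent = 0
    for message in itertools.islice(hello_messages(), limit):
        send(topic, message.encode("utf-8"))  # Kafka payloads are bytes
        sent += 1
        if delay:
            time.sleep(delay)
    return sent
```

With the `kafka-python` package installed, something like `produce(KafkaProducer(bootstrap_servers="localhost:9092").send, delay=1.0)` should work, since `KafkaProducer.send` accepts the topic and value as its first two arguments. The broker address shown is an assumption and depends on how the containers are configured.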

The Jupyter notebook `sample-consumer.ipynb` subscribes to the `sample` topic and displays the results of the query.
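The core of such a consumer might look like the following sketch. It is not copied from the notebook: the function name, the default broker address, and the choice of the console sink are assumptions, and the notebook may display its results differently. It expects an existing `SparkSession` with the Kafka connector package available.

```python
def start_sample_query(spark, bootstrap_servers="kafka:9092"):
    """Subscribe to the `sample` topic and print each new message.

    `spark` is an existing SparkSession; `bootstrap_servers` is an
    assumed broker address and should match the actual Kafka container.
    """
    messages = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", "sample")
        .load()
        # Kafka delivers keys and values as binary, so cast the value to text.
        .selectExpr("CAST(value AS STRING) AS message")
    )
    return (
        messages.writeStream
        .outputMode("append")
        .format("console")  # assumption: the notebook may use a different sink
        .start()
    )
```

The function returns the running `StreamingQuery`, so the caller can stop it later with `query.stop()`.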

## Clickstream

This is a clickstream processing demo using Apache Kafka and Spark Structured Streaming, based on the original Scala version described in “Clickstream Analysis using Apache Spark and Apache Kafka” (IBM).

The clickstream data is from the Wikipedia Clickstream project, and is streamed line-by-line by the script `` into the `clickstream` topic in Kafka. Each line comprises four tab-separated values: the previous page visited (`prev`), the current page (`curr`), the type of page (`type`), and the number of clicks for that navigation path (`n`). The output is a rank-ordered list of Wikipedia pages with the most hits.
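To make the record format and the ranking concrete, here is a small batch sketch in plain Python. It is only an illustration, not the streaming query used in the demo: the names `Click`, `parse_line`, and `top_pages` are hypothetical, "hits" is assumed to mean the sum of `n` over all lines sharing the same `curr`, and the input is assumed to have no header row.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Click(NamedTuple):
    """One clickstream record, using the four field names described above."""
    prev: str
    curr: str
    type: str
    n: int


def parse_line(line: str) -> Click:
    """Split one tab-separated line into its four fields."""
    prev, curr, link_type, n = line.rstrip("\n").split("\t")
    return Click(prev, curr, link_type, int(n))


def top_pages(lines: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Return up to `limit` (page, hits) pairs, most hits first.

    Assumption: a page's hits are the total of `n` across every
    navigation path that ends at that page.
    """
    hits = Counter()
    for line in lines:
        click = parse_line(line)
        hits[click.curr] += click.n
    return hits.most_common(limit)
```

A streaming implementation would express the same idea as a grouped aggregation on `curr` followed by a descending sort on the summed clicks.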

The example uses the January 2017 dump (`2017_01_en_clickstream.tsv.gz`) from the original Wikipedia Clickstream data set (doi:10.6084/m9.figshare.1305770), but should work with later dumps available from <>.