diff --git a/Map_Visualisation.tex b/Map_Visualisation.tex
index 588c285..19489cd 100755
--- a/Map_Visualisation.tex
+++ b/Map_Visualisation.tex
@@ -2,7 +2,6 @@
\usepackage{graphicx}
-% \usepackage{subfig}
\newtheorem{theorem}{Theorem}[section]
@@ -22,15 +21,18 @@
\author{NIGEL STANGER \\ University of Otago}
\begin{abstract}
-A common technique for visualising the geographical distribution of web
+A useful approach to visualising the geographical distribution of web
site hits is to geolocate the IP addresses of hits and plot them on a
-world map. This is commonly achieved by dynamic generation of images on
-the server. In this paper we compare the scalability of this technique
-with three others: overlaying transparent images on an underlying base
-map, overlaying CSS-enabled HTML on an underlying base map and
-generating a map using Google Maps. The results show that all four
-techniques are suitable for small data sets, but that the latter two
-techniques scale poorly to large data sets.
+world map. This can be achieved by dynamic generation and display of map
+images at the server and/or the client. In this paper we compare the
+scalability with respect to source data size of four techniques for
+dynamic map generation and display: generating a single composite map
+image, overlaying transparent images on an underlying base map,
+overlaying CSS-enabled HTML on an underlying base map and generating a
+map using Google Maps. These four techniques embody a mixture of
+different display technologies and distribution styles. The results show
+that all four techniques are suitable for small data sets, but that the
+latter two techniques scale poorly to larger data sets.
\end{abstract}
\category{C.4}{Performance of Systems}{Performance attributes}
@@ -75,16 +77,17 @@
Otago\footnote{\url{http://eprints.otago.ac.nz/}} in November 2005
\cite{Stan-N-2006-running}, using the GNU
EPrints\footnote{\url{http://www.eprints.org/}} repository management
-software. The repository quickly attracted interest from around the
+software. This repository quickly attracted interest from around the
world and the number of abstract views and document downloads began to
steadily increase. We were obviously very interested in tracking this
increase, particularly with respect to where in the world the hits were
-coming from. The EPrints statistics software developed at the University
-of Tasmania \cite{Sale-A-2006-stats} proved very useful in this regard,
-providing us with detailed per-eprint and per-country download
-statistics; an example of the latter is shown in
+coming from. The EPrints statistics management software developed at the
+University of Tasmania \cite{Sale-A-2006-stats} proved very useful in
+this regard, providing us with detailed per-eprint and per-country
+download statistics; an example of the latter is shown in
Figure~\ref{fig-tas-stats}. However, while this display provides an
ordered ranking of the number of hits from each country, it does not
+provide any greater detail than to the country level, nor does it
provide any visual clues as to the distribution of hit sources around
the globe.
@@ -99,41 +102,40 @@
\end{figure}
-We therefore began to explore various techniques for plotting our
+We therefore began to explore possible techniques for plotting our
repository hit data onto a world map, with the aim of adding this
capability to the Tasmania statistics package. Our preference was for a
technique that could be used within a modern web browser without the
-need to manually install additional client software, thus providing us
-with the widest possible audience and reducing the impact of wide
-variation in client hardware and software environments \cite[pp.\
-27--28]{Offu-J-2002-quality}.
+need to manually install additional client software, so as to make the
+new feature available to the widest possible audience and reduce the
+impact of wide variation in client hardware and software environments
+\cite[pp.\ 27--28]{Offu-J-2002-quality}.
-There have been several prior efforts to plot web activity
-geographically. \citeN{Lamm-SE-1996-webvis} developed a sophisticated
-system for real-time visualization of web traffic on a 3D globe, but
-this was intended for use within a virtual reality environment, thus
-limiting its general applicability. \citeN{Papa-N-1998-Palantir}
-described a similar system (Palantir) written in Java, which was thus
-able to be run within a web browser, assuming that a Java virtual
-machine was available. \citeN[pp.\ 100--103]{Dodg-M-2001-cybermap}
-describe these and several other related systems for mapping Web and
-Internet traffic.
+There have been several prior efforts to geovisualize web activity.
+\citeN{Lamm-SE-1996-webvis} developed a sophisticated system for
+real-time visualization of web traffic on a 3D globe, but this was
+intended for use within a virtual reality environment, thus limiting its
+general applicability. \citeN{Papa-N-1998-Palantir} described a similar
+system (Palantir), which was written as a Java applet and thus able to
+be run within a web browser, assuming that a Java virtual machine was
+available. \citeN[pp.\ 100--103]{Dodg-M-2001-cybermap} describe these
+and several other related systems for mapping Web and Internet traffic.
These early systems suffered from a distinct limitation in that there
was no public infrastructure in place for geolocating IP addresses (that
is, translating them into latitude/longitude coordinates). They
generally used \texttt{whois} lookups or parsed the domain name in an
-attempt to guess the country of origin, but these produced fairly crude
-results. Locations outside the United States were typically aggregated
-by country and mapped to the capital city
+attempt to guess the country of origin, with fairly crude results
+\cite{Lamm-SE-1996-webvis}. Locations outside the United States were
+typically aggregated by country and mapped to the capital city
\cite{Lamm-SE-1996-webvis,Papa-N-1998-Palantir,Jian-B-2000-cybermap}.
-Reasonably accurate databases were commercially available at the time
-\cite[p.\ 1466]{Lamm-SE-1996-webvis}, but were not available to the
-public at large, thus limiting their utility.
+Reasonably accurate and detailed databases were commercially available
+at the time \cite[p.\ 1466]{Lamm-SE-1996-webvis}, but were not available
+to the public at large, thus limiting their utility.
The situation has improved considerably in the last five years, however,
with the advent of freely available and reasonably accurate geolocation
-databases\footnote{Such as \url{http://www.maxmind.com/} or
+services\footnote{Such as \url{http://www.maxmind.com/} or
\url{http://www.ip2location.com/}.} with worldwide coverage and
city-level resolution. For example, Maxmind's \emph{GeoLite City}
database is freely available and claims to provide ``60\% accuracy on a
@@ -141,16 +143,16 @@
\cite{Maxm-G-2006-GeoLiteCity}. Their commercial \emph{GeoIP City}
database claims 80\% accuracy for the same parameters.
-The techniques used by previous systems can generally be divided into two
-classes. Techniques of the first class generate a single bitmap image that
-contains both the map and the icons representing web hits. This can be
-achieved by programmatically plotting points onto a base map image,
-which is then displayed at the client. We shall henceforth refer to this
-class of techniques as \emph{image generation} techniques. Techniques of the
-second class separately return both a base map image and some kind of
-overlay containing the plotted points. The overlay is then combined with
-the base map at the client. We shall henceforth refer to this class of
-techniques as \emph{overlay} techniques.
+The techniques used by these systems can generally be divided into two
+classes. The first class of techniques generates a single bitmap image
+that contains both the map and the icons representing web hits. This can
+be achieved by programmatically plotting points onto a base map image;
+the composite image is displayed at the client. We shall henceforth
+refer to this class of techniques as \emph{image generation} techniques.
+The second class of techniques separately returns both a base map image
+and some kind of overlay containing the plotted points. The overlay is
+then combined with the base map at the client. We shall henceforth refer
+to this class of techniques as \emph{overlay} techniques.
Both classes of techniques have been used in the aforementioned systems,
but the overlay technique appears to have been particularly popular. For
@@ -160,9 +162,9 @@
\cite{Papa-N-1998-Palantir}. A more recent example is the Google Maps
API \cite{Goog-M-2006-maps}, which enables web developers to easily
embed dynamic, interactive maps within web pages. Google Maps is a
-dynamic overlay technique that has only recently become feasible with
-the advent of support for CSS positioning and Ajax technologies in most
-browsers.
+dynamic overlay technique that has only become feasible relatively
+recently with the advent of widespread support for CSS positioning and
+Ajax technologies in many browsers.
Overlay techniques enjoy a particular advantage over image generation
techniques, in that they provide the potential for a more flexible GIS-like
@@ -174,22 +176,23 @@
should therefore be portable to a wider range of client and server
environments.
-Within each class, there are several alternative approaches to
-implementing these techniques. For example, an image generation technique
-might be completely server-side or use a mixture of server-side and
-client-side processing. Overlay techniques can also adopt different
-distribution styles, and the overlays themselves might take the form of
-transparent images, absolutely positioned HTML elements, dynamically
-generated graphics, etc.
+Each technique comprises a specific technology or collection of
+technologies (such as transparent bitmap overlays), implemented using a
+specific distribution style. For example, one image generation technique
+might be implemented completely server-side while another might use a
+mixture of server-side and client-side processing. Similarly, overlay
+techniques may adopt different distribution styles, and the overlays
+themselves might take the form of transparent images, absolutely
+positioned HTML elements, dynamically generated graphics, etc.
-Given the many possible techniques that were available, the next question
-was which of these techniques would be most suitable for our purposes?
+Given the many possible techniques that were available, the next
+question was which techniques would be most suitable for our purposes.
Scalability is a key issue for web applications in general \cite[p.\
28]{Offu-J-2002-quality}, and online activity visualization in
particular \cite[p.\ 50]{Eick-SG-2001-sitevis}, so we were particularly
-interested in techniques that could scale to a large number of points. For
-example, at the time of writing the Otago EPrints repository had been
-accessed from over 10,000 distinct IP addresses, each potentially
+interested in techniques that could scale to a large number of points.
+For example, at the time of writing the Otago EPrints repository had
+been accessed from over 10,000 distinct IP addresses, each potentially
representing a distinct geographical location. Separating out the type
of hit (abstract view versus document download) increased that figure to
nearly 13,000.
@@ -200,12 +203,12 @@
chosen are discussed in Section~\ref{sec-techniques}. We then set about
testing the scalability of these four techniques, in order to determine
how well each technique handled large numbers of points. A series of
-experiments was conducted using each technique with progressively larger
+experiments was conducted on each technique with progressively larger
data sets, and the elapsed time and memory usage were measured. The
experimental design is discussed in Section~\ref{sec-experiment}.
-Our intuition was that server-side image generation and server-side
-image overlay techniques would prove the most scalable, and this was
+Our initial intuition was that server-side image generation and
+server-side image overlay techniques would scale best, and this was
borne out by the results of the experiments, which show that both
techniques scale reasonably well to very large numbers of points. The
other two techniques proved to be reasonable for relatively small
@@ -213,6 +216,16 @@
performance deteriorated rapidly beyond this. The results are discussed
in more detail in Section~\ref{sec-results}.
+It should be noted that the intent of the experiments was not to
+identify statistically significant differences between techniques. It
+was expected that variations across techniques would be obvious, and the
+experiments were designed to test this expectation. However, the two
+best performing techniques, server-side image generation and server-side
+image overlay, produced very similar results, so a more formal
+statistical analysis of these techniques may be warranted. This and
+other possible future directions are discussed in
+Section~\ref{sec-future}.
+
\section{Technique selection}
\label{sec-techniques}
@@ -234,8 +247,8 @@
\emph{data server} style is where the server only supplies raw data, and
all manipulation, display and analysis takes place at the client. In
other words, this is primarily a client-side processing model, as
-illustrated in Figure~\ref{fig-distribution-styles}(a). For example, Palantir
-implemented an overlay technique using this distribution style
+illustrated in Figure~\ref{fig-distribution-styles}(a). For example,
+Palantir implemented an overlay technique using this distribution style
\cite{Papa-N-1998-Palantir}, where the source data were generated at the
server and the map was generated, displayed and manipulated by a Java
applet running at the client. The data server distribution style can
@@ -246,9 +259,10 @@
same cannot be said for either Java or Flash. That is, we cannot
necessarily guarantee the existence of a Java virtual machine or Flash
plugin in every browser, which violates our requirement to avoid manual
-installation of additional cient-side software. We can therefore
+installation of additional client-side software. We can therefore
eliminate Java- or Flash-based data server techniques from
-consideration.
+consideration, but JavaScript-based data server techniques may still be
+feasible.
\begin{figure}
@@ -278,36 +292,37 @@
In contrast, the \emph{image server} style is where the display is
-created at entirely the server and is only viewed at the client. In
+created entirely at the server and is only viewed at the client. In
other words, this is primarily a server-side processing model, as
-illustrated in Figure~\ref{fig-distribution-styles}(b). Consequently, techniques
-that use this style require no additional client-side software, and thus
-meet our requirements. The downside is that the resultant visualization
-can tend to be very static and non-interactive in nature, as it is a
-simple bitmap image.
+illustrated in Figure~\ref{fig-distribution-styles}(b). Consequently,
+techniques that use this style require no additional client-side
+software, and thus meet our requirements. The downside is that the
+resultant visualization tends to be very static and non-interactive
+in nature, as it is just a simple bitmap image.
-The \emph{3D model interaction environment} style is where a model
-created at the server can be explored at the client, as illustrated in
-Figure~\ref{fig-distribution-styles}(c). The phrase ``3D model
-interaction'' seems slightly out of place in the current context.
-\citeN{Wood-J-1996-vis} originally intended this distribution style to
-apply to VRML models for GIS applications, but it could be equally
-applied to any situation where an interactive model is generated at the
-server, then downloaded to and manipulated at the client. This is very
-similar to what happens with many Flash-based applications, for example.
-A more general name for this style could therefore be \emph{model
-interaction environment}. The key distinguishing feature of this style
-is that there is no further interaction between the client and server
-after the model has been downloaded. This means that while the
-downloaded model can be very dynamic and interactive, changing the
-underlying data requires a new model to be generated and downloaded from
-the server. Similar restrictions apply to techniques using this style as
-with the data server style, so Java- and Flash-based model interaction
+The \emph{model interaction environment} style is where a model created
+at the server can be explored at the client, as illustrated in
+Figure~\ref{fig-distribution-styles}(c). \citeN{Wood-J-1996-vis}
+originally referred to this as the ``3D model interaction'' style, but
+this seems slightly out of place in the current context. They originally
+intended this distribution style to apply to VRML models for GIS
+applications, but it could be equally applied to any situation where an
+interactive model is generated at the server, then downloaded to and
+manipulated at the client. This is very similar to what happens with
+many Flash-based applications, for example. ``Model interaction
+environment'' therefore seems a more appropriate name for this style.
+The key distinguishing feature of this style is that there is no further
+interaction between the client and server after the model has been
+downloaded. This means that while the downloaded model can be very
+dynamic and interactive, changing the underlying data requires a new
+model to be generated at the server and downloaded to the client.
+Similar restrictions apply to techniques using this style as to the
+data server style, so Java- and Flash-based model interaction
environment techniques can be eliminated from consideration. For similar
reasons, we can also eliminate solutions that require browser plugins
such as VRML or SVG (although native support for the latter is beginning
to appear in some browsers). It may be possible to implement this
-distribution style using only client-side JavaScript, but it is presentl
+distribution style using only client-side JavaScript, but it is presently
unclear as to how effective this might be.
% future work: implement model interaction using JavaScript?
@@ -319,8 +334,8 @@
interaction environment style, but with the addition of a feedback loop
from the client to the server, thus enabling a more flexible and dynamic
interaction. This is essentially the distribution style provided by Ajax
-technologies [REF]. We can eliminate techniques based on the same criteria
-as applied to the other three styles.
+technologies [REF]. We can eliminate techniques based on the same
+criteria as applied to the other three styles.
\subsection{Image generation techniques}
@@ -361,8 +376,9 @@
well-established technologies. It is also bandwidth efficient: the size
of the generated map image is determined by the total number of pixels
and the compression method used, rather than by the number of points to
-be plotted. The amount of data generated should therefore remain more or
-less constant, regardless of the number of points plotted.
+be plotted. The amount of data to be sent to the client should therefore
+remain more or less constant, regardless of the number of points
+plotted.
This technique also has some disadvantages, however. First, a suitable
base map image must be acquired. This could be generated from a GIS, but
@@ -370,9 +386,9 @@
third party. Care must be taken in the latter case to avoid potential
copyright issues. Second, the compression method used to produce the
final composite map image can have a significant impact on visual
-quality. For example, lossy compression techniques such as JPEG can make
+quality. For example, lossy compression methods such as JPEG can make
the points plotted on the map appear distinctly fuzzy, as shown in
-Figure~\ref{fig-image-quality}. A lossless compression technique such as
+Figure~\ref{fig-image-quality}. A lossless compression method such as
PNG will avoid this problem, but will tend to produce larger image
files. Finally, it is harder to provide interactive map manipulation
features with this technique, as the output is a simple static image.
@@ -407,12 +423,12 @@
Overlay techniques also involve plotting points onto a base map image,
but they differ from image generation techniques in that the points are
-not plotted directly onto the base map image. Rather, the points are
-plotted as an independent overlay on the base map image. This provides a
-significant advantage over image generation techniques, as it enables
-the possibility of multiple independent overlays that can be
+not composited directly onto the base map image. Rather, the points are
+displayed as an independent overlay on top of the base map image. This
+provides a significant advantage over image generation techniques, as it
+enables the possibility of multiple independent overlays that can be
individually shown or hidden. This is very similar to the multi-layer
-functionality provide by GIS, and is an effective way to provide
+functionality provided by GIS, and is an effective way to provide
interactive visualizations of geographic data
\cite{Wood-J-1996-vis,MacE-AM-1998-GIS}. We still have the problem of
finding a suitable base map image, however.
@@ -426,26 +442,31 @@
\emph{HTML overlay}.
An image overlay comprises a transparent bitmap image into which the
-points are plotted, which is then overlaid on the base map image (the
-output looks essentially identical to that shown in
+points are plotted, which is then overlaid on the base map image (in our
+implementation, the output looks essentially identical to that shown in
Figure~\ref{fig-image}). This requires the overlay image to be in either
PNG or GIF format, as JPEG does not support transparency. Fortunately
-the overlay image is likely to be quite small, so use of a lossless
-compression method should not be an issue. This also eliminates the
-``fuzziness'' issue noted earlier (see Figure~\ref{fig-image-quality}).
+the overlay image is likely to contain a lot of ``white space'', which
+compresses very well, so use of a lossless compression method should not
+be an issue. This also eliminates the ``fuzziness'' issue noted earlier
+(see Figure~\ref{fig-image-quality}). The size of the image overlay will
+generally be proportional to the number of points to be plotted, but the
+image compression should have a moderating effect on this.
+
As noted earlier, generating images at the client would require
additional software to be installed, so we will only consider the image
-server distribution style for this technique. That is, both the base map
+server distribution style for image overlays. That is, both the base map
image and the overlay(s) are generated at the server.
An HTML overlay comprises a collection of HTML elements corresponding to
the points to be plotted, which are positioned over the base map image
using CSS absolute positioning. There is considerable flexibility as to
-which elements could be used to construct the overlay. One possibility
-is to use \verb|| elements, which is the approach adopted by Google
-Maps (see Figure~\ref{fig-google}). Another possibility is to use
-appropriately sized and colored \verb|