nigel.stanger / Publications

- Cropped Google Maps image.

- Rearranged order of lines on plots.
- Finished conclusion.
- Added acknowledgements.
- Shrank all images to save space.
- Tweaked various minor bits of wording.
- Removed acmtocl document class option.
- Added more keywords.
- Changed "single layer" to "single-layer".
- Noted that Google Maps is a JavaScript-based data server technique.
- Removed Google Earth experiments and results; added mention in conclusion.
- Discussed performance divergence points (+ table).

TOIT_2006

1 parent 632a453 commit 4607c24d2b95a68e11578d8d6ab5ba98594bd710

nstanger authored on 12 Aug 2006

Showing 3 changed files

GoogleMap-full.png

          735 ■■■■■
          Map_Visualisation.tex
             \documentclass[acmtocl,acmnow]{acmtrans2m}
\documentclass[acmnow]{acmtrans2m}
 
 
\usepackage{graphicx}
 
\category{H.3.5}{Information Storage and Retrieval}{Online Information Services}[web-based services]
            
\terms{Experimentation, Measurement, Performance} 
            
\keywords{geolocation, geovisualization, scalability, GD, Google Maps}
\keywords{downloads, geolocation, geovisualization, scalability, Google
	Maps, distribution style, dynamic map generation}
            
\begin{document}
 
 
\cite{Stan-N-2006-running}, using the GNU
EPrints\footnote{\url{http://www.eprints.org/}} repository management
software. This repository quickly attracted interest from around the
world and the number of abstract views and document downloads began to
steadily increase. We were obviously very interested in tracking this
increase, particularly with respect to where in the world the hits were
coming from. The EPrints statistics management software developed at the
steadily increase. We were very interested in tracking this increase,
particularly with respect to where in the world the hits were coming
from. The EPrints statistics management software developed at the
University of Tasmania \cite{Sale-A-2006-stats} proved very useful in
this regard, providing us with detailed per-eprint and per-country
download statistics; an example of the latter is shown in
Figure~\ref{fig-tas-stats}. However, while this display provides an
city level for the US within a 25 mile radius''
\cite{Maxm-G-2006-GeoLiteCity}. Their commercial \emph{GeoIP City}
database claims 80\% accuracy for the same parameters.
 
The techniques used by these systems can generally be divided into two
classes. The first class of techniques generate a single bitmap image
that contains both the map and the icons representing web hits. This can
be achieved by programmatically plotting points onto a base map image;
the composite image is then displayed at the client. We shall henceforth
refer to this class of techniques as \emph{single layer} techniques.
The second class of techniques separately return both a base map image
and some kind of overlay containing the plotted points. The overlay is
then combined with the base map at the client. We shall henceforth refer
to this class of techniques as \emph{multi-layer} techniques.
The techniques used by these prior systems can generally be divided into
two classes. The first class of techniques generate a single bitmap
image that contains both the map and the graphics representing web hits.
This can be achieved by programmatically plotting points onto a base map
image; the composite image is then displayed at the client. We shall
henceforth refer to this class of techniques as \emph{single-layer}
techniques. The second class of techniques separately return both a base
map image and some kind of overlay containing the plotted points. The
overlay and the base map are then displayed as separate items at the
client. We shall henceforth refer to this class of techniques as
\emph{multi-layer} techniques.
 
Both classes of techniques have been used in the aforementioned systems,
but multi-layer techniques appear to have been particularly popular. For
example, Palantir used a multi-layer technique, where a Java applet running
dynamic multi-layer technique that has only become feasible relatively
recently with the advent of widespread support for CSS positioning and
Ajax technologies in many browsers.
 
Multi-layer techniques enjoy a particular advantage over single layer
Multi-layer techniques enjoy a particular advantage over single-layer
techniques, in that they provide the potential for a more flexible
GIS-like interaction with the map, with multiple layers that can be
activated and deactivated as desired. This flexibility could explain why
such techniques appear more prevalent in the literature. However,
multi-layer techniques tend to rely on more recent web technologies such as
CSS2 and Ajax, whereas single layer techniques generally do not. Single
layer techniques should therefore be portable to a wider range of client
and server environments.
 
Each technique comprises a specific technology or collection of
technologies (such as transparent bitmap overlays), implemented using a
specific distribution style. For example, one single layer technique
might be implemented completely server-side while another might use a
mixture of server-side and client-side processing. Similarly, multi-layer
such techniques appear more prevalent in the literature. As we shall see
shortly, however, web-based multi-layer techniques tend to rely on more
recent web technologies such as CSS and Ajax, whereas single-layer
techniques generally do not. Single-layer techniques should therefore be
portable to a wider range of client and server environments.
 
Each map generation and display technique comprises a specific
technology or collection of technologies (such as transparent bitmap
overlays + CSS positioning), implemented using a specific distribution
style. For example, a particular single-layer technique might be
implemented completely server-side while another might use a mixture of
server-side and client-side processing. Similarly, multi-layer
techniques may adopt different distribution styles, and the overlays
themselves might take the form of transparent images, absolutely
positioned HTML elements, dynamically generated graphics, etc.
 
Given the many possible techniques that were available, the next
question was which techniques would be most suitable for our purposes?
Scalability is a key issue for web applications in general \cite[p.\
28]{Offu-J-2002-quality}, and online activity visualization in
Given the wide variety of possible techniques that were available, the
next question was which techniques would be most suitable for our
purposes? Scalability is a key issue for web applications in general
\cite[p.\ 28]{Offu-J-2002-quality}, and online activity visualization in
particular \cite[p.\ 50]{Eick-SG-2001-sitevis}, so we were particularly
interested in techniques that could scale to a large number of points.
For example, at the time of writing the Otago EPrints repository had
been accessed from over 10,000 distinct IP addresses, each potentially
representing a distinct geographical location. Separating out the type
of hit (abstract view versus document download) increased that figure to
nearly 13,000.
nearly 13,000. Early informal experiments with these data indicated that
a single-layer composite map image would work quite well, whereas Google
Maps would not.
 
We first narrowed down the range of techniques to just four (server-side
image generation, server-side image overlay, server-side HTML overlay
and Google Maps); the selection process and details of the techniques
experiments was conducted on each technique with progressively larger
data sets, and the elapsed time and memory usage were measured. The
experimental design is discussed in Section~\ref{sec-experiment}.
 
Our initial intuition was that server-side image generation and
Our initial intuition was that the server-side image generation and
server-side image overlay techniques would scale best, and this was
borne out by the results of the experiments, which show that both
techniques scale reasonably well to very large numbers of points. The
other two techniques proved to be reasonable for relatively small
performance deteriorated rapidly beyond this. The results are discussed
in more detail in Section~\ref{sec-results}.
 
It should be noted that the intent of the experiments was not to
identify statistically significant differences between techniques. It
was expected that variations across techniques would be obvious, and the
experiments were designed to test this expectation. However, the two
best performing techniques, server-side image generation and server-side
image overlay, produced very similar results, so a more formal
statistical analysis of these techniques may be warranted. This and
other possible future directions are discussed in
Section~\ref{sec-future}.
identify statistically significant differences in performance across the
four techniques. It was expected that variations across techniques would
be reasonably clear-cut, and the experiments were designed to test this
expectation. However, the two best performing techniques, server-side
image generation and server-side image overlay, produced very similar
results, so a more formal statistical analysis of these techniques may
be warranted. This and other possible future directions are discussed in
Section~\ref{sec-conclusion}.
 
 
\section{Technique selection}
\label{sec-techniques}
\emph{data server} style is where the server only supplies raw data, and
all manipulation, display and analysis takes place at the client. In
other words, this is primarily a client-side processing model, as
illustrated in Figure~\ref{fig-distribution-styles}(a). For example,
Palantir implemented a multi-layer technique using this distribution style
\cite{Papa-N-1998-Palantir}, where the source data were generated at the
server and the map was generated, displayed and manipulated by a Java
applet running at the client. The data server distribution style can
provide a very dynamic and interactive environment to the end user, but
clearly requires support for executing application code within the web
browser, typically using something like JavaScript, Java applets or
Palantir implemented a multi-layer technique using this distribution
style \cite{Papa-N-1998-Palantir}, where the source data were generated
at the server and the map was generated, displayed and manipulated by a
Java applet running at the client. The data server distribution style
can provide a very dynamic and interactive environment to the end user,
but clearly requires support for executing application code within the
web browser, typically using something like JavaScript, Java applets or
Flash. JavaScript is now tightly integrated into most browsers, but the
same cannot be said for either Java or Flash. That is, we cannot
necessarily guarantee the existence of a Java virtual machine or Flash
plugin in every browser, which violates our requirement to avoid manual
installation of additional client-side software. We can therefore
eliminate Java- or Flash-based data server techniques from
consideration, but JavaScript-based data server techniques may still be
feasible.
consideration, but JavaScript-based data server techniques are feasible.
Indeed, as we will see in Section~\ref{sec-overlay}, Google Maps is an
example of such a technique.
 
 
\begin{figure}
	\centering
	\begin{tabular}{ccc}
		\includegraphics[scale=1]{data_server}	&
		\includegraphics[scale=0.9]{data_server}	&
		\qquad	&
		\includegraphics[scale=1]{image_server}	\\
		\includegraphics[scale=0.9]{image_server}	\\
		\footnotesize (a) Data server	&
		\qquad	&
		\footnotesize (b) Image server	\\
		\\
		\\
		\includegraphics[scale=1]{model_interaction}	&
		\includegraphics[scale=0.9]{model_interaction}	&
		\qquad	&
		\includegraphics[scale=1]{shared}	\\
		\includegraphics[scale=0.9]{shared}	\\
		\footnotesize (c) Model interaction environment	&
		\qquad	&
		\footnotesize (d) Shared environment	\\
	\end{tabular}
illustrated in Figure~\ref{fig-distribution-styles}(b). Consequently,
techniques that use this style require no additional client-side
software, and thus meet our requirements. The downside is that the
resultant visualization can tend to be very static and non-interactive
in nature, as it is just a simple bitmap image.
in nature, as it is typically just a simple bitmap image.
 
The \emph{model interaction environment} style is where a model created
at the server can be explored at the client, as illustrated in
Figure~\ref{fig-distribution-styles}(c). \citeN{Wood-J-1996-vis}
interaction between the client and server after the model has been
downloaded. This means that while the downloaded model can be very
dynamic and interactive, changing the underlying data requires a new
model to be generated at the server and downloaded to the client.
Similar restrictions apply to techniques using this style as to the
data server style, so Java- and Flash-based model interaction
environment techniques can be eliminated from consideration. For similar
reasons, we can also eliminate solutions that require browser plugins
such as VRML or SVG (although native support for the latter is beginning
to appear in some browsers). It may be possible to implement this
distribution style using only client-side JavaScript, but it is presently
unclear as to how effective this might be.
 
% future work: implement model interaction using JavaScript?
Similar restrictions apply to techniques using this style as to the data
server style, so Java- and Flash-based model interaction environment
techniques can be eliminated from consideration. For similar reasons, we
can also eliminate solutions such as VRML or SVG that require external
browser plugins (although native support for SVG is beginning to appear
in some browsers). It may be possible to implement this distribution
style using only client-side JavaScript, but it is presently unclear as
to how effective this might be.
 
Finally, the \emph{shared environment} style is where data manipulation
is done at the server, but control of that manipulation, rendering, and
display all occur at the client, as illustrated in
interaction environment style, but with the addition of a feedback loop
from the client to the server, thus enabling a more flexible and dynamic
interaction. Ajax technologies \cite{Garr-JJ-2005-Ajax} can easily
support this kind of distribution style. For example,
\citeN{Saya-A-2006-GISWS} discuss the use of Ajax to integrate Google
Maps with existing GIS visualization web services. We can eliminate
\citeN{Saya-A-2006-GISWS} use Ajax to integrate Google Maps with
existing GIS visualization web services. We can eliminate specific
shared environment techniques from consideration based on the same
criteria as were applied to the other three styles.
 
 
\subsection{Single layer techniques}
criteria as were applied to the other three styles (e.g., no Java- or
Flash-based techniques).
 
 
\subsection{Single-layer techniques}
\label{sec-image-gen}
 
As noted earlier, single layer techniques work by directly plotting
As noted earlier, single-layer techniques work by directly plotting
geolocated IP addresses onto a base map image, then displaying the
composite image at the client. A typical example of the kind of output
that might be produced is shown in Figure~\ref{fig-image}. Such
techniques require two specific components: software to programmatically
create and manipulate bitmap images (for example, the GD image
library\footnote{\url{http://www.boutell.com/gd/}}); and software to
transform raw latitude/longitude coordinates into projected map
coordinates on the base map (for example, the PROJ.4 cartographic
projections library\footnote{\url{http://www.remotesensing.org/proj/}}).
 
 
\begin{figure}
	\centering
	\includegraphics[width=0.95\textwidth,keepaspectratio]{ImageGeneration-full}
	\caption{Sample output from the server-side image generation technique.}
transform latitude/longitude coordinates into projected map coordinates
on the base map (for example, the PROJ.4 cartographic projections
library\footnote{\url{http://www.remotesensing.org/proj/}}).
 
 
\begin{figure}
	\centering
	\includegraphics[width=0.9\textwidth,keepaspectratio]{ImageGeneration-full}
	\caption{Sample output from the (single-layer) server-side image
		generation technique.}
	\label{fig-image}
\end{figure}
 
 
Single layer techniques could use any of the distribution styles
Single-layer techniques could use any of the distribution styles
discussed in Section~\ref{sec-distribution}. However, all but the image
server style would require the installation of additional client-side
software for generating images and performing cartographic projection
operations, so we will only consider single layer using the image
server distribution style (or \textbf{server-side image generation})
from this point on.
operations, so we will only consider single-layer techniques that use
the image server distribution style (or \textbf{server-side image
generation}).
 
The server-side image generation technique provides some distinct
advantages. It is relatively simple to implement and is fast at
producing the final image, mainly because it uses existing,
well-established technologies. It is also bandwidth efficient: the size
of the generated map image is determined by the total number of pixels
and the compression method used, rather than by the number of points to
be plotted. The amount of data to be sent to the client should therefore
remain more or less constant, regardless of the number of points
plotted.
well-established technologies. It is also bandwidth efficient, because
the size of the generated map image is determined by its pixel
dimensions and the compression method used, rather than by the number of
points to be plotted. The amount of data to be sent to the client should
therefore remain more or less constant, regardless of the number of
points plotted.
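
The bandwidth argument above can be illustrated with a minimal Python sketch. It is not the implementation used in the paper (which relies on the GD library); the function names and the 8-bit greyscale raster are assumptions made purely for illustration. The sketch plots points into a fixed-size pixel buffer and shows that the uncompressed size depends only on the pixel dimensions. The compressed size of a real image would still vary somewhat with its content.

```python
def blank_raster(width, height):
    """Create a hypothetical 8-bit greyscale raster, one byte per pixel."""
    return bytearray(width * height)


def plot_points(raster, width, height, pixels, value=255):
    """Set each (x, y) pixel that falls inside the raster to `value`.

    Points outside the image bounds are ignored. The raster is modified
    in place and also returned for convenience.
    """
    for x, y in pixels:
        if 0 <= x < width and 0 <= y < height:
            raster[y * width + x] = value
    return raster
```

Plotting ten points or ten thousand points leaves `len(raster)` unchanged at `width * height` bytes; only the pixel values differ.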
 
This technique also has some disadvantages, however. First, a suitable
base map image must be acquired. This could be generated from a GIS, but
if this is not an option an appropriate image must be obtained from a
third party. Care must be taken in the latter case to avoid potential
copyright issues. Second, the compression method used to produce the
final composite map image can have a significant impact on visual
quality. For example, lossy compression methods such as JPEG can make
the points plotted on the map appear distinctly fuzzy or ``muddy'', as
shown in Figure~\ref{fig-image-quality}. A lossless compression method
such as PNG will avoid this problem, but will tend to produce larger
image files. Finally, it is harder to provide interactive map
manipulation features with this technique, as the output is a simple
static image. Anything that changes the content of the map (such as
panning or changing the visibility of points) will require the entire
image to be regenerated. Zooming could be achieved if a very high
resolution base map image was available, but the number of possible zoom
levels might be restricted.
 
 
\begin{figure}
	\centering
	\includegraphics[scale=1.25]{jpeg_detail}\medskip
	
	\includegraphics[scale=1.25]{overlay_detail}
	\caption{Image quality of JPEG (Q=90) image generation (top) vs.\
	PNG image overlay (bottom).}
third party. Care must be taken in the latter case to avoid copyright
issues. Second, the compression method used to produce the final
composite map image can have a significant impact on visual quality. For
example, lossy compression methods such as JPEG can make the points
plotted on the map appear distinctly fuzzy or ``muddy'', as shown in
Figure~\ref{fig-image-quality}. Lossless compression methods such as PNG
avoid this problem, but may produce larger files for the same image.
Finally, it is harder to provide interactive map manipulation features
with this technique, as the output is a simple static image. Anything
that changes the content of the map (such as panning or changing the
visibility of certain points) will require the entire image to be
regenerated. Zooming could be achieved if a very high resolution base
map image was available, but the number of possible zoom levels might be
restricted.
 
 
\begin{figure}
	\centering
	\includegraphics[scale=0.98]{jpeg_detail}
	\includegraphics[scale=0.98]{overlay_detail}
	\caption{Image quality of JPEG (Q=90) image generation (left) vs.\
	PNG image overlay (right).}
	\label{fig-image-quality}
\end{figure}
 
 
\subsection{Multi-layer techniques}
\label{sec-overlay}
 
% Look for publications regarding the DataCrossing Ajax client.
% See <http://datacrossing.crs4.it/en_Documentation_Overlay_Example.html>.
% They use <IMG> rather than <DIV>, which has the advantage of the image
% being loaded only once, but makes it harder to dynamically change the
% appearance of markers. The amount of data generated will still be
% proportional to the number of points (one <IMG> per point).
 
Multi-layer techniques also involve plotting points onto a base map image,
but they differ from single layer techniques in that the points are
not composited directly onto the base map image. Rather, the points are
Multi-layer techniques also involve plotting points onto a base map
image, but they differ from single-layer techniques in that the points
are not plotted directly onto the base map image. Rather, the points are
displayed as an independent overlay on top of the base map image. This
provides a significant advantage over single layer techniques, as it
provides a significant advantage over single-layer techniques, as it
enables the possibility of multiple independent layers that can be
individually shown or hidden. This is very similar to the multi-layer
functionality provide by GIS, and is an effective way to provided
functionality provided by GIS, and is an effective way to provide
interactive visualizations of geographic data
\cite{Wood-J-1996-vis,MacE-AM-1998-GIS}. We still have the problem of
finding a suitable base map image, however.
 
points are plotted, which is then overlaid on the base map image (in our
implementation, the output looks essentially identical to that shown in
Figure~\ref{fig-image} on page~\pageref{fig-image}). This requires the
overlay image to be in either PNG or GIF format, as JPEG does not
support transparency. Fortunately the overlay image is likely to contain
a lot of ``white space'', which compresses very well, so use of a
support transparency. The overlay image is likely to contain
considerable ``white space'', which compresses very well, so use of a
lossless compression method should not be an issue. This also eliminates
the ``fuzziness'' issue noted earlier (see
Figure~\ref{fig-image-quality}). The size of the image overlay will
generally be proportional to the number of points to be plotted, but the
image compression should have a moderating effect on this.
 
As noted earlier, generating images at the client would require
additional software to be installed, so we will only consider the data
server distribution style for image overlays (or \textbf{server-side
image overlay}). That is, both the base map image and the overlay(s) are
generated at the server.
As noted in Section~\ref{sec-image-gen}, generating images at the client
would require additional software to be installed, so we will only
consider the data server distribution style for image overlays (or
\textbf{server-side image overlay}). That is, both the base map image
and the overlay(s) are generated at the server.
 
An HTML overlay comprises a collection of HTML elements corresponding to
the points to be plotted, which are positioned over the base map image
using CSS absolute positioning. There is considerable flexibility as to
the types of elements that could be used to construct the overlay. One
possibility is to use \verb|<IMG>| elements to place icons on the base
map; this appears to be the approach adopted by Google Maps (see
map, which appears to be the approach adopted by Google Maps (see
Figure~\ref{fig-google}). Another possibility is to use appropriately
sized and colored \verb|<DIV>| elements, which then appear as colored
blocks ``floating'' over the base map image (in our implementation, the
output looks essentially identical to that shown in
 
 
\begin{figure}
	\centering
	\includegraphics[width=0.95\textwidth,keepaspectratio]{GoogleMap-full.png}
	\includegraphics[width=0.9\textwidth,keepaspectratio]{GoogleMap-full.png}
	\caption{Sample output from the Google Maps technique.}
	\label{fig-google}
\end{figure}
 
because only HTML (i.e., text) is being generated, not images. This can
be easily achieved using client-side JavaScript, so HTML overlays can
use any of the distribution styles discussed in
Section~\ref{sec-distribution} without violating our requirements. We
have therefore adopted two representative multi-layer techniques for our
experiments: \textbf{server-side HTML overlays} (using the image server
distribution style) and \textbf{Google Maps} (using the data server
distribution style). Since Google Maps uses \verb|<IMG>| elements, we
have used \verb|<DIV>| elements for the server-side HTML overlay.
have therefore adopted two representative HTML overlay techniques for
our experiments: \textbf{server-side HTML overlays} (using the image
server distribution style) and \textbf{Google Maps} (using the data
server distribution style). Since Google Maps uses \verb|<IMG>|
elements, we have used \verb|<DIV>| elements for the server-side HTML
overlay.
 
Server-side HTML overlays are actually slightly simpler to implement
than either server-side image generation or image overlays, because we
do not need to write any code to generate or manipulate images (the base
map image is static and thus requires no additional processing). All
that is required is code to transform latitude/longitude coordinates
into projected map coordinates and produce corresponding \verb|<DIV>|
into projected map coordinates and generate corresponding \verb|<DIV>|
elements.
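
As a rough illustration of the two steps just described, the following Python sketch projects latitude/longitude pairs onto a world base map and emits one absolutely positioned element per point. It is a sketch under stated assumptions, not the code used for the experiments: the simple equirectangular projection stands in for PROJ.4, and the function names, marker size and color are hypothetical.

```python
def project_equirectangular(lat, lon, width, height):
    """Map latitude/longitude (degrees) to pixel (x, y) on a world map.

    Assumes an equirectangular base map covering -180..180 degrees of
    longitude and 90..-90 degrees of latitude. This is a stand-in for a
    real projection library such as PROJ.4.
    """
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return int(round(x)), int(round(y))


def div_overlay(points, width, height, size=3, color="#ff0000"):
    """Return HTML containing one absolutely positioned DIV per point.

    `points` is an iterable of (lat, lon) pairs. The marker size and
    color are illustrative defaults only.
    """
    parts = []
    for lat, lon in points:
        x, y = project_equirectangular(lat, lon, width, height)
        parts.append(
            '<div style="position:absolute;left:%dpx;top:%dpx;'
            'width:%dpx;height:%dpx;background:%s"></div>'
            % (x, y, size, size, color)
        )
    return "\n".join(parts)
```

Because one element is emitted per point, the length of the returned string grows roughly linearly with the number of points.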
 
Google Maps \cite{Goog-M-2006-maps} is a more complex proposition. This
technique uses the data server distribution style, where JavaScript code
running within the browser enables the client to manipulate the base map
and its overlays. Data and map images are requested asynchronously from
the server as required, using Ajax technologies, which seems to imply
the server as required using Ajax technologies, which seems to imply
that Google Maps in fact uses the shared environment distribution style.
However, the server has no involvement beyond simply supplying data to
the client. In the shared environment distribution style, the server is
directly involved in manipulating the map, under the control of the
client. This is clearly not the case with Google Maps.
 
The primary advantage of Google Maps is the powerful functionality it
provides for generating and interacting with the map. Users may pan the
map in any direction and zoom in and out to many different levels. A
map in any direction and zoom to many different levels of detail. A
satellite imagery view is also available. In addition, further
information about each point plotted (such as the name of the city, for
example) can be displayed in a callout attached to the point, as shown
in Figure~\ref{fig-google}.
information about each point plotted (such as the name of the city) can
be displayed in a callout attached to the point, as shown in
Figure~\ref{fig-google}.
 
However, there are also some significant disadvantages to the Google
Maps technique\footnote{Interestingly, the Google Earth application
addresses many of these issues, but since it is not a browser-based
solution it falls outside the scope of our consideration. However, for
interest's sake we did an informal comparison between Google Earth and
the four techniques that we have tested, and this has been included in
the results in Section~\ref{sec-results}.}. First, it is a distributed
application, thus making it more complex to implement, test and debug
\cite{Bates-PC-1995-distdebug,Ensl-PH-1978-distributed}. Second, the
server must have a registered API key from Google, which is verified
every time that a page attempts to use the API. Similarly, the client
must connect to Google's servers in order to download the API's
JavaScript source. This means that the technique must have an active
Internet connection in order to work. Finally, the Google Maps API does
not currently provide any way to toggle the visibility of markers on the
map, so it is not possible to implement the interactive ``layers''
mentioned at the start of this section. (It is possible, of course, that
Google will implement this feature in a later version of the API.)
solution it falls outside the scope of our consideration.}. First, it is
a distributed application, thus making it more complex to implement,
test and debug \cite{Bates-PC-1995-distdebug,Ensl-PH-1978-distributed}.
Second, the server must have a registered API key from Google, which is
verified every time that a page attempts to use the API. Similarly, the
client must connect to Google's servers in order to download the
API's JavaScript source. This means that the technique requires an
active Internet connection in order to work. Finally, the Google Maps
API does not currently provide any way to toggle the visibility of
markers on the map, so it is not possible to implement the interactive
``layers'' mentioned at the start of this section. (It is possible, of
course, that Google may implement this feature in a future version of
the API.)
 
The most significant disadvantage of all HTML overlay techniques,
however, is that the size of the HTML overlay is directly proportional
to the number of points to be plotted. There will be one overlay element
(\verb|<DIV>| or \verb|<IMG>|) per point, so a very large number of
points will result in an even larger amount of HTML source being
generated. We expect that this will lead to excessive browser memory
usage, and consequently that these techniques will not scale well at the
high end. However, they may still be useful for smaller data sets that
require interactive manipulation.
high end. However, they may still be appropriate for smaller data sets
that require interactive manipulation.
 
 
\section{Experimental design}
\label{sec-experiment}
 
 
\begin{figure}
	\centering
	\includegraphics[width=0.95\textwidth,keepaspectratio]{16384_points}
	\includegraphics[width=0.9\textwidth,keepaspectratio]{16384_points}
	\caption{The 16,384-point data set plotted on the base map.}
	\label{fig-grid-points}
\end{figure}
 
the local network was a potential confounding factor. We therefore
decided to eliminate network performance from the equation by running
both the server and the client on the same machine\footnote{A Power
Macintosh G5 1.8\,GHz with 1\,GB RAM, running Mac OS X 10.4.7, Apache
2.0.55, PHP 4.4 and Perl 5.8.6.}. This in turn enabled us to measure the
time taken for data generation and page display independently, thus
simplifying the process of data collection and also ensuring that the
client and server processes did not unduly interfere with each other,
despite running on the same machine.
2.0.55, PHP 4.4 and Perl 5.8.6.}. This in turn enabled us to
independently measure the time taken for data generation and page
display, thus simplifying the process of data collection and also
ensuring that the client and server processes did not unduly interfere
with each other, despite running on the same machine.
 
It could be argued that network performance would still have a
confounding effect on the Google Maps technique, but this would only be
likely for the initial download of the API (comprising about 235\,kB of
server from proceeding.
 
For each data set generated, we recorded its size, the time taken to
generate it, the time taken to display the resultant map in the browser,
and the amount of real and virtual memory used during the test by the
browser. We also intended to measure the memory usage of the server, but
and the amount of real and virtual memory used by the browser during the
test. We also intended to measure the memory usage of the server, but
this proved more difficult to isolate than expected, and was thus
dropped from the experiments. The data set generation time and browser
memory usage were measured using the \texttt{time} and \texttt{top}
utilities respectively (the latter was run after each test run to avoid
test'' debugging feature of Apple's Safari web browser, which can
repetitively load a set of pages while recording various statistics, in
particular the time taken to load the page. Tests were run up to twenty
times each where feasible, in order to reduce the impact of random
variations. Some tests were run fewer times because they took a very
long time (several minutes for a single test run). We typically broke
off further testing when a single test run took longer than about five
minutes, as by this stage performance had already deteriorated well
beyond usable levels.
 
While it is somewhat beyond the scope of this work, out of interest some
informal tests were also undertaken using the Google Earth application.
A Perl script was used to generate a collection of KML files
corresponding to the data sets described above. Each data set was then
loaded into Google Earth, and a stopwatch was used to measure how long
it took to load the data set, defined as the period during which the
dialog box ``\textsf{Loading myplaces.kml, including enabled overlays}''
was displayed on screen.
variations. Some tests were run fewer times because they took an
excessive amount of time to complete (i.e., several minutes for a single
test run). We typically broke off further testing when a single test run
took longer than about five minutes, as by this stage performance had
already deteriorated well beyond usable levels.
 
 
\subsection{Technique implementation}
 
\begin{description}
 
	\item[server-side image generation] The dispatcher page included a
	standard \verb|<IMG>| element that called the Perl script. This
	script then loaded a base map PNG image, plotted points directly
	onto it, and returned the
	composite map to the client as a JPEG image (with the ``quality''
	parameter set to 90).
	script loaded a base map PNG image, plotted points directly onto it,
	and returned the composite map to the client as a JPEG image (with
	the ``quality'' parameter set to 90).
 
	\item[server-side image overlay] The dispatcher page included two
	\verb|<IMG>| elements, the first for the base map and the second for
	the overlay, both with identical CSS positioning attributes. The
do a full analysis and statistical comparison of the performance of the
different techniques, but rather to identify broad trends. We have not,
therefore, carried out any statistical analysis on the results. We will
now discuss the results for data size, page load time and memory usage.
Because the data set size increases by powers of two, we have used
log-log scales for all plots.
Because the number of points in each data set increases in powers of
two, we have used log-log scales for all plots.
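
As a brief aside on reading these plots (this is a general property of
log-log axes rather than a result of our experiments), any power-law
relationship \(y = c\,x^{k}\) satisfies
\[
	\log y = \log c + k \log x,
\]
and therefore appears as a straight line with slope \(k\). A measure
that grows in direct proportion to the number of points (\(k = 1\))
appears as a line of unit slope, whereas a measure that is independent
of the number of points (\(k = 0\)) appears as a horizontal line. A
measure with a fixed overhead, \(y = a + b\,x\), appears flat while
\(a\) dominates and approaches unit slope once \(b\,x\) dominates. The
symbols \(a\), \(b\), \(c\) and \(k\) are introduced here purely for
illustration.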
 
 
\subsection{Data size}
 
During each test run, the data generated by the server was saved to a
file and its size in bytes recorded. In the case of the server-side
image generation and server-side image overlay techniques, the file
comprised a bitmap image; whereas for the server-side HTML overlay and
Google Maps techniques, the file comprised XML data. (The latter was
also true of the KML files generated for use with Google Earth.)
Google Maps techniques, the file comprised HTML or XML text,
respectively.
 
There was a certain amount of fixed overhead for each technique tested,
as summarised in Table~\ref{tab-overhead}. This overhead comprised
static files that were always downloaded to the client, regardless of
	\centering
	\begin{tabular}{lll}
		Technique						&	Fixed overhead		&	Content	\\
		\hline
		Server-side image generation	&	629\,bytes			&	PHP (dispatcher)\smallskip	\\
 
		Server-side image overlay		&	\(\approx\) 181\,kB	&	PHP (dispatcher) \\
		Server-side image generation	&	629\,bytes			&	dispatcher (PHP)\smallskip	\\
 
		Server-side image overlay		&	\(\approx\) 181\,kB	&	dispatcher (PHP) \\
										&						&	base map image (JPEG)\smallskip	\\
 
		Server-side HTML overlay		&	\(\approx\) 181\,kB	&	PHP (dispatcher) \\
		Server-side HTML overlay		&	\(\approx\) 181\,kB	&	dispatcher (PHP) \\
										&						&	base map image (JPEG)\smallskip	\\
 
		Google Maps						&	\(\approx\) 235\,kB	&	PHP (dispatcher) \\
										&						&	base map tiles \\
										&						&	JavaScript (API) \\
										&						&	various icons\smallskip	\\
 
		(Google Earth)					&	unknown				&	\\
		Google Maps						&	\(\approx\) 235\,kB	&	dispatcher (PHP) \\
										&						&	base map image tiles (PNG) \\
										&						&	API (JavaScript) \\
										&						&	various icons (PNG)	\\
	\end{tabular}
	\caption{Fixed overhead for each technique.}
	\label{tab-overhead}
\end{acmtable}
 
 
\begin{figure}
	\includegraphics[scale=0.66]{data_size}
	\centering
	\includegraphics[scale=0.5]{data_size}
	\caption{Comparison of generated data size for each technique (log-log scale).}
	\label{fig-data-size}
\end{figure}
 
 
The amount of data generated for each technique, including fixed
overhead, is shown in Figure~\ref{fig-data-size}. It is immediately
apparent from these results that there is a divergence between the two
techniques that generate bitmap images (server-side image generation and
server-side image overlay), and the remaining techniques that generate
either HTML or XML (i.e., text).
techniques that generate images (server-side image generation and
server-side image overlay), and the two techniques that generate text
(server-side HTML overlay and Google Maps).
 
Both the server-side image generation and server-side image overlay
techniques scale particularly well with regard to the amount of data
generated. Interestingly, the amount of data generated by the image
scale well, and begin to visibly diverge from the other two techniques
once the amount of data generated exceeds about 5\% of the fixed
overhead. For the HTML overlay technique this occurs somewhere between
64 and 128 points, whereas for Google Maps it occurs somewhere between
256 and 512 points. The divergence increases rapidly beyond this point
for both techniques, with the HTML overlay technique suffering the most.
256 and 512 points. The divergence increases rapidly for both techniques
beyond these points, with the HTML overlay technique suffering the most.
The latter occurs because the HTML overlay technique needs to generate
additional CSS attributes in order to correctly position the
\verb|<DIV>| elements, whereas the Google Maps technique needs only to
return a more compact list of latitude/longitude coordinates.
 
For Google Earth, the amount of data generated is clearly proportional
to the number of points, but the Google Earth results are otherwise not
directly comparable with the other techniques, as we were unable to
determine whether Google Earth had any fixed overhead.
additional CSS attributes (i.e., more text) in order to correctly
position the \verb|<DIV>| elements, whereas the Google Maps technique
needs only to return a more compact list of latitude/longitude
coordinates.
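
To give a rough sense of scale, suppose (as a simplifying assumption)
that the data generated for \(n\) points is approximately \(s\,n\)
bytes, where \(s\) is the average number of bytes per point, and that
divergence becomes visible when \(s\,n \approx 0.05\,S_{0}\), where
\(S_{0}\) is the fixed overhead listed in Table~\ref{tab-overhead}. The
symbols \(s\), \(n\) and \(S_{0}\) are illustrative only. Taking 1\,kB
as 1,000 bytes, the HTML overlay divergence range of 64--128 points
would imply
\[
	\frac{0.05 \times 181{,}000}{128} \approx 71
	\;\leq\; s \;\leq\;
	\frac{0.05 \times 181{,}000}{64} \approx 141
	\quad \mbox{bytes per point,}
\]
while the Google Maps divergence range of 256--512 points would imply
\[
	\frac{0.05 \times 235{,}000}{512} \approx 23
	\;\leq\; s \;\leq\;
	\frac{0.05 \times 235{,}000}{256} \approx 46
	\quad \mbox{bytes per point.}
\]
These are back-of-envelope estimates derived from the approximate
figures quoted above, not measured values. They are nonetheless
consistent with the HTML overlay technique generating several times
more text per point than Google Maps.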
 
 
\subsection{Page load time}
 
\subsubsection{Data generation time}
 
 
\begin{figure}
	\includegraphics[scale=0.66]{data_generation_time}
	\centering
	\includegraphics[scale=0.5]{data_generation_time}
	\caption{Comparison of data generation time for each technique (log-log scale).}
	\label{fig-data-generation-time}
\end{figure}
 
 
The results show that the length of time taken to generate the source
data increases in proportion with the amount of points to be plotted, as
expected. It is interesting to note the differences in data generation
time for each technique, however. Data generation for all of the
``text-based'' techniques (HTML overlay, Google Maps and Google Earth)
is consistently faster than for the ``image-based'' techniques (image
generation and image overlay).
 
Server-side image generation generally takes the longest to generate its
data. This is because it not only has to map points from
latitude/longitude into projected map coordinates, but also must plot
these points onto the base map image, then compress the composite image
as a JPEG. The image to be compressed is also moderately complex, which
only adds to the data generation time. Server-side image generation
performs slightly better because it uses a less complex compression
method (PNG) and the image being compressed is much simpler (a
collection of colored points on a blank background).
 
The server-side HTML overlay techniques appears faster at generating
data than either of the two image-based techniques at the low end, but
is similar in performance at the high end. In this technique the server
The results (see Figure~\ref{fig-data-generation-time}) show that the
length of time taken to generate the source data increases in proportion
to the number of points to be plotted, as expected. It is interesting to
note the differences in data generation time for each technique,
however. Data generation for both of the ``text-based'' techniques (HTML
overlay and Google Maps) is consistently faster than for the
``image-based'' techniques (image generation and image overlay).
 
The results show that server-side image generation generally takes the
longest to generate its data. This is because it not only has to map
points from latitude/longitude into projected map coordinates, but also
must plot these points onto the base map image, then compress the
composite image as a JPEG. The image to be compressed is also moderately
complex, which only adds to the data generation time. Server-side image
overlay performs somewhat better because it uses a less complex
compression method (PNG) and the image to be compressed is much simpler
(a collection of colored points on a blank background).
 
The server-side HTML overlay technique appears faster at generating data
than either of the two image-based techniques at the low end, but is
similar in performance at the high end. In this technique the server
only needs to map latitude/longitude to projected map coordinates; no
images need to be generated and there is no compression. At the high
end, however, this advantage is clearly offset by the large volume of
data that is generated. Google Maps is faster again, because all
processing is carried out on the client; the server's only involvement
is to generate a list of latitude/longitude coordinates. A similar
argument also applies for Google Earth.
images need to be generated and there is no compression to deal with. At
the high end, however, this advantage is clearly offset by the
significant volume of data being generated. Google Maps is faster again,
because almost all processing is carried out on the client; the server's
only involvement is to generate a simple list of latitude/longitude
coordinates.
 
In terms of data generation, it appears that all techniques tested scale
reasonably well. The image-based techniques perform worse at the low end
because they involve more complex processing than the text-based
techniques, but this is offset at the high end by the relatively
constant amount of data generated. Conversely, the text-based techniques
perform better at the low end, but are negatively imapcted at the high
perform better at the low end, but are negatively impacted at the high
end by the sheer volume of data produced (tens or hundreds of megabytes
vs.\ hundreds of kilobytes).
 
 
 
 
\begin{figure}
	\centering
	\includegraphics[scale=0.66]{page_load_time}
	\includegraphics[scale=0.5]{page_load_time}
	\caption{Comparison of map display time for each technique (log-log scale).}
	\label{fig-page-load-time}
\end{figure}
 
 
These results show quite a spectacular difference between the
image-based and text-based techniques. The time taken to display the map
is essentially constant for both of the image-based techniques,
regardless of the number of points to be plotted. This is not surprising
given that the size of the generated data is also essentially constant,
and that the browser is simply loading and displaying static images. The
image overlay technique appears slightly slower than the image
generation technique. This is probably because the image overlay
technique has to load two images from the server (the base map and the
overlay), compared to one image for the image generation technique.
These results (see Figure~\ref{fig-page-load-time}) reveal quite a
spectacular difference between the image-based and text-based
techniques. The time taken to display the map is essentially constant
for both of the image-based techniques, regardless of the number of
points to be plotted. This is not surprising given that the size of the
generated data is also essentially constant, and that the browser is
simply loading and displaying static images. The image overlay technique
appears slightly slower than the image generation technique. This is
probably because the image overlay technique has to load two images from
the server (the base map and the overlay), compared to one image for the
image generation technique.
 
In contrast, the text-based techniques clearly do not scale well with
regard to map display time. Google Maps suffers particularly in this
regard, with display time exceeding ten seconds shortly past 512 points.
Testing was abandoned at 4,096 points, with a single test run taking
over seven minutes. The HTML overlay technique fares better, exceeding
ten seconds somewhere between 4,096 and 8,192 points. Testing was
abandoned at 32,768 points, with a single test run taking almost ten
minutes. Interestingly, Google Earth performed worse at the low end but
did better at the high end, presumably because it is specifically
designed to handle these kinds of tasks. We were able to reach 131,072
points before testing was abandoned.
minutes.
 
 
\subsubsection{Combined time}
 
 
\begin{figure}
	\centering
	\includegraphics[scale=0.66]{combined_time}
	\includegraphics[scale=0.5]{combined_time}
	\caption{Comparison of combined page load time for each technique (log-log scale).}
	\label{fig-combined-time}
\end{figure}
 
 
Combining the data generation and map display times yields little change
in the curves for the text-based techniques, because the data generation
times are very small compared to the map display times. There is a more
obvious impact on the image-based techniques, with both techniques
remaining more or less constant up to about 2,048 points, then slowing
as the number of points increases beyond that. However, the slowdown is
nowhere near as dramatic as for the text-based techniques; even the
largest data set only takes about nineteen seconds overall. The image
overlay technique does display a slight advantage of about half a second
over the image generation technique for the largest data set, but
further experiments will be required to determine whether this is a
statistically significant difference.
Combining the data generation and map display times (see
Figure~\ref{fig-combined-time}) yields little change in the curves for
the text-based techniques, because the data generation times are very
small compared to the map display times. There is a more obvious impact
on the image-based techniques, with both techniques remaining more or
less constant up to about 2,048 points, then slowing as the number of
points increases beyond that. However, the slowdown is nowhere near as
dramatic as for the text-based techniques; even the largest data set
only takes about nineteen seconds overall. The image overlay technique
does display a slight advantage of about half a second over the image
generation technique for the largest data set, but further experiments
will be required to determine whether this is a statistically
significant difference.
 
 
\subsection{Memory usage}
 
memory usage in each category. This told us the size of both the current
``working set'' and the total memory footprint of the browser process
after it had completed a test run. The real memory results are shown in
Figure~\ref{fig-real-memory} and the virtual memory results are shown in 
Figure~\ref{fig-virtual-memory}
 
 
\begin{figure}
	\centering
	\includegraphics[scale=0.66]{real_memory}
Figure~\ref{fig-virtual-memory}.
 
 
\begin{figure}
	\centering
	\includegraphics[scale=0.5]{real_memory}
	\caption{Comparison of real memory usage for each technique (log-log scale).}
	\label{fig-real-memory}
\end{figure}
 
 
\begin{figure}
	\centering
	\includegraphics[scale=0.66]{virtual_memory}
	\includegraphics[scale=0.5]{virtual_memory}
	\caption{Comparison of virtual memory usage for each technique (log-log scale).}
	\label{fig-virtual-memory}
\end{figure}
 
 
While both sets of results display similar trends, the real memory data
proved somewhat difficult to interpret. Real memory usage was generally
consistent across test runs, but would also frequently fluctuate upwards
by a factor of nearly two for no readily apparent reason. This is
proved somewhat problematic. Real memory usage was generally consistent
across test runs, but would also frequently fluctuate upwards by a
factor of nearly two for no readily apparent reason. This is
particularly apparent with the HTML overlay technique beyond 1,024
points. We can only assume that this was a result of other processes on
the test machine interacting with the browser process in unexpected
ways. We are therefore somewhat wary of the real memory data, but they
memory footprint of a process is less likely to be impacted by other
running processes.
 
The results show that the two image-based techniques have essentially
constaint memory usage regardless of the number of points plotted. This
constant memory usage regardless of the number of points plotted. This
is to be expected, given that the size of the source data is also
essentially constant. The text-based techniques, however, clearly begin
to diverge as the number of points increases. The HTML overlay technique
starts to visibly diverge somewhere between 2,048 and 4,096 points,
while Google Maps starts to visbly diverge 64 and 128 points. This is in
line with our expectation for these techniques that memory usage would
increase in proportion to the number of points.
 
 
\section{Conclusion}
while Google Maps starts to visibly diverge somewhere between 64 and 128 points. This is
in line with our expectation for these techniques that memory usage
would increase in proportion to the number of points. It is intriguing
to note that for both techniques, there appears little consistency as to
where the performance of each measure begins to diverge, as shown in
Table~\ref{tab-divergence} (although Google Maps appears to exhibit
greater consistency than HTML overlay in this regard).
 
 
\begin{acmtable}{11cm}
	\centering
	\begin{tabular}{lccc}
		Technique						&	Data size	&	Map display time	&	Virtual memory	\\
		\hline
		Server-side HTML overlay		&	64--128		&	128--256			&	2,048--4,096 \\
		Google Maps						&	256--512	&	64--128				&	64--128	\\
	\end{tabular}
	\caption{Approximate number of points at which each measure begins to diverge,
		for the HTML overlay and Google Maps techniques.}
	\label{tab-divergence}
\end{acmtable}
 
 
\section{Conclusion and future work}
\label{sec-conclusion}
 
In this research, we tested the scalability of four techniques for
online geovisualization of web site hits, with respect to the number of
points to be plotted on the map. The four techniques tested were
server-side image generation and server-side image overlay techniques
scale the best from small to large data sets. The HTML overlay and
Google Maps techniques work well for small data sets, but their
performance rapidly deteriorates as the size of the data set increases,
to the point where they are essentially unusable.
 
Our aim was to identify which was the best technique to use to plot hits
on the Otago School of Business repository. We are now left with a choice
between two very similarly-performing techniques.
 
 
% The
% software extracts IP addresses from the web server logs, geolocates them
% using the free MaxMind GeoLite Country database\footnote{See
% \url{http://www.maxmind.com/app/ip-location}.}, then stores the
% resulting country information in a separate database.
 
% The Tasmania software, however, uses countries as its base unit of
% aggregation. We were interested in looking at the distribution on a finer
% level, down to individual cities if possible
to the point where they become unusable.
 
Despite this clear difference in scalability, we are still left with
some interesting questions. We did not investigate the model interaction
environment distribution style in this research, as it was unclear
whether this could be achieved using only client-side JavaScript. This
is clearly an avenue for further investigation. In addition, the
appearance of native SVG support in some browsers means that this may
also become a viable option in future.
 
We were somewhat surprised that the server-side HTML overlay and Google
Maps techniques exhibited no obvious consistency in where the different
measures (data size, map display time and virtual memory usage)
diverged. It seems logical that some form of correlation might exist, so
further research will be required to investigate this. One possibility
might be to implement an instrumented web browser and server in order to
gather more precise data.
 
Shortly after completing our experiments, we discovered \emph{msCross
Webgis}\footnote{\url{http://datacrossing.crs4.it/en_Documentation_mscross.html}},
which is an open source Google Maps clone. Its documentation implies
that it may be possible to build a fully self-contained implementation
that requires no external network access. This would enable us to test
on an isolated network with the client and server running on different
machines. We could then include measurements of network transfer time,
and eliminate any problems caused by running the client and server on
the same machine.
 
Our overall aim was to identify which was the best technique to use to
plot downloads and abstract views from the Otago School of Business
digital repository. Based on our results, both the server-side HTML
overlay and Google Maps techniques are clearly inappropriate for this
task. This leaves us with a choice between two very similarly-performing
techniques: server-side image generation and server-side image overlay.
However, the practical advantages of multi-layer techniques over
single-layer techniques, such as the ability to dynamically show and
hide multiple overlays, mean that server-side image overlay is the clear
winner in this case.
 
 
\begin{acks}
The author would like to acknowledge Dr.\ Antoni Moore and Prof.\ George
Benwell for their input into this research.
\end{acks}
 
 
\bibliography{Map_Visualisation}
 
 
\begin{received}
...
\end{received}
\documentclass[acmnow]{acmtrans2m}


\usepackage{graphicx}


\newtheorem{theorem}{Theorem}[section]
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{lemma}[theorem]{Lemma}
\newdef{definition}[theorem]{Definition}
\newdef{remark}[theorem]{Remark}


           
\markboth{Nigel Stanger}{...}

\title{Scalability of Techniques for Online Geovisualization of Web Site Hits}
            
\author{NIGEL STANGER \\ University of Otago}
            
\begin{abstract} 
A useful approach to visualising the geographical distribution of web
site hits is to geolocate the IP addresses of hits and plot them on a
world map. This can be achieved by dynamic generation and display of map
images at the server and/or the client. In this paper we compare the
scalability with respect to source data size of four techniques for
dynamic map generation and display: generating a single composite map
image, overlaying transparent images on an underlying base map,
overlaying CSS-enabled HTML on an underlying base map and generating a
map using Google Maps. These four techniques embody a mixture of
different display technologies and distribution styles. The results show
that all four techniques are suitable for small data sets, but that the
latter two techniques scale poorly to larger data sets.
\end{abstract}
            
\category{C.4}{Performance of Systems}{Performance attributes}
\category{C.2.4}{Computer-Communication Networks}{Distributed Systems}[distributed applications]
\category{H.3.5}{Information Storage and Retrieval}{Online Information Services}[web-based services]
            
\terms{Experimentation, Measurement, Performance} 
            
\keywords{downloads, geolocation, geovisualization, scalability, Google
	Maps, distribution style, dynamic map generation}
            
\begin{document}


\bibliographystyle{acmtrans}

            
\begin{bottomstuff} 
Author's address: N. Stanger, Department of Information Science,
University of Otago, PO Box 56, Dunedin 9054, New Zealand.
\end{bottomstuff}
            
\maketitle


\section{Introduction}
\label{sec-introduction}

When administering a web site, it is quite reasonable to want
information on the nature of traffic to the site. Information on the
geographic sources of traffic can be particularly useful in the right
context. For example, an e-commerce site might wish to determine the
geographical distribution of visitors to its site, so that it can decide
where best to target its marketing resources. One approach to doing so
is to plot the geographical location of web site hits on a map.
Geographical information systems (GIS) were already being used for these
kinds of purposes prior to the advent of the World Wide Web
\cite{Beau-JR-1991-GIS}, and it is a natural extension to apply these
ideas to online visualization of web site hits.

Our interest in this area derives from implementing a pilot digital
institutional repository at the University of
Otago\footnote{\url{http://eprints.otago.ac.nz/}} in November 2005
\cite{Stan-N-2006-running}, using the GNU
EPrints\footnote{\url{http://www.eprints.org/}} repository management
software. This repository quickly attracted interest from around the
world and the number of abstract views and document downloads began to
steadily increase. We were very interested in tracking this increase,
particularly with respect to where in the world the hits were coming
from. The EPrints statistics management software developed at the
University of Tasmania \cite{Sale-A-2006-stats} proved very useful in
this regard, providing us with detailed per-eprint and per-country
download statistics; an example of the latter is shown in
Figure~\ref{fig-tas-stats}. However, while this display provides an
ordered ranking of the number of hits from each country, it does not
provide any greater detail than to the country level, nor does it
provide any visual clues as to the distribution of hit sources around
the globe.


\begin{figure}
	\centering
	\includegraphics[scale=0.65]{tasmania_stats}
	\caption{A portion of the by-country display for the Otago EPrints
	repository, generated by the Tasmania statistics software.}
	\label{fig-tas-stats}
\end{figure}


We therefore began to explore possible techniques for plotting our
repository hit data onto a world map, with the aim of adding this
capability to the Tasmania statistics package. Our preference was for a
technique that could be used within a modern web browser without the
need to manually install additional client software, so as to make the
new feature available to the widest possible audience and reduce the
impact of wide variation in client hardware and software environments
\cite[pp.\ 27--28]{Offu-J-2002-quality}.

There have been several prior efforts to geovisualize web activity.
\citeN{Lamm-SE-1996-webvis} developed a sophisticated system for
real-time visualization of web traffic on a 3D globe, but this was
intended for use within a virtual reality environment, thus limiting its
general applicability. \citeN{Papa-N-1998-Palantir} described a similar
system (Palantir), which was written as a Java applet and thus able to
be run within a web browser, assuming that a Java virtual machine was
available. \citeN[pp.\ 100--103]{Dodg-M-2001-cybermap} describe these
and several other related systems for mapping Web and Internet traffic.

These early systems suffered from a distinct limitation in that there
was no public infrastructure in place for geolocating IP addresses (that
is, translating them into latitude/longitude coordinates). They
generally used \texttt{whois} lookups or parsed the domain name in an
attempt to guess the country of origin, with fairly crude results
\cite{Lamm-SE-1996-webvis}. Locations outside the United States were
typically aggregated by country and mapped to the capital city
\cite{Lamm-SE-1996-webvis,Papa-N-1998-Palantir,Jian-B-2000-cybermap}.
Reasonably accurate and detailed databases were commercially available
at the time \cite[p.\ 1466]{Lamm-SE-1996-webvis}, but were not generally
available to the public at large, thus limiting their utility.

The situation has improved considerably in the last five years, however,
with the advent of freely available and reasonably accurate geolocation
services\footnote{Such as \url{http://www.maxmind.com/} or
\url{http://www.ip2location.com/}.} with worldwide coverage and
city-level resolution. For example, MaxMind's \emph{GeoLite City}
database is freely available and claims to provide ``60\% accuracy on a
city level for the US within a 25 mile radius''
\cite{Maxm-G-2006-GeoLiteCity}. Their commercial \emph{GeoIP City}
database claims 80\% accuracy for the same parameters.

The techniques used by these prior systems can generally be divided into
two classes. The first class of techniques generate a single bitmap
image that contains both the map and the graphics representing web hits.
This can be achieved by programmatically plotting points onto a base map
image; the composite image is then displayed at the client. We shall
henceforth refer to this class of techniques as \emph{single-layer}
techniques. The second class of techniques separately return both a base
map image and some kind of overlay containing the plotted points. The
overlay and the base map are then displayed as separate items at the
client. We shall henceforth refer to this class of techniques as
\emph{multi-layer} techniques.

Both classes of techniques have been used in the aforementioned systems,
but multi-layer techniques appear to have been particularly popular. For
example, Palantir used a multi-layer technique, where a Java applet running
at the client overlaid graphic elements onto a base map image retrieved
from the now-defunct Xerox online map server
\cite{Papa-N-1998-Palantir}. A more recent example is the Google Maps
API \cite{Goog-M-2006-maps}, which enables web developers to easily
embed dynamic, interactive maps within web pages. Google Maps is a
dynamic multi-layer technique that has only become feasible relatively
recently with the advent of widespread support for CSS positioning and
Ajax technologies in many browsers.

Multi-layer techniques enjoy a particular advantage over single-layer
techniques, in that they provide the potential for a more flexible
GIS-like interaction with the map, with multiple layers that can be
activated and deactivated as desired. This flexibility could explain why
such techniques appear more prevalent in the literature. As we shall see
shortly, however, web-based multi-layer techniques tend to rely on more
recent web technologies such as CSS and Ajax, whereas single-layer
techniques generally do not. Single-layer techniques should therefore be
portable to a wider range of client and server environments.

Each map generation and display technique comprises a specific
technology or collection of technologies (such as transparent bitmap
overlays + CSS positioning), implemented using a specific distribution
style. For example, a particular single-layer technique might be
implemented completely server-side while another might use a mixture of
server-side and client-side processing. Similarly, multi-layer
techniques may adopt different distribution styles, and the overlays
themselves might take the form of transparent images, absolutely
positioned HTML elements, dynamically generated graphics, etc.

Given the wide variety of techniques available, the next question was
which of them would be most suitable for our purposes.
Scalability is a key issue for web applications in general
\cite[p.\ 28]{Offu-J-2002-quality}, and online activity visualization in
particular \cite[p.\ 50]{Eick-SG-2001-sitevis}, so we were particularly
interested in techniques that could scale to a large number of points.
For example, at the time of writing the Otago EPrints repository had
been accessed from over 10,000 distinct IP addresses, each potentially
representing a distinct geographical location. Separating out the type
of hit (abstract view versus document download) increased that figure to
nearly 13,000. Early informal experiments with these data indicated that
a single-layer composite map image would work quite well, whereas Google
Maps would not.

We first narrowed down the range of techniques to just four (server-side
image generation, server-side image overlay, server-side HTML overlay
and Google Maps); the selection process and details of the techniques
chosen are discussed in Section~\ref{sec-techniques}. We then set about
testing the scalability of these four techniques, in order to determine
how well each technique handled large numbers of points. A series of
experiments was conducted on each technique with progressively larger
data sets, and the elapsed time and memory usage were measured. The
experimental design is discussed in Section~\ref{sec-experiment}.

Our initial intuition was that the server-side image generation and
server-side image overlay techniques would scale best, and this was
borne out by the results of the experiments, which show that both
techniques scale reasonably well to very large numbers of points. The
other two techniques proved to be reasonable for relatively small
numbers of points (generally fewer than about 500--1,000), but their
performance deteriorated rapidly beyond this. The results are discussed
in more detail in Section~\ref{sec-results}.

It should be noted that the intent of the experiments was not to
identify statistically significant differences in performance across the
four techniques. It was expected that variations across techniques would
be reasonably clear-cut, and the experiments were designed to test this
expectation. However, the two best performing techniques, server-side
image generation and server-side image overlay, produced very similar
results, so a more formal statistical analysis of these techniques may
be warranted. This and other possible future directions are discussed in
Section~\ref{sec-conclusion}.


\section{Technique selection}
\label{sec-techniques}

In this section we discuss in more detail the four techniques that we
chose for testing, and how we decided upon these particular techniques.
First, we discuss the impact of distribution style on the choice of
technique. Then, for each of the four chosen techniques, we examine how
the technique works in practice, its implementation requirements, its
relative advantages and disadvantages, and any other issues peculiar to
the technique.


\subsection{Distribution style}
\label{sec-distribution}

\citeN{Wood-J-1996-vis} and \citeN{MacE-AM-1998-GIS} identified four
distribution styles for web-based geographic visualization software. The
\emph{data server} style is where the server only supplies raw data, and
all manipulation, display and analysis takes place at the client. In
other words, this is primarily a client-side processing model, as
illustrated in Figure~\ref{fig-distribution-styles}(a). For example,
Palantir implemented a multi-layer technique using this distribution
style \cite{Papa-N-1998-Palantir}, where the source data were generated
at the server and the map was generated, displayed and manipulated by a
Java applet running at the client. The data server distribution style
can provide a very dynamic and interactive environment to the end user,
but clearly requires support for executing application code within the
web browser, typically using something like JavaScript, Java applets or
Flash. JavaScript is now tightly integrated into most browsers, but the
same cannot be said for either Java or Flash. That is, we cannot
necessarily guarantee the existence of a Java virtual machine or Flash
plugin in every browser, which violates our requirement to avoid manual
installation of additional client-side software. We can therefore
eliminate Java- or Flash-based data server techniques from
consideration, but JavaScript-based data server techniques are feasible.
Indeed, as we will see in Section~\ref{sec-overlay}, Google Maps is an
example of such a technique.


\begin{figure}
	\centering
	\begin{tabular}{ccc}
		\includegraphics[scale=0.9]{data_server}	&
		\qquad	&
		\includegraphics[scale=0.9]{image_server}	\\
		\footnotesize (a) Data server	&
		\qquad	&
		\footnotesize (b) Image server	\\
		\\
		\\
		\includegraphics[scale=0.9]{model_interaction}	&
		\qquad	&
		\includegraphics[scale=0.9]{shared}	\\
		\footnotesize (c) Model interaction environment	&
		\qquad	&
		\footnotesize (d) Shared environment	\\
	\end{tabular}
	\caption{Distribution styles for web-based geographic visualization
	\protect\cite{Wood-J-1996-vis}. (F = filtering, M = mapping, R =
	rendering.)}
	\label{fig-distribution-styles}
\end{figure}


In contrast, the \emph{image server} style is where the display is
created entirely at the server and is only viewed at the client. In
other words, this is primarily a server-side processing model, as
illustrated in Figure~\ref{fig-distribution-styles}(b). Consequently,
techniques that use this style require no additional client-side
software, and thus meet our requirements. The downside is that the
resultant visualization tends to be static and non-interactive, as it is
typically just a simple bitmap image.

The \emph{model interaction environment} style is where a model created
at the server can be explored at the client, as illustrated in
Figure~\ref{fig-distribution-styles}(c). \citeN{Wood-J-1996-vis}
originally referred to this as the ``3D model interaction'' style, but
this seems slightly out of place in the current context. They originally
intended this distribution style to apply to VRML models for GIS
applications, but it could be equally applied to any situation where an
interactive model is generated at the server, then downloaded to and
manipulated at the client. This is very similar to what happens with
many Flash-based applications, for example. ``Model interaction
environment'' therefore seems a more appropriate name for this style.
The key distinguishing feature of this style is that there is no further
interaction between the client and server after the model has been
downloaded. This means that while the downloaded model can be very
dynamic and interactive, changing the underlying data requires a new
model to be generated at the server and downloaded to the client.
The same restrictions apply to techniques using this style as to those
using the data server style, so Java- and Flash-based model interaction
environment techniques can be eliminated from consideration. For similar
reasons, we
can also eliminate solutions such as VRML or SVG that require external
browser plugins (although native support for SVG is beginning to appear
in some browsers). It may be possible to implement this distribution
style using only client-side JavaScript, but it is presently unclear how
effective this would be.

Finally, the \emph{shared environment} style is where data manipulation
is done at the server, but control of that manipulation, rendering, and
display all occur at the client, as illustrated in
Figure~\ref{fig-distribution-styles}(d). This is similar to the model
interaction environment style, but with the addition of a feedback loop
from the client to the server, thus enabling a more flexible and dynamic
interaction. Ajax technologies \cite{Garr-JJ-2005-Ajax} can easily
support this kind of distribution style. For example,
\citeN{Saya-A-2006-GISWS} use Ajax to integrate Google Maps with
existing GIS visualization web services. We can eliminate specific
shared environment techniques from consideration based on the same
criteria as were applied to the other three styles (e.g., no Java- or
Flash-based techniques).


\subsection{Single-layer techniques}
\label{sec-image-gen}

As noted earlier, single-layer techniques work by directly plotting
geolocated IP addresses onto a base map image, then displaying the
composite image at the client. A typical example of the kind of output
that might be produced is shown in Figure~\ref{fig-image}. Such
techniques require two specific components: software to programmatically
create and manipulate bitmap images (for example, the GD image
library\footnote{\url{http://www.boutell.com/gd/}}); and software to
transform latitude/longitude coordinates into projected map coordinates
on the base map (for example, the PROJ.4 cartographic projections
library\footnote{\url{http://www.remotesensing.org/proj/}}).
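
To illustrate the second component, consider the simplest possible case
of an equirectangular base map that spans the whole globe. This is an
assumption made purely for illustration: a library such as PROJ.4
supports many other projections, and the formulas below are not tied to
any particular base map. For a base map of width \(W\) and height \(H\)
pixels (symbols introduced here), a point at latitude \(\phi\) and
longitude \(\lambda\), both in degrees, would map to the pixel
coordinates
\[
	x = \frac{\lambda + 180}{360}\,W, \qquad
	y = \frac{90 - \phi}{180}\,H,
\]
with the origin at the top left corner of the image. For example,
\(\phi = 0\) and \(\lambda = 0\) give \((W/2, H/2)\), the middle of the
image.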


\begin{figure}
	\centering
	\includegraphics[width=0.9\textwidth,keepaspectratio]{ImageGeneration-full}
	\caption{Sample output from the (single-layer) server-side image
		generation technique.}
	\label{fig-image}
\end{figure}


Single-layer techniques could use any of the distribution styles
discussed in Section~\ref{sec-distribution}. However, all but the image
server style would require the installation of additional client-side
software for generating images and performing cartographic projection
operations, so we will only consider single-layer techniques that use
the image server distribution style (henceforth \textbf{server-side image
generation}).

The server-side image generation technique provides some distinct
advantages. It is relatively simple to implement and is fast at
producing the final image, mainly because it uses existing,
well-established technologies. It is also bandwidth efficient, because
the size of the generated map image is determined by its pixel
dimensions and the compression method used, rather than by the number of
points to be plotted. The amount of data to be sent to the client should
therefore remain more or less constant, regardless of the number of
points plotted.

This technique also has some disadvantages, however. First, a suitable
base map image must be acquired. This could be generated from a GIS, but
if this is not an option an appropriate image must be obtained from a
third party. Care must be taken in the latter case to avoid copyright
issues. Second, the compression method used to produce the final
composite map image can have a significant impact on visual quality. For
example, lossy compression methods such as JPEG can make the points
plotted on the map appear distinctly fuzzy or ``muddy'', as shown in
Figure~\ref{fig-image-quality}. Lossless compression methods such as PNG
avoid this problem, but may produce larger files for the same image.
Finally, it is harder to provide interactive map manipulation features
with this technique, as the output is a simple static image. Anything
that changes the content of the map (such as panning or changing the
visibility of certain points) will require the entire image to be
regenerated. Zooming could be achieved if a very high resolution base
map image were available, but the number of possible zoom levels might be
restricted.


\begin{figure}
	\centering
	\includegraphics[scale=0.98]{jpeg_detail}
	\includegraphics[scale=0.98]{overlay_detail}
	\caption{Image quality of JPEG (Q=90) image generation (left) vs.\
	PNG image overlay (right).}
	\label{fig-image-quality}
\end{figure}


\subsection{Multi-layer techniques}
\label{sec-overlay}

Multi-layer techniques also involve plotting points onto a base map
image, but they differ from single-layer techniques in that the points
are not plotted directly onto the base map image. Rather, the points are
displayed as an independent overlay on top of the base map image. This
provides a significant advantage over single-layer techniques, as it
enables the possibility of multiple independent layers that can be
individually shown or hidden. This is very similar to the multi-layer
functionality provided by GIS, and is an effective way to provide
interactive visualizations of geographic data
\cite{Wood-J-1996-vis,MacE-AM-1998-GIS}. We still have the problem of
finding a suitable base map image, however.

Until relatively recently, implementing multi-layer techniques would likely
have required additional software at the client, but most modern
browsers now support absolute positioning of elements using CSS. This
enables us to create a map overlay using nothing more than HTML, CSS and
a few bitmap images. We have identified two main alternatives for
producing such an overlay, which we have termed \emph{image overlay} and
\emph{HTML overlay}.

An image overlay comprises a transparent bitmap image into which the
points are plotted, which is then overlaid on the base map image (in our
implementation, the output looks essentially identical to that shown in
Figure~\ref{fig-image} on page~\pageref{fig-image}). This requires the
overlay image to be in either PNG or GIF format, as JPEG does not
support transparency. The overlay image is likely to contain
considerable ``white space'', which compresses very well, so use of a
lossless compression method should not be an issue. This also eliminates
the ``fuzziness'' issue noted earlier (see
Figure~\ref{fig-image-quality}). The size of the image overlay will
generally be proportional to the number of points to be plotted, but the
image compression should have a moderating effect on this.

As noted in Section~\ref{sec-image-gen}, generating images at the client
would require additional software to be installed, so we will only
consider the image server distribution style for image overlays
(henceforth \textbf{server-side image overlay}). That is, both the base
map image and the overlay(s) are generated at the server.

An HTML overlay comprises a collection of HTML elements corresponding to
the points to be plotted, which are positioned over the base map image
using CSS absolute positioning. There is considerable flexibility as to
the types of elements that could be used to construct the overlay. One
possibility is to use \verb|<IMG>| elements to place icons on the base
map, which appears to be the approach adopted by Google Maps (see
Figure~\ref{fig-google}). Another possibility is to use appropriately
sized and colored \verb|<DIV>| elements, which then appear as colored
blocks ``floating'' over the base map image (in our implementation, the
output looks essentially identical to that shown in
Figure~\ref{fig-image} on page~\pageref{fig-image}).


\begin{figure}
	\centering
	\includegraphics[width=0.9\textwidth,keepaspectratio]{GoogleMap-full.png}
	\caption{Sample output from the Google Maps technique.}
	\label{fig-google}
\end{figure}


HTML overlays may be generated at either the server or the client.
Unlike the techniques discussed previously, however, HTML overlays can
be generated at the client without the need for additional software,
because only HTML (i.e., text) is being generated, not images. This can
be easily achieved using client-side JavaScript, so HTML overlays can
use any of the distribution styles discussed in
Section~\ref{sec-distribution} without violating our requirements. We
have therefore adopted two representative HTML overlay techniques for
our experiments: \textbf{server-side HTML overlays} (using the image
server distribution style) and \textbf{Google Maps} (using the data
server distribution style). Since Google Maps uses \verb|<IMG>|
elements, we have used \verb|<DIV>| elements for the server-side HTML
overlay.

Server-side HTML overlays are actually slightly simpler to implement
than either server-side image generation or image overlays, because we
do not need to write any code to generate or manipulate images (the base
map image is static and thus requires no additional processing). All
that is required is code to transform latitude/longitude coordinates
into projected map coordinates and generate corresponding \verb|<DIV>|
elements.
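
As a minimal sketch of the second step, suppose that a point has
projected pixel coordinates \((x, y)\) relative to the top left corner
of the base map, and that each marker is drawn as a square of side \(s\)
pixels (the marker shape and the symbols are assumptions introduced here
for illustration only). Centering the marker on the point then requires
CSS offsets of
\[
	\mathit{left} = x - \frac{s}{2}, \qquad
	\mathit{top} = y - \frac{s}{2},
\]
rounded to whole pixels, together with a width and height of \(s\).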

Google Maps \cite{Goog-M-2006-maps} is a more complex proposition. This
technique uses the data server distribution style, where JavaScript code
running within the browser enables the client to manipulate the base map
and its overlays. Data and map images are requested asynchronously from
the server as required using Ajax technologies, which seems to imply
that Google Maps in fact uses the shared environment distribution style.
However, the server has no involvement beyond simply supplying data to
the client. In the shared environment distribution style, the server is
directly involved in manipulating the map, under the control of the
client. This is clearly not the case with Google Maps.

The primary advantage of Google Maps is the powerful functionality it
provides for generating and interacting with the map. Users may pan the
map in any direction and zoom to many different levels of detail. A
satellite imagery view is also available. In addition, further
information about each point plotted (such as the name of the city) can
be displayed in a callout attached to the point, as shown in
Figure~\ref{fig-google}.

However, there are also some significant disadvantages to the Google
Maps technique\footnote{Interestingly, the Google Earth application
addresses many of these issues, but since it is not a browser-based
solution it falls outside the scope of our consideration.}. First, it is
a distributed application, thus making it more complex to implement,
test and debug \cite{Bates-PC-1995-distdebug,Ensl-PH-1978-distributed}.
Second, the server must have a registered API key from Google, which is
verified every time that a page attempts to use the API. Similarly, the
client must connect to Google's servers in order to download the
API's JavaScript source. This means that the technique requires an
active Internet connection in order to work. Finally, the Google Maps
API does not currently provide any way to toggle the visibility of
markers on the map, so it is not possible to implement the interactive
``layers'' mentioned at the start of this section. (It is possible, of
course, that Google may implement this feature in a future version of
the API.)

The most significant disadvantage of all HTML overlay techniques,
however, is that the size of the HTML overlay is directly proportional
to the number of points to be plotted. There will be one overlay element
(\verb|<DIV>| or \verb|<IMG>|) per point, so a very large number of
points will result in an even larger amount of HTML source being
generated. We expect that this will lead to excessive browser memory
usage, and consequently that these techniques will not scale well at the
high end. However, they may still be appropriate for smaller data sets
that require interactive manipulation.
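
This expectation can be stated as a simple linear model, given here only
as a sketch. If \(S_0\) denotes the fixed size of the page and base map,
and \(b\) the average number of bytes of markup per point (both symbols
are introduced here and are not measured quantities), then the total
amount of data for \(n\) points is approximately
\[
	S_{\mathrm{HTML}}(n) \approx S_0 + b\,n ,
\]
whereas for the image-based techniques the size is limited by the pixel
dimensions of the image and the compression method, rather than growing
without bound in \(n\).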


\section{Experimental design}
\label{sec-experiment}

After some preliminary testing with live data from the Otago School of
Business repository, we proceeded with a series of experiments to test
the scalability of the four techniques. Each technique was tested using
progressively larger synthetic data sets. The first data set comprised
one point at the South Pole (latitude \(-90^{\circ}\), longitude
\(-180^{\circ}\)). Each successive data set was twice the size of its
predecessor, building up a regular grid of latitude/longitude points at
one degree intervals\footnote{The entire grid has 64,800 points, so the
five largest data sets have many duplicate points.}. A total of
twenty-one data sets were created in this way, with the number of points
ranging from one to 1,048,576 (\(=2^{20}\)). The result of plotting the
16,384-point data set is shown in Figure~\ref{fig-grid-points}.
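
The sizes of the data sets, and the footnoted remark about duplicate
points, can be checked with a little arithmetic. Writing \(n_k\) for the
number of points in the \(k\)th data set (an index introduced here for
convenience),
\[
	n_k = 2^k, \qquad k = 0, 1, \ldots, 20,
\]
which gives \(21\) data sets. A grid at one degree intervals contains
\(360 \times 180 = 64{,}800\) distinct points, and
\[
	2^{15} = 32{,}768 < 64{,}800 < 65{,}536 = 2^{16},
\]
so the data sets with \(k \geq 16\) (the five largest) necessarily
contain duplicate points, while the smaller data sets need not.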


\begin{figure}
	\centering
	\includegraphics[width=0.9\textwidth,keepaspectratio]{16384_points}
	\caption{The 16,384-point data set plotted on the base map.}
	\label{fig-grid-points}
\end{figure}


The focus on scalability meant that we were primarily interested in
measuring page load times, memory usage and the amount of data
generated (which affects both storage and network bandwidth). Page
load time can be further broken down into the time taken to generate the
map data, the time taken to transfer the map data to the client across
the network, and the time taken by the client to display the map.

Unfortunately, the Google Maps technique requires an active Internet
connection (as noted in Section~\ref{sec-overlay}), so we were unable to
run the experiments on an isolated network. This meant that traffic on
the local network was a potential confounding factor. We therefore
decided to eliminate network performance from the equation by running
both the server and the client on the same machine\footnote{A Power
Macintosh G5 1.8\,GHz with 1\,GB RAM, running Mac OS X 10.4.7, Apache
2.0.55, PHP 4.4 and Perl 5.8.6.}. This in turn enabled us to
independently measure the time taken for data generation and page
display, thus simplifying the process of data collection and also
ensuring that the client and server processes did not unduly interfere
with each other, despite running on the same machine.

It could be argued that network performance would still have a
confounding effect on the Google Maps technique, but this would only be
likely for the initial download of the API (comprising about 235\,kB of
JavaScript source and images), which would be locally cached thereafter.
The API key verification does occur every time the map is loaded, but
the amount of data involved is very small, so it is less likely that
this would be significantly affected by network performance. Any such
effect would also be immediately obvious as it would simply block the
server from proceeding.

For each data set generated, we recorded its size, the time taken to
generate it, the time taken to display the resultant map in the browser,
and the amount of real and virtual memory used by the browser during the
test. We also intended to measure the memory usage of the server, but
this proved more difficult to isolate than expected, and was thus
dropped from the experiments. The data set generation time and browser
memory usage were measured using the \texttt{time} and \texttt{top}
utilities respectively (the latter was run after each test run to avoid
interference). The map display time was measured using the ``page load
test'' debugging feature of Apple's Safari web browser, which can
repetitively load a set of pages while recording various statistics, in
particular the time taken to load the page. Tests were run up to twenty
times each where feasible, in order to reduce the impact of random
variations. Some tests were run fewer times because they took an
excessive amount of time to complete (i.e., several minutes for a single
test run). We typically broke off further testing when a single test run
took longer than about five minutes, as by this stage performance had
already deteriorated well beyond usable levels.


\subsection{Technique implementation}

As noted in Sections~\ref{sec-image-gen} and \ref{sec-overlay}, the
server-side image generation, server-side image overlay and server-side
HTML overlay techniques were all implemented using the image server
distribution style. A separate dispatcher page was written in PHP for
each technique, which enabled arguments---such as the number of points
to be plotted---to be passed from the client to a corresponding Perl
script for each technique. The final page was then constructed as
follows:
\begin{description}

	\item[server-side image generation] The dispatcher page included a
	standard \verb|<IMG>| element that called the Perl script. This
	script loaded a base map PNG image, plotted points directly onto it,
	and returned the composite map to the client as a JPEG image (with
	the ``quality'' parameter set to 90).

	\item[server-side image overlay] The dispatcher page included two
	\verb|<IMG>| elements, the first for the base map and the second for
	the overlay, both with identical CSS positioning attributes. The
	first \verb|<IMG>| simply loaded a static JPEG image representing
	the base map. The second \verb|<IMG>| called the Perl script, which
	generated and returned the overlay as a transparent PNG image.

	\item[server-side HTML overlay] The dispatcher page included a
	\verb|<IMG>| element for the base map and a \verb|<DIV>| element for
	the overlay, both with identical CSS positioning attributes. As with
	the previous technique, the \verb|<IMG>| simply loaded a static JPEG
	image representing the base map. The \verb|<DIV>| contained inline
	PHP code that called the Perl script. This in turn generated and
	returned the overlay as a collection of CSS-positioned \verb|<DIV>|
	elements, nested within the top-level \verb|<DIV>| element.

\end{description}

For all of these techniques, the base map image was 1,024 by 520 pixels.
In PNG format it occupied approximately 1.2\,MB (but this version was
never returned to the client), while in JPEG format (Q=90) it occupied
approximately 180\,kB. The base map image was derived from an original
3,599 by 1,826 pixel image, which was part of a collection of maps
released into the public domain by the \citeN{CIA-WFB-2006}. All three
techniques used the PROJ.4 cartographic projections library to convert
latitude/longitude pairs into projected map coordinates, while the first
two techniques also used the GD graphics library to programmatically
generate and manipulate images.

The Google Maps technique was implemented using the data server
distribution style. Once again, a PHP dispatcher page was used. This
time, however, the page included client-side JavaScript code to load and
initialise the Google Maps API, create the base map, and build the map
overlay. The first two steps were achieved using standard Google Maps
API calls. For the last step, the client used an \texttt{XMLHttpRequest}
object to call a server-side Perl script. This script generated and
returned to the client an XML data set containing the points to be
plotted. The client then looped through this data set and used the
Google Maps API calls to create a marker on the base map corresponding
to each point.


\section{Results}
\label{sec-results}

As noted in the introduction, the intent of these experiments was not to
do a full analysis and statistical comparison of the performance of the
different techniques, but rather to identify broad trends. We have not,
therefore, carried out any statistical analysis on the results. We will
now discuss the results for data size, page load time and memory usage.
Because the number of points in each data set increases in powers of
two, we have used log-log scales for all plots.
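
The effect of this choice can be seen from a short derivation, which is
a general property of log-log plots rather than a result of the
experiments. If some measured quantity \(t\) grows as a power of the
number of points \(n\), say \(t = c\,n^{a}\) for constants \(c\) and
\(a\) (symbols introduced here), then
\[
	\log t = \log c + a \log n ,
\]
so the relationship appears as a straight line with slope \(a\). In
particular, a quantity that is directly proportional to \(n\) has slope
\(1\), and the data sets, being powers of two, are evenly spaced along
the horizontal axis.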


\subsection{Data size}

During each test run, the data generated by the server was saved to a
file and its size in bytes recorded. In the case of the server-side
image generation and server-side image overlay techniques, the file
comprised a bitmap image; whereas for the server-side HTML overlay and
Google Maps techniques, the file comprised HTML or XML text,
respectively.

There was a certain amount of fixed overhead for each technique tested,
as summarised in Table~\ref{tab-overhead}. This overhead comprised
static files that were always downloaded to the client, regardless of
the number of points to be plotted. Typical fixed overhead included
items such as the base map image, various icons, the PHP source of the
dispatcher page and the JavaScript source for the Google Maps API.


\begin{acmtable}{11cm}
	\centering
	\begin{tabular}{lll}
		Technique						&	Fixed overhead		&	Content	\\
		\hline
		Server-side image generation	&	629\,bytes			&	dispatcher (PHP)\smallskip	\\

		Server-side image overlay		&	\(\approx\) 181\,kB	&	dispatcher (PHP) \\
										&						&	base map image (JPEG)\smallskip	\\

		Server-side HTML overlay		&	\(\approx\) 181\,kB	&	dispatcher (PHP) \\
										&						&	base map image (JPEG)\smallskip	\\

		Google Maps						&	\(\approx\) 235\,kB	&	dispatcher (PHP) \\
										&						&	base map image tiles (PNG) \\
										&						&	API (JavaScript) \\
										&						&	various icons (PNG)	\\
	\end{tabular}
	\caption{Fixed overhead for each technique.}
	\label{tab-overhead}
\end{acmtable}


\begin{figure}
	\centering
	\includegraphics[scale=0.5]{data_size}
	\caption{Comparison of generated data size for each technique (log-log scale).}
	\label{fig-data-size}
\end{figure}


The amount of data generated for each technique, including fixed
overhead, is shown in Figure~\ref{fig-data-size}. It is immediately
apparent from these results that there is a divergence between the two
techniques that generate images (server-side image generation and
server-side image overlay), and the two techniques that generate text
(server-side HTML overlay and Google Maps).

Both the server-side image generation and server-side image overlay
techniques scale particularly well with regard to the amount of data
generated. Interestingly, the amount of data generated by the image
generation technique increases by about 8\,kB up to the 8,192-point data
set, but then \emph{drops} by about 90\,kB over the next three data
sets. This occurs because the number of points plotted has become
sufficient to cover most of the base map. In other words, a large
portion of the composite map image is a single color (see
Figure~\ref{fig-grid-points} on page~\pageref{fig-grid-points} for an
example), which compresses more efficiently.

The amount of data generated by the image overlay technique appears
constant, but actually increases by about 2\,kB across the entire range
of tests. This has important implications for the ability of this
technique to handle multiple layers. Because the overlay images are
quite small (less than 2\,kB for up to one million points), it should be
feasible to pre-load several overlay images into a client-side array and
switch them on and off as desired.

The server-side HTML overlay and Google Maps techniques clearly do not
scale well, and begin to visibly diverge from the other two techniques
once the amount of data generated exceeds about 5\% of the fixed
overhead. For the HTML overlay technique this occurs somewhere between
64 and 128 points, whereas for Google Maps it occurs somewhere between
256 and 512 points. The divergence increases rapidly for both techniques
beyond these points, with the HTML overlay technique suffering the most.
The latter occurs because the HTML overlay technique needs to generate
additional CSS attributes (i.e., more text) in order to correctly
position the \verb|<DIV>| elements, whereas the Google Maps technique
needs only to return a more compact list of latitude/longitude
coordinates.
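
As a rough illustration of the 5\% threshold (a sketch only, assuming
that the generated data grow linearly with the number of points), let
$F$ denote the fixed overhead of a technique and $b$ the average amount
of data generated per point. Both symbols are introduced here purely for
illustration. The total data size for $n$ points, and the approximate
divergence point $n^{*}$, are then
\[
	S(n) = F + bn, \qquad bn^{*} \approx 0.05F
	\quad\Longrightarrow\quad n^{*} \approx \frac{0.05F}{b}.
\]
Under this assumed model, the observed divergence ranges would
correspond to a ratio $F/b = 20n^{*}$ of roughly 1,280--2,560 for the
HTML overlay technique and 5,120--10,240 for Google Maps.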


\subsection{Page load time}

For each test run, we recorded the length of time taken to generate the
data at the server and to display the page in the client browser. The
former is illustrated in Figure~\ref{fig-data-generation-time} and the
latter in Figure~\ref{fig-page-load-time}. The combined time (data
generation + display time) is shown in Figure~\ref{fig-combined-time}.


\subsubsection{Data generation time}


\begin{figure}
	\centering
	\includegraphics[scale=0.5]{data_generation_time}
	\caption{Comparison of data generation time for each technique (log-log scale).}
	\label{fig-data-generation-time}
\end{figure}


The results (see Figure~\ref{fig-data-generation-time}) show that the
length of time taken to generate the source data increases in proportion
to the number of points to be plotted, as expected. It is interesting to
note the differences in data generation time for each technique,
however. Data generation for both of the ``text-based'' techniques (HTML
overlay and Google Maps) is consistently faster than for the
``image-based'' techniques (image generation and image overlay).
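
Because the plots use log-log axes, it may help to recall how a
proportional relationship appears on them. As a sketch, assume that the
generation time follows a power law $t(n) = cn^{k}$, where $c$ and $k$
are illustrative constants rather than fitted values. Taking logarithms
gives
\[
	\log t(n) = \log c + k \log n,
\]
which is a straight line with slope $k$. Strict proportionality
corresponds to $k = 1$, so that doubling the number of points doubles
the generation time. Any constant start-up cost would flatten the curve
at the low end.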

The results show that server-side image generation generally takes the
longest to generate its data. This is because it not only has to map
points from latitude/longitude into projected map coordinates, but also
must plot these points onto the base map image, then compress the
composite image as a JPEG. The image to be compressed is also moderately
complex, which only adds to the data generation time. Server-side image
overlay performs somewhat better because it uses a less complex
compression method (PNG) and the image to be compressed is much simpler
(a collection of colored points on a blank background).

The server-side HTML overlay technique appears faster at generating data
than either of the two image-based techniques at the low end, but is
similar in performance at the high end. In this technique the server
only needs to map latitude/longitude to projected map coordinates; no
images need to be generated and there is no compression to deal with. At
the high end, however, this advantage is clearly offset by the
significant volume of data being generated. Google Maps is faster again,
because almost all processing is carried out on the client; the server's
only involvement is to generate a simple list of latitude/longitude
coordinates.

In terms of data generation, it appears that all techniques tested scale
reasonably well. The image-based techniques perform worse at the low end
because they involve more complex processing than the text-based
techniques, but this is offset at the high end by the relatively
constant amount of data generated. Conversely, the text-based techniques
perform better at the low end, but are negatively impacted at the high
end by the sheer volume of data produced (tens or hundreds of megabytes
vs.\ hundreds of kilobytes).


\subsubsection{Map display time}


\begin{figure}
	\centering
	\includegraphics[scale=0.5]{page_load_time}
	\caption{Comparison of map display time for each technique (log-log scale).}
	\label{fig-page-load-time}
\end{figure}


These results (see Figure~\ref{fig-page-load-time}) reveal quite a
spectacular difference between the image-based and text-based
techniques. The time taken to display the map is essentially constant
for both of the image-based techniques, regardless of the number of
points to be plotted. This is not surprising given that the size of the
generated data is also essentially constant, and that the browser is
simply loading and displaying static images. The image overlay technique
appears slightly slower than the image generation technique. This is
probably because the image overlay technique has to load two images from
the server (the base map and the overlay), compared to one image for the
image generation technique.

In contrast, the text-based techniques clearly do not scale well with
respect to map display time. Google Maps suffers particularly in this
regard, with display time exceeding ten seconds shortly past 512 points.
Testing was abandoned at 4,096 points, with a single test run taking
over seven minutes. The HTML overlay technique fares better, exceeding
ten seconds somewhere between 4,096 and 8,192 points. Testing was
abandoned at 32,768 points, with a single test run taking almost ten
minutes.


\subsubsection{Combined time}


\begin{figure}
	\centering
	\includegraphics[scale=0.5]{combined_time}
	\caption{Comparison of combined page load time for each technique (log-log scale).}
	\label{fig-combined-time}
\end{figure}


Combining the data generation and map display times (see
Figure~\ref{fig-combined-time}) yields little change in the curves for
the text-based techniques, because the data generation times are very
small compared to the map display times. There is a more obvious impact
on the image-based techniques, with both techniques remaining more or
less constant up to about 2,048 points, then slowing as the number of
points increases beyond that. However, the slowdown is nowhere near as
dramatic as for the text-based techniques; even the largest data set
only takes about nineteen seconds overall. The image overlay technique
does display a slight advantage of about half a second over the image
generation technique for the largest data set, but further experiments
will be required to determine whether this is a statistically
significant difference.
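
The small effect on the text-based curves can be made explicit with a
short sketch, writing $T_g(n)$ and $T_d(n)$ for the data generation and
map display times, and $T_c(n)$ for the combined time (notation
introduced here for illustration only):
\[
	T_c(n) = T_g(n) + T_d(n), \qquad
	\log T_c(n) = \log T_d(n) + \log\left(1 + \frac{T_g(n)}{T_d(n)}\right).
\]
When $T_g \ll T_d$, the final term is close to zero and the combined
curve is almost indistinguishable from the display curve on a
logarithmic axis. When $T_d$ is roughly constant and $T_g$ grows with
$n$, as for the image-based techniques, the same term accounts for the
growth in the combined curve.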


\subsection{Memory usage}

We measured both the real and virtual memory usage of the browser by
running the \texttt{top} utility after each test run and observing the
memory usage in each category. This told us the size of both the current
``working set'' and the total memory footprint of the browser process
after it had completed a test run. The real memory results are shown in
Figure~\ref{fig-real-memory} and the virtual memory results are shown in 
Figure~\ref{fig-virtual-memory}.


\begin{figure}
	\centering
	\includegraphics[scale=0.5]{real_memory}
	\caption{Comparison of real memory usage for each technique (log-log scale).}
	\label{fig-real-memory}
\end{figure}


\begin{figure}
	\centering
	\includegraphics[scale=0.5]{virtual_memory}
	\caption{Comparison of virtual memory usage for each technique (log-log scale).}
	\label{fig-virtual-memory}
\end{figure}


While both sets of results display similar trends, the real memory data
proved somewhat problematic. Real memory usage was generally consistent
across test runs, but would also frequently fluctuate upwards by a
factor of nearly two for no readily apparent reason. This is
particularly apparent with the HTML overlay technique beyond 1,024
points. We can only assume that this was a result of other processes on
the test machine interacting with the browser process in unexpected
ways. We are therefore somewhat wary of the real memory data, but they
are at least broadly consistent with the virtual memory data. The
virtual memory data proved more consistent overall, as the virtual
memory footprint of a process is less likely to be impacted by other
running processes.

The results show that the two image-based techniques have essentially
constant memory usage regardless of the number of points plotted. This
is to be expected, given that the size of the source data is also
essentially constant. The text-based techniques, however, clearly begin
to diverge as the number of points increases. The HTML overlay technique
starts to visibly diverge somewhere between 2,048 and 4,096 points,
while Google Maps starts to visibly diverge somewhere between 64 and 128
points. This is
in line with our expectation for these techniques that memory usage
would increase in proportion to the number of points. It is intriguing
to note that for both techniques, there appears little consistency as to
where the performance of each measure begins to diverge, as shown in
Table~\ref{tab-divergence} (although Google Maps appears to exhibit
greater consistency than HTML overlay in this regard).


\begin{acmtable}{11cm}
	\centering
	\begin{tabular}{lccc}
		Technique						&	Data size	&	Map display time	&	Virtual memory	\\
		\hline
		Server-side HTML overlay		&	64--128		&	128--256			&	2,048--4,096 \\
		Google Maps						&	256--512	&	64--128				&	64--128	\\
	\end{tabular}
	\caption{Approximate number of points at which each measure begins to diverge,
		for the HTML overlay and Google Maps techniques.}
	\label{tab-divergence}
\end{acmtable}


\section{Conclusion and future work}
\label{sec-conclusion}

In this research, we tested the scalability of four techniques for
online geovisualization of web site hits, with respect to the number of
points to be plotted on the map. The four techniques tested were
server-side image generation, server-side image overlay, server-side
HTML overlay and Google Maps. The results clearly show that the
server-side image generation and server-side image overlay techniques
scale the best from small to large data sets. The HTML overlay and
Google Maps techniques work well for small data sets, but their
performance rapidly deteriorates as the size of the data set increases,
to the point where they become unusable.

Despite this clear difference in scalability, we are still left with
some interesting questions. We did not investigate the model interaction
environment distribution style in this research, as it was unclear
whether this could be achieved using only client-side JavaScript. This
is clearly an avenue for further investigation. In addition, the
appearance of native SVG support in some browsers means that this may
also become a viable option in future.

We were somewhat surprised that the server-side HTML overlay and Google
Maps techniques exhibited no obvious consistency in where the different
measures (data size, map display time and virtual memory usage)
diverged. It seems logical that some form of correlation might exist, so
further research will be required to investigate this. One possibility
might be to implement an instrumented web browser and server in order to
gather more precise data.

Shortly after completing our experiments, we discovered \emph{msCross
Webgis}\footnote{\url{http://datacrossing.crs4.it/en_Documentation_mscross.html}},
which is an open source Google Maps clone. Its documentation implies
that it may be possible to build a fully self-contained implementation
that requires no external network access. This would enable us to test
on an isolated network with the client and server running on different
machines. We could then include measurements of network transfer time,
and eliminate any problems caused by running the client and server on
the same machine.

Our overall aim was to identify which was the best technique to use to
plot downloads and abstract views from the Otago School of Business
digital repository. Based on our results, both the server-side HTML
overlay and Google Maps techniques are clearly inappropriate for this
task. This leaves us with a choice between two techniques that perform
very similarly: server-side image generation and server-side image overlay.
However, the practical advantages of multi-layer techniques over
single-layer techniques, such as the ability to dynamically show and
hide multiple overlays, mean that server-side image overlay is the clear
winner in this case.


\begin{acks}
The author would like to acknowledge Dr.\ Antoni Moore and Prof.\ George
Benwell for their input into this research.
\end{acks}


\bibliography{Map_Visualisation}


\begin{received}
...
\end{received}
\end{document}



\documentclass[acmtocl,acmnow]{acmtrans2m}


\usepackage{graphicx}


\newtheorem{theorem}{Theorem}[section]
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{lemma}[theorem]{Lemma}
\newdef{definition}[theorem]{Definition}
\newdef{remark}[theorem]{Remark}


           
\markboth{Nigel Stanger}{...}

\title{Scalability of Techniques for Online Geovisualization of Web Site Hits}
            
\author{NIGEL STANGER \\ University of Otago}
            
\begin{abstract} 
A useful approach to visualizing the geographical distribution of web
site hits is to geolocate the IP addresses of hits and plot them on a
world map. This can be achieved by dynamic generation and display of map
images at the server and/or the client. In this paper we compare the
scalability with respect to source data size of four techniques for
dynamic map generation and display: generating a single composite map
image, overlaying transparent images on an underlying base map,
overlaying CSS-enabled HTML on an underlying base map and generating a
map using Google Maps. These four techniques embody a mixture of
different display technologies and distribution styles. The results show
that all four techniques are suitable for small data sets, but that the
latter two techniques scale poorly to larger data sets.
\end{abstract}
            
\category{C.4}{Performance of Systems}{Performance attributes}
\category{C.2.4}{Computer-Communication Networks}{Distributed Systems}[distributed applications]
\category{H.3.5}{Information Storage and Retrieval}{Online Information Services}[web-based services]
            
\terms{Experimentation, Measurement, Performance} 
            
\keywords{geolocation, geovisualization, scalability, GD, Google Maps}
            
\begin{document}


\bibliographystyle{acmtrans}

            
\begin{bottomstuff} 
Author's address: N. Stanger, Department of Information Science,
University of Otago, PO Box 56, Dunedin 9054, New Zealand.
\end{bottomstuff}
            
\maketitle


\section{Introduction}
\label{sec-introduction}

When administering a web site, it is quite reasonable to want
information on the nature of traffic to the site. Information on the
geographic sources of traffic can be particularly useful in the right
context. For example, an e-commerce site might wish to determine the
geographical distribution of visitors to its site, so that it can decide
where best to target its marketing resources. One approach to doing so
is to plot the geographical location of web site hits on a map.
Geographical information systems (GIS) were already being used for these
kinds of purposes prior to the advent of the World Wide Web
\cite{Beau-JR-1991-GIS}, and it is a natural extension to apply these
ideas to online visualization of web site hits.

Our interest in this area derives from implementing a pilot digital
institutional repository at the University of
Otago\footnote{\url{http://eprints.otago.ac.nz/}} in November 2005
\cite{Stan-N-2006-running}, using the GNU
EPrints\footnote{\url{http://www.eprints.org/}} repository management
software. This repository quickly attracted interest from around the
world and the number of abstract views and document downloads began to
steadily increase. We were obviously very interested in tracking this
increase, particularly with respect to where in the world the hits were
coming from. The EPrints statistics management software developed at the
University of Tasmania \cite{Sale-A-2006-stats} proved very useful in
this regard, providing us with detailed per-eprint and per-country
download statistics; an example of the latter is shown in
Figure~\ref{fig-tas-stats}. However, while this display provides an
ordered ranking of the number of hits from each country, it does not
provide any greater detail than to the country level, nor does it
provide any visual clues as to the distribution of hit sources around
the globe.


\begin{figure}
	\centering
	\includegraphics[scale=0.65]{tasmania_stats}
	\caption{A portion of the by-country display for the Otago EPrints
	repository, generated by the Tasmania statistics software.}
	\label{fig-tas-stats}
\end{figure}


We therefore began to explore possible techniques for plotting our
repository hit data onto a world map, with the aim of adding this
capability to the Tasmania statistics package. Our preference was for a
technique that could be used within a modern web browser without the
need to manually install additional client software, so as to make the
new feature available to the widest possible audience and reduce the
impact of wide variation in client hardware and software environments
\cite[pp.\ 27--28]{Offu-J-2002-quality}.

There have been several prior efforts to geovisualize web activity.
\citeN{Lamm-SE-1996-webvis} developed a sophisticated system for
real-time visualization of web traffic on a 3D globe, but this was
intended for use within a virtual reality environment, thus limiting its
general applicability. \citeN{Papa-N-1998-Palantir} described a similar
system (Palantir), which was written as a Java applet and thus able to
be run within a web browser, assuming that a Java virtual machine was
available. \citeN[pp.\ 100--103]{Dodg-M-2001-cybermap} describe these
and several other related systems for mapping Web and Internet traffic.

These early systems suffered from a distinct limitation in that there
was no public infrastructure in place for geolocating IP addresses (that
is, translating them into latitude/longitude coordinates). They
generally used \texttt{whois} lookups or parsed the domain name in an
attempt to guess the country of origin, with fairly crude results
\cite{Lamm-SE-1996-webvis}. Locations outside the United States were
typically aggregated by country and mapped to the capital city
\cite{Lamm-SE-1996-webvis,Papa-N-1998-Palantir,Jian-B-2000-cybermap}.
Reasonably accurate and detailed databases were commercially available
at the time \cite[p.\ 1466]{Lamm-SE-1996-webvis}, but were not generally
available to the public at large, thus limiting their utility.

The situation has improved considerably in the last five years, however,
with the advent of freely available and reasonably accurate geolocation
services\footnote{Such as \url{http://www.maxmind.com/} or
\url{http://www.ip2location.com/}.} with worldwide coverage and
city-level resolution. For example, MaxMind's \emph{GeoLite City}
database is freely available and claims to provide ``60\% accuracy on a
city level for the US within a 25 mile radius''
\cite{Maxm-G-2006-GeoLiteCity}. Their commercial \emph{GeoIP City}
database claims 80\% accuracy for the same parameters.

The techniques used by these systems can generally be divided into two
classes. The first class of techniques generate a single bitmap image
that contains both the map and the icons representing web hits. This can
be achieved by programmatically plotting points onto a base map image;
the composite image is then displayed at the client. We shall henceforth
refer to this class of techniques as \emph{single layer} techniques.
The second class of techniques separately return both a base map image
and some kind of overlay containing the plotted points. The overlay is
then combined with the base map at the client. We shall henceforth refer
to this class of techniques as \emph{multi-layer} techniques.

Both classes of techniques have been used in the aforementioned systems,
but multi-layer techniques appear to have been particularly popular. For
example, Palantir used a multi-layer technique, where a Java applet running
at the client overlaid graphic elements onto a base map image retrieved
from the now-defunct Xerox online map server
\cite{Papa-N-1998-Palantir}. A more recent example is the Google Maps
API \cite{Goog-M-2006-maps}, which enables web developers to easily
embed dynamic, interactive maps within web pages. Google Maps is a
dynamic multi-layer technique that has only become feasible relatively
recently with the advent of widespread support for CSS positioning and
Ajax technologies in many browsers.

Multi-layer techniques enjoy a particular advantage over single layer
techniques, in that they provide the potential for a more flexible
GIS-like interaction with the map, with multiple layers that can be
activated and deactivated as desired. This flexibility could explain why
such techniques appear more prevalent in the literature. However,
multi-layer techniques tend to rely on more recent web technologies such as
CSS2 and Ajax, whereas single layer techniques generally do not. Single
layer techniques should therefore be portable to a wider range of client
and server environments.

Each technique comprises a specific technology or collection of
technologies (such as transparent bitmap overlays), implemented using a
specific distribution style. For example, one single layer technique
might be implemented completely server-side while another might use a
mixture of server-side and client-side processing. Similarly, multi-layer
techniques may adopt different distribution styles, and the overlays
themselves might take the form of transparent images, absolutely
positioned HTML elements, dynamically generated graphics, etc.

Given the many possible techniques that were available, the next
question was which of them would be most suitable for our purposes.
Scalability is a key issue for web applications in general \cite[p.\
28]{Offu-J-2002-quality}, and online activity visualization in
particular \cite[p.\ 50]{Eick-SG-2001-sitevis}, so we were particularly
interested in techniques that could scale to a large number of points.
For example, at the time of writing the Otago EPrints repository had
been accessed from over 10,000 distinct IP addresses, each potentially
representing a distinct geographical location. Separating out the type
of hit (abstract view versus document download) increased that figure to
nearly 13,000.

We first narrowed down the range of techniques to just four (server-side
image generation, server-side image overlay, server-side HTML overlay
and Google Maps); the selection process and details of the techniques
chosen are discussed in Section~\ref{sec-techniques}. We then set about
testing the scalability of these four techniques, in order to determine
how well each technique handled large numbers of points. A series of
experiments was conducted on each technique with progressively larger
data sets, and the elapsed time and memory usage were measured. The
experimental design is discussed in Section~\ref{sec-experiment}.

Our initial intuition was that server-side image generation and
server-side image overlay techniques would scale best, and this was
borne out by the results of the experiments, which show that both
techniques scale reasonably well to very large numbers of points. The
other two techniques proved to be reasonable for relatively small
numbers of points (generally less than about 500--1,000), but their
performance deteriorated rapidly beyond this. The results are discussed
in more detail in Section~\ref{sec-results}.

It should be noted that the intent of the experiments was not to
identify statistically significant differences between techniques. It
was expected that variations across techniques would be obvious, and the
experiments were designed to test this expectation. However, the two
best performing techniques, server-side image generation and server-side
image overlay, produced very similar results, so a more formal
statistical analysis of these techniques may be warranted. This and
other possible future directions are discussed in
Section~\ref{sec-future}.


\section{Technique selection}
\label{sec-techniques}

In this section we discuss in more detail the four techniques that we
chose for testing, and how we decided upon these particular techniques.
First, we discuss the impact of distribution style on the choice of
technique. Then, for each of the four chosen techniques, we examine how
the technique works in practice, its implementation requirements, its
relative advantages and disadvantages, and any other issues peculiar to
the technique.


\subsection{Distribution style}
\label{sec-distribution}

\citeN{Wood-J-1996-vis} and \citeN{MacE-AM-1998-GIS} identified four
distribution styles for web-based geographic visualization software. The
\emph{data server} style is where the server only supplies raw data, and
all manipulation, display and analysis takes place at the client. In
other words, this is primarily a client-side processing model, as
illustrated in Figure~\ref{fig-distribution-styles}(a). For example,
Palantir implemented a multi-layer technique using this distribution style
\cite{Papa-N-1998-Palantir}, where the source data were generated at the
server and the map was generated, displayed and manipulated by a Java
applet running at the client. The data server distribution style can
provide a very dynamic and interactive environment to the end user, but
clearly requires support for executing application code within the web
browser, typically using something like JavaScript, Java applets or
Flash. JavaScript is now tightly integrated into most browsers, but the
same cannot be said for either Java or Flash. That is, we cannot
necessarily guarantee the existence of a Java virtual machine or Flash
plugin in every browser, which violates our requirement to avoid manual
installation of additional client-side software. We can therefore
eliminate Java- or Flash-based data server techniques from
consideration, but JavaScript-based data server techniques may still be
feasible.


\begin{figure}
	\centering
	\begin{tabular}{ccc}
		\includegraphics[scale=1]{data_server}	&
		\qquad	&
		\includegraphics[scale=1]{image_server}	\\
		\footnotesize (a) Data server	&
		\qquad	&
		\footnotesize (b) Image server	\\
		\\
		\\
		\includegraphics[scale=1]{model_interaction}	&
		\qquad	&
		\includegraphics[scale=1]{shared}	\\
		\footnotesize (c) Model interaction environment	&
		\qquad	&
		\footnotesize (d) Shared environment	\\
	\end{tabular}
	\caption{Distribution styles for web-based geographic visualization
	\protect\cite{Wood-J-1996-vis}. (F = filtering, M = mapping, R =
	rendering.)}
	\label{fig-distribution-styles}
\end{figure}


In contrast, the \emph{image server} style is where the display is
created entirely at the server and is only viewed at the client. In
other words, this is primarily a server-side processing model, as
illustrated in Figure~\ref{fig-distribution-styles}(b). Consequently,
techniques that use this style require no additional client-side
software, and thus meet our requirements. The downside is that the
resultant visualization can tend to be very static and non-interactive
in nature, as it is just a simple bitmap image.

The \emph{model interaction environment} style is where a model created
at the server can be explored at the client, as illustrated in
Figure~\ref{fig-distribution-styles}(c). \citeN{Wood-J-1996-vis}
originally referred to this as the ``3D model interaction'' style, but
this seems slightly out of place in the current context. They originally
intended this distribution style to apply to VRML models for GIS
applications, but it could be equally applied to any situation where an
interactive model is generated at the server, then downloaded to and
manipulated at the client. This is very similar to what happens with
many Flash-based applications, for example. ``Model interaction
environment'' therefore seems a more appropriate name for this style.
The key distinguishing feature of this style is that there is no further
interaction between the client and server after the model has been
downloaded. This means that while the downloaded model can be very
dynamic and interactive, changing the underlying data requires a new
model to be generated at the server and downloaded to the client.
Similar restrictions apply to techniques using this style as to the
data server style, so Java- and Flash-based model interaction
environment techniques can be eliminated from consideration. For similar
reasons, we can also eliminate solutions that require browser plugins
such as VRML or SVG (although native support for the latter is beginning
to appear in some browsers). It may be possible to implement this
distribution style using only client-side JavaScript, but it is presently
unclear as to how effective this might be.

% future work: implement model interaction using JavaScript?

Finally, the \emph{shared environment} style is where data manipulation
is done at the server, but control of that manipulation, rendering, and
display all occur at the client, as illustrated in
Figure~\ref{fig-distribution-styles}(d). This is similar to the model
interaction environment style, but with the addition of a feedback loop
from the client to the server, thus enabling a more flexible and dynamic
interaction. Ajax technologies \cite{Garr-JJ-2005-Ajax} can easily
support this kind of distribution style. For example,
\citeN{Saya-A-2006-GISWS} discuss the use of Ajax to integrate Google
Maps with existing GIS visualization web services. We can eliminate
shared environment techniques from consideration based on the same
criteria as were applied to the other three styles.


\subsection{Single layer techniques}
\label{sec-image-gen}

As noted earlier, single layer techniques work by directly plotting
geolocated IP addresses onto a base map image, then displaying the
composite image at the client. A typical example of the kind of output
that might be produced is shown in Figure~\ref{fig-image}. Such
techniques require two specific components: software to programmatically
create and manipulate bitmap images (for example, the GD image
library\footnote{\url{http://www.boutell.com/gd/}}); and software to
transform raw latitude/longitude coordinates into projected map
coordinates on the base map (for example, the PROJ.4 cartographic
projections library\footnote{\url{http://www.remotesensing.org/proj/}}).
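
As a simple illustration of the second component, consider an
equirectangular (plate carr\'ee) base map of $W \times H$ pixels
covering the whole globe. This projection is an assumption made here for
the sake of example; the projection of the base map actually used may
differ, and a library such as PROJ.4 handles the general case. A point
at longitude $\lambda$ and latitude $\phi$ (both in degrees) then maps
to pixel coordinates
\[
	x = \frac{\lambda + 180}{360}\,W, \qquad
	y = \frac{90 - \phi}{180}\,H,
\]
with the origin at the top left corner of the image. For example, with
$W = 1024$ and $H = 512$ (illustrative dimensions), the point
$\lambda = 0$, $\phi = 0$ maps to $(512, 256)$, the center of the image.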


\begin{figure}
	\centering
	\includegraphics[width=0.95\textwidth,keepaspectratio]{ImageGeneration-full}
	\caption{Sample output from the server-side image generation technique.}
	\label{fig-image}
\end{figure}


Single layer techniques could use any of the distribution styles
discussed in Section~\ref{sec-distribution}. However, all but the image
server style would require the installation of additional client-side
software for generating images and performing cartographic projection
operations, so we will only consider single layer techniques using the
image server distribution style (or \textbf{server-side image
generation}) from this point on.

The server-side image generation technique provides some distinct
advantages. It is relatively simple to implement and is fast at
producing the final image, mainly because it uses existing,
well-established technologies. It is also bandwidth efficient: the size
of the generated map image is determined by the total number of pixels
and the compression method used, rather than by the number of points to
be plotted. The amount of data to be sent to the client should therefore
remain more or less constant, regardless of the number of points
plotted.
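
This argument can be sketched as follows, using symbols introduced here
for illustration only. For an image of $W \times H$ pixels stored at $d$
bytes per pixel and compressed with an effective ratio $\rho$, the
transmitted size is approximately
\[
	S_{\mathrm{image}} \approx \frac{W H d}{\rho},
\]
which contains no term in the number of points $n$. Any dependence on
$n$ is only indirect, through $\rho$, because the plotted points change
how well the image compresses.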

This technique also has some disadvantages, however. First, a suitable
base map image must be acquired. This could be generated from a GIS, but
if this is not an option an appropriate image must be obtained from a
third party. Care must be taken in the latter case to avoid potential
copyright issues. Second, the compression method used to produce the
final composite map image can have a significant impact on visual
quality. For example, lossy compression methods such as JPEG can make
the points plotted on the map appear distinctly fuzzy or ``muddy'', as
shown in Figure~\ref{fig-image-quality}. A lossless compression method
such as PNG will avoid this problem, but will tend to produce larger
image files. Finally, it is harder to provide interactive map
manipulation features with this technique, as the output is a simple
static image. Anything that changes the content of the map (such as
panning or changing the visibility of points) will require the entire
image to be regenerated. Zooming could be achieved if a very high
resolution base map image was available, but the number of possible zoom
levels might be restricted.


\begin{figure}
	\centering
	\includegraphics[scale=1.25]{jpeg_detail}\medskip
	
	\includegraphics[scale=1.25]{overlay_detail}
	\caption{Image quality of JPEG (Q=90) image generation (top) vs.\
	PNG image overlay (bottom).}
	\label{fig-image-quality}
\end{figure}


\subsection{Multi-layer techniques}
\label{sec-overlay}

% Look for publications regarding the DataCrossing Ajax client.
% See <http://datacrossing.crs4.it/en_Documentation_Overlay_Example.html>.
% They use <IMG> rather than <DIV>, which has the advantage of the image
% being loaded only once, but makes it harder to dynamically change the
% appearance of markers. The amount of data generated will still be
% proportional to the number of points (one <IMG> per point).

Multi-layer techniques also involve plotting points onto a base map image,
but they differ from single-layer techniques in that the points are
not composited directly onto the base map image. Rather, the points are
displayed as an independent overlay on top of the base map image. This
provides a significant advantage over single-layer techniques, as it
allows for multiple independent layers that can be
individually shown or hidden. This is very similar to the multi-layer
functionality provided by GIS, and is an effective way to provide
interactive visualizations of geographic data
\cite{Wood-J-1996-vis,MacE-AM-1998-GIS}. We still have the problem of
finding a suitable base map image, however.

Until relatively recently, implementing multi-layer techniques would likely
have required additional software at the client, but most modern
browsers now support absolute positioning of elements using CSS. This
enables us to create a map overlay using nothing more than HTML, CSS and
a few bitmap images. We have identified two main alternatives for
producing such an overlay, which we have termed \emph{image overlay} and
\emph{HTML overlay}.

An image overlay comprises a transparent bitmap image into which the
points are plotted, which is then overlaid on the base map image (in our
implementation, the output looks essentially identical to that shown in
Figure~\ref{fig-image} on page~\pageref{fig-image}). This requires the
overlay image to be in either PNG or GIF format, as JPEG does not
support transparency. Fortunately the overlay image is likely to contain
a lot of ``white space'', which compresses very well, so use of a
lossless compression method should not be an issue. This also eliminates
the ``fuzziness'' issue noted earlier (see
Figure~\ref{fig-image-quality}). The size of the image overlay will
generally be proportional to the number of points to be plotted, but the
image compression should have a moderating effect on this.

As noted earlier, generating images at the client would require
additional software to be installed, so we will only consider the image
server distribution style for image overlays (or \textbf{server-side
image overlay}). That is, both the base map image and the overlay(s) are
generated at the server.

An HTML overlay comprises a collection of HTML elements corresponding to
the points to be plotted, which are positioned over the base map image
using CSS absolute positioning. There is considerable flexibility as to
the types of elements that could be used to construct the overlay. One
possibility is to use \verb|<IMG>| elements to place icons on the base
map; this appears to be the approach adopted by Google Maps (see
Figure~\ref{fig-google}). Another possibility is to use appropriately
sized and colored \verb|<DIV>| elements, which then appear as colored
blocks ``floating'' over the base map image (in our implementation, the
output looks essentially identical to that shown in
Figure~\ref{fig-image} on page~\pageref{fig-image}).


\begin{figure}
	\centering
	\includegraphics[width=0.95\textwidth,keepaspectratio]{GoogleMap-full.png}
	\caption{Sample output from the Google Maps technique.}
	\label{fig-google}
\end{figure}


HTML overlays may be generated at either the server or the client.
Unlike the techniques discussed previously, however, HTML overlays can
be generated at the client without the need for additional software,
because only HTML (i.e., text) is being generated, not images. This can
be easily achieved using client-side JavaScript, so HTML overlays can
use any of the distribution styles discussed in
Section~\ref{sec-distribution} without violating our requirements. We
have therefore adopted two representative multi-layer techniques for our
experiments: \textbf{server-side HTML overlays} (using the image server
distribution style) and \textbf{Google Maps} (using the data server
distribution style). Since Google Maps uses \verb|<IMG>| elements, we
have used \verb|<DIV>| elements for the server-side HTML overlay.

Server-side HTML overlays are actually slightly simpler to implement
than either server-side image generation or image overlays, because we
do not need to write any code to generate or manipulate images (the base
map image is static and thus requires no additional processing). All
that is required is code to transform latitude/longitude coordinates
into projected map coordinates and produce corresponding \verb|<DIV>|
elements.
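
As a rough illustration of these two steps, the following Python sketch
converts latitude/longitude pairs to pixel positions and emits one
absolutely positioned \verb|<DIV>| per point. It is not the Perl and
PROJ.4 code used in the experiments: it assumes a simple equirectangular
projection onto a 1,024 by 520 pixel base map, and the function names,
marker size and marker color are hypothetical.

```python
# Illustrative sketch only: assumes an equirectangular projection, which
# may differ from the projection applied via PROJ.4 in the experiments.
MAP_WIDTH = 1024   # base map width in pixels
MAP_HEIGHT = 520   # base map height in pixels


def project(lat, lon, width=MAP_WIDTH, height=MAP_HEIGHT):
    """Map latitude/longitude in degrees to (x, y) pixel coordinates."""
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height   # pixel rows grow downwards
    return round(x), round(y)


def point_div(lat, lon, size=3, color="#f00"):
    """Return one absolutely positioned DIV representing a single point."""
    x, y = project(lat, lon)
    return (
        f'<div style="position:absolute;left:{x}px;top:{y}px;'
        f'width:{size}px;height:{size}px;background:{color}"></div>'
    )


def html_overlay(points):
    """Wrap one DIV per point in a top-level overlay DIV."""
    body = "\n".join(point_div(lat, lon) for lat, lon in points)
    return f'<div id="overlay" style="position:absolute">\n{body}\n</div>'
```

Each point contributes one element of roughly constant length, so the
size of the output of \verb|html_overlay| grows linearly with the number
of points.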

Google Maps \cite{Goog-M-2006-maps} is a more complex proposition. This
technique uses the data server distribution style, where JavaScript code
running within the browser enables the client to manipulate the base map
and its overlays. Data and map images are requested asynchronously from
the server as required, using Ajax technologies, which seems to imply
that Google Maps in fact uses the shared environment distribution style.
However, the server has no involvement beyond simply supplying data to
the client. In the shared environment distribution style, the server is
directly involved in manipulating the map, under the control of the
client. This is clearly not the case with Google Maps.

The primary advantage of Google Maps is the powerful functionality it
provides for generating and interacting with the map. Users may pan the
map in any direction and zoom in and out to many different levels. A
satellite imagery view is also available. In addition, further
information about each point plotted (such as the name of the city, for
example) can be displayed in a callout attached to the point, as shown
in Figure~\ref{fig-google}.

However, there are also some significant disadvantages to the Google
Maps technique\footnote{Interestingly, the Google Earth application
addresses many of these issues, but since it is not a browser-based
solution it falls outside the scope of our consideration. However, for
interest's sake we did an informal comparison between Google Earth and
the four techniques that we have tested, and this has been included in
the results in Section~\ref{sec-results}.}. First, it is a distributed
application, thus making it more complex to implement, test and debug
\cite{Bates-PC-1995-distdebug,Ensl-PH-1978-distributed}. Second, the
server must have a registered API key from Google, which is verified
every time that a page attempts to use the API. Similarly, the client
must connect to Google's servers in order to download the API's
JavaScript source. This means that the technique must have an active
Internet connection in order to work. Finally, the Google Maps API does
not currently provide any way to toggle the visibility of markers on the
map, so it is not possible to implement the interactive ``layers''
mentioned at the start of this section. (It is possible, of course, that
Google will implement this feature in a later version of the API.)

The most significant disadvantage of all HTML overlay techniques,
however, is that the size of the HTML overlay is directly proportional
to the number of points to be plotted. There will be one overlay element
(\verb|<DIV>| or \verb|<IMG>|) per point, so a very large number of
points will result in an even larger amount of HTML source being
generated. We expect that this will lead to excessive browser memory
usage, and consequently that these techniques will not scale well at the
high end. However, they may still be useful for smaller data sets that
require interactive manipulation.


\section{Experimental design}
\label{sec-experiment}

After some preliminary testing with live data from the Otago School of
Business repository, we proceeded with a series of experiments to test
the scalability of the four techniques. Each technique was tested using
progressively larger synthetic data sets. The first data set comprised
one point at the South Pole (latitude \(-90^{\circ}\), longitude
\(-180^{\circ}\)). Each successive data set was twice the size of its
predecessor, building up a regular grid of latitude/longitude points at
one degree intervals\footnote{The entire grid has 64,800 points, so the
five largest data sets have many duplicate points.}. A total of
twenty-one data sets were created in this way, with the number of points
ranging from one to 1,048,576 (\(=2^{20}\)). The result of plotting the
16,384-point data set is shown in Figure~\ref{fig-grid-points}.
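
The construction of these data sets can be sketched in Python as
follows. This is an illustration rather than the script used in the
experiments: the fill order (longitude varying fastest, starting at the
South Pole), the exact extent of the grid, and the wrap-around used to
produce duplicate points in the largest data sets are all assumptions,
and the names are hypothetical.

```python
GRID_LATS = 180   # latitudes -90 .. 89 (assumed extent)
GRID_LONS = 360   # longitudes -180 .. 179 (assumed extent)
GRID_SIZE = GRID_LATS * GRID_LONS   # 64,800 distinct grid points


def grid_point(i):
    """Return the i-th (lat, lon) pair, wrapping once the grid is full."""
    row, col = divmod(i % GRID_SIZE, GRID_LONS)
    return (-90 + row, -180 + col)


def data_set(k):
    """Return the data set containing 2**k points (k = 0 .. 20)."""
    return [grid_point(i) for i in range(2 ** k)]
```

Under these assumptions, \verb|data_set(0)| contains only the point at
latitude \(-90^{\circ}\), longitude \(-180^{\circ}\), and every data set
from \(2^{16}\) points upwards contains duplicates because
\(2^{16} > 64{,}800\).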


\begin{figure}
	\centering
	\includegraphics[width=0.95\textwidth,keepaspectratio]{16384_points}
	\caption{The 16,384-point data set plotted on the base map.}
	\label{fig-grid-points}
\end{figure}


The focus on scalability meant that we were primarily interested in
measuring page load times, memory usage and the amount of data
generated (which impacts on both storage and network bandwidth). Page
load time can be further broken down into the time taken to generate the
map data, the time taken to transfer the map data to the client across
the network, and the time taken by the client to display the map.

Unfortunately, the Google Maps technique requires an active Internet
connection (as noted in Section~\ref{sec-overlay}), so we were unable to
run the experiments on an isolated network. This meant that traffic on
the local network was a potential confounding factor. We therefore
decided to eliminate network performance from the equation by running
both the server and the client on the same machine\footnote{A Power
Macintosh G5 1.8\,GHz with 1\,GB RAM, running Mac OS X 10.4.7, Apache
2.0.55, PHP 4.4 and Perl 5.8.6.}. This in turn enabled us to measure the
time taken for data generation and page display independently, thus
simplifying the process of data collection and also ensuring that the
client and server processes did not unduly interfere with each other,
despite running on the same machine.

It could be argued that network performance would still have a
confounding effect on the Google Maps technique, but this would only be
likely for the initial download of the API (comprising about 235\,kB of
JavaScript source and images), which would be locally cached thereafter.
The API key verification does occur every time the map is loaded, but
the amount of data involved is very small, so it is less likely that
this would be significantly affected by network performance. Any such
effect would also be immediately obvious as it would simply block the
server from proceeding.

For each data set generated, we recorded its size, the time taken to
generate it, the time taken to display the resultant map in the browser,
and the amount of real and virtual memory used during the test by the
browser. We also intended to measure the memory usage of the server, but
this proved more difficult to isolate than expected, and was thus
dropped from the experiments. The data set generation time and browser
memory usage were measured using the \texttt{time} and \texttt{top}
utilities respectively (the latter was run after each test run to avoid
interference). The map display time was measured using the ``page load
test'' debugging feature of Apple's Safari web browser, which can
repetitively load a set of pages while recording various statistics, in
particular the time taken to load the page. Tests were run up to twenty
times each where feasible, in order to reduce the impact of random
variations. Some tests were run fewer times because they took a very
long time (several minutes for a single test run). We typically broke
off further testing when a single test run took longer than about five
minutes, as by this stage performance had already deteriorated well
beyond usable levels.
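
The repetition and cut-off policy described above can be summarised by
the following Python sketch. It is a hypothetical harness, not the
procedure actually used (which relied on \texttt{time}, \texttt{top} and
Safari's page load test); the function names are illustrative, and
summarising the runs by their mean is an assumption.

```python
import time

MAX_RUNS = 20          # upper limit on repetitions
CUTOFF_SECONDS = 300   # roughly five minutes


def timed_runs(task, max_runs=MAX_RUNS, cutoff=CUTOFF_SECONDS):
    """Run task repeatedly; stop early once a single run exceeds cutoff."""
    durations = []
    for _ in range(max_runs):
        start = time.perf_counter()
        task()
        elapsed = time.perf_counter() - start
        durations.append(elapsed)
        if elapsed > cutoff:
            break   # performance is already beyond usable levels
    return durations


def mean(durations):
    """Average duration in seconds over the completed runs."""
    return sum(durations) / len(durations)
```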

While it is somewhat beyond the scope of this work, out of interest some
informal tests were also undertaken using the Google Earth application.
A Perl script was used to generate a collection of KML files
corresponding to the data sets described above. Each data set was then
loaded into Google Earth, and a stopwatch was used to measure how long
it took to load the data set, defined as the period during which the
dialog box ``\textsf{Loading myplaces.kml, including enabled overlays}''
was displayed on screen.


\subsection{Technique implementation}

As noted in Sections~\ref{sec-image-gen} and \ref{sec-overlay}, the
server-side image generation, server-side image overlay and server-side
HTML overlay techniques were all implemented using the image server
distribution style. A separate dispatcher page was written in PHP for
each technique, which enabled arguments---such as the number of points
to be plotted---to be passed from the client to a corresponding Perl
script for each technique. The final page was then constructed as
follows:
\begin{description}

	\item[server-side image generation] The dispatcher page included a
	standard \verb|<IMG>| element that called the Perl script. This
	script then loaded a base map PNG image, plotted points directly
	onto it, and returned the composite map to the client as a JPEG
	image (with the ``quality'' parameter set to 90).

	\item[server-side image overlay] The dispatcher page included two
	\verb|<IMG>| elements, the first for the base map and the second for
	the overlay, both with identical CSS positioning attributes. The
	first \verb|<IMG>| simply loaded a static JPEG image representing
	the base map. The second \verb|<IMG>| called the Perl script, which
	generated and returned the overlay as a transparent PNG image.

	\item[server-side HTML overlay] The dispatcher page included a
	\verb|<IMG>| element for the base map and a \verb|<DIV>| element for
	the overlay, both with identical CSS positioning attributes. As with
	the previous technique, the \verb|<IMG>| simply loaded a static JPEG
	image representing the base map. The \verb|<DIV>| contained inline
	PHP code that called the Perl script. This in turn generated and
	returned the overlay as a collection of CSS-positioned \verb|<DIV>|
	elements, nested within the top-level \verb|<DIV>| element.

\end{description}

For all of these techniques, the base map image was 1,024 by 520 pixels.
In PNG format it occupied approximately 1.2\,MB (but this version was
never returned to the client), while in JPEG format (Q=90) it occupied
approximately 180\,kB. The base map image was derived from an original
3,599 by 1,826 pixel image, which was part of a collection of maps
released into the public domain by the \citeN{CIA-WFB-2006}. All three
techniques used the PROJ.4 cartographic projections library to convert
latitude/longitude pairs into projected map coordinates, while the first
two techniques also used the GD graphics library to programmatically
generate and manipulate images.

The Google Maps technique was implemented using the data server
distribution style. Once again, a PHP dispatcher page was used. This
time, however, the page included client-side JavaScript code to load and
initialise the Google Maps API, create the base map, and build the map
overlay. The first two steps were achieved using standard Google Maps
API calls. For the last step, the client used an \texttt{XMLHttpRequest}
object to call a server-side Perl script. This script generated and
returned to the client an XML data set containing the points to be
plotted. The client then looped through this data set and used the
Google Maps API calls to create a marker on the base map corresponding
to each point.
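
The server side of this exchange can be sketched as follows. The sketch
is in Python rather than the Perl used in the experiments, and the XML
element and attribute names are hypothetical, since the actual format of
the data set is not specified here. The second function merely stands in
for the client-side loop, which in practice is JavaScript calling the
Google Maps API.

```python
import xml.etree.ElementTree as ET


def points_to_xml(points):
    """Serialise (lat, lon) pairs as a small XML document."""
    root = ET.Element("points")   # hypothetical element name
    for lat, lon in points:
        ET.SubElement(root, "point", lat=str(lat), lon=str(lon))
    return ET.tostring(root, encoding="unicode")


def xml_to_points(text):
    """Parse the document back into (lat, lon) pairs, as a client would."""
    root = ET.fromstring(text)
    return [
        (float(p.get("lat")), float(p.get("lon")))
        for p in root.iter("point")
    ]
```

Only coordinates are transmitted in this arrangement; all positioning
and rendering is left to the client.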


\section{Results}
\label{sec-results}

As noted in the introduction, the intent of these experiments was not to
do a full analysis and statistical comparison of the performance of the
different techniques, but rather to identify broad trends. We have not,
therefore, carried out any statistical analysis on the results. We will
now discuss the results for data size, page load time and memory usage.
Because the data set size increases by powers of two, we have used
log-log scales for all plots.


\subsection{Data size}

During each test run, the data generated by the server was saved to a
file and its size in bytes recorded. In the case of the server-side
image generation and server-side image overlay techniques, the file
comprised a bitmap image, whereas for the server-side HTML overlay and
Google Maps techniques, the file comprised HTML and XML text
respectively. (The KML files generated for use with Google Earth were
also XML.)

There was a certain amount of fixed overhead for each technique tested,
as summarised in Table~\ref{tab-overhead}. This overhead comprised
static files that were always downloaded to the client, regardless of
the number of points to be plotted. Typical fixed overhead included
items such as the base map image, various icons, the PHP source of the
dispatcher page and the JavaScript source for the Google Maps API.


\begin{acmtable}{11cm}
	\centering
	\begin{tabular}{lll}
		Technique						&	Fixed overhead		&	Content	\\
		\hline
		Server-side image generation	&	629\,bytes			&	PHP (dispatcher)\smallskip	\\

		Server-side image overlay		&	\(\approx\) 181\,kB	&	PHP (dispatcher) \\
										&						&	base map image (JPEG)\smallskip	\\

		Server-side HTML overlay		&	\(\approx\) 181\,kB	&	PHP (dispatcher) \\
										&						&	base map image (JPEG)\smallskip	\\

		Google Maps						&	\(\approx\) 235\,kB	&	PHP (dispatcher) \\
										&						&	base map tiles \\
										&						&	JavaScript (API) \\
										&						&	various icons\smallskip	\\

		(Google Earth)					&	unknown				&	\\
	\end{tabular}
	\caption{Fixed overhead for each technique.}
	\label{tab-overhead}
\end{acmtable}


\begin{figure}
	\includegraphics[scale=0.66]{data_size}
	\caption{Comparison of generated data size for each technique (log-log scale).}
	\label{fig-data-size}
\end{figure}


The amount of data generated for each technique, including fixed
overhead, is shown in Figure~\ref{fig-data-size}. It is immediately
apparent from these results that there is a divergence between the two
techniques that generate bitmap images (server-side image generation and
server-side image overlay), and the remaining techniques that generate
either HTML or XML (i.e., text).

Both the server-side image generation and server-side image overlay
techniques scale particularly well with regard to the amount of data
generated. Interestingly, the amount of data generated by the image
generation technique increases by about 8\,kB up to the 8,192-point data
set, but then \emph{drops} by about 90\,kB over the next three data
sets. This occurs because the number of points plotted has become
sufficient to cover most of the base map. In other words, a large
portion of the composite map image is a single color (see
Figure~\ref{fig-grid-points} on page~\pageref{fig-grid-points} for an
example), which compresses more efficiently.

The amount of data generated by the image overlay technique appears
constant, but actually increases by about 2\,kB across the entire range
of tests. This has important implications for the ability of this
technique to handle multiple layers. Because the overlay images are
quite small (less than 2\,kB for up to one million points), it should be
feasible to pre-load several overlay images into a client-side array and
switch them on and off as desired.

The server-side HTML overlay and Google Maps techniques clearly do not
scale well, and begin to visibly diverge from the other two techniques
once the amount of data generated exceeds about 5\% of the fixed
overhead. For the HTML overlay technique this occurs somewhere between
64 and 128 points, whereas for Google Maps it occurs somewhere between
256 and 512 points. The divergence increases rapidly beyond this point
for both techniques, with the HTML overlay technique suffering the most.
The latter occurs because the HTML overlay technique needs to generate
additional CSS attributes in order to correctly position the
\verb|<DIV>| elements, whereas the Google Maps technique needs only to
return a more compact list of latitude/longitude coordinates.
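
As a rough consistency check on these divergence points (an estimate
derived from the figures quoted above, not a separately measured
result), suppose that each technique generates approximately \(b\) bytes
per point and begins to diverge at \(n\) points, where
\(n b \approx 0.05 F\) and \(F\) is the fixed overhead from
Table~\ref{tab-overhead}. Taking 1\,kB as 1,000 bytes and assuming
linear growth:
\begin{eqnarray*}
	b_{\mathrm{HTML}} & \approx & \frac{0.05 \times 181{,}000}{n},
		\quad 64 \le n \le 128
		\quad \Rightarrow \quad
		71 \le b_{\mathrm{HTML}} \le 141 \\
	b_{\mathrm{GM}} & \approx & \frac{0.05 \times 235{,}000}{n},
		\quad 256 \le n \le 512
		\quad \Rightarrow \quad
		23 \le b_{\mathrm{GM}} \le 46
\end{eqnarray*}
Both ranges are in bytes per point (rounded). The estimated per-point
cost of the HTML overlay technique is therefore roughly three times that
of Google Maps, which is consistent with the additional CSS positioning
attributes required for each \verb|<DIV>| element.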

For Google Earth, the amount of data generated is clearly proportional
to the number of points, but the Google Earth results are otherwise not
directly comparable with the other techniques, as we were unable to
determine whether Google Earth had any fixed overhead.


\subsection{Page load time}

For each test run, we recorded the length of time taken to generate the
data at the server and to display the page in the client browser. The
former is illustrated in Figure~\ref{fig-data-generation-time} and the
latter in Figure~\ref{fig-page-load-time}. The combined time (data
generation + display time) is shown in Figure~\ref{fig-combined-time}.


\subsubsection{Data generation time}


\begin{figure}
	\includegraphics[scale=0.66]{data_generation_time}
	\caption{Comparison of data generation time for each technique (log-log scale).}
	\label{fig-data-generation-time}
\end{figure}


The results show that the length of time taken to generate the source
data increases in proportion with the number of points to be plotted, as
expected. It is interesting to note the differences in data generation
time for each technique, however. Data generation for all of the
``text-based'' techniques (HTML overlay, Google Maps and Google Earth)
is consistently faster than for the ``image-based'' techniques (image
generation and image overlay).

Server-side image generation generally takes the longest to generate its
data. This is because it not only has to map points from
latitude/longitude into projected map coordinates, but also must plot
these points onto the base map image, then compress the composite image
as a JPEG. The image to be compressed is also moderately complex, which
only adds to the data generation time. Server-side image overlay
performs slightly better because it uses a less complex compression
method (PNG) and the image being compressed is much simpler (a
collection of colored points on a blank background).

The server-side HTML overlay technique appears faster at generating
data than either of the two image-based techniques at the low end, but
is similar in performance at the high end. In this technique the server
only needs to map latitude/longitude to projected map coordinates; no
images need to be generated and there is no compression. At the high
end, however, this advantage is clearly offset by the large volume of
data that is generated. Google Maps is faster again, because all
processing is carried out on the client; the server's only involvement
is to generate a list of latitude/longitude coordinates. A similar
argument also applies for Google Earth.

In terms of data generation, it appears that all techniques tested scale
reasonably well. The image-based techniques perform worse at the low end
because they involve more complex processing than the text-based
techniques, but this is offset at the high end by the relatively
constant amount of data generated. Conversely, the text-based techniques
perform better at the low end, but are negatively impacted at the high
end by the sheer volume of data produced (tens or hundreds of megabytes
vs.\ hundreds of kilobytes).


\subsubsection{Map display time}


\begin{figure}
	\centering
	\includegraphics[scale=0.66]{page_load_time}
	\caption{Comparison of map display time for each technique (log-log scale).}
	\label{fig-page-load-time}
\end{figure}


These results show quite a spectacular difference between the
image-based and text-based techniques. The time taken to display the map
is essentially constant for both of the image-based techniques,
regardless of the number of points to be plotted. This is not surprising
given that the size of the generated data is also essentially constant,
and that the browser is simply loading and displaying static images. The
image overlay technique appears slightly slower than the image
generation technique. This is probably because the image overlay
technique has to load two images from the server (the base map and the
overlay), compared to one image for the image generation technique.

In contrast, the text-based techniques clearly do not scale well with
regard to map display time. Google Maps suffers particularly in this
regard, with display time exceeding ten seconds shortly past 512 points.
Testing was abandoned at 4,096 points, with a single test run taking
over seven minutes. The HTML overlay technique fares better, exceeding
ten seconds somewhere between 4,096 and 8,192 points. Testing was
abandoned at 32,768 points, with a single test run taking almost ten
minutes. Interestingly, Google Earth performed worse at the low end but
did better at the high end, presumably because it is specifically
designed to handle these kinds of tasks. We were able to reach 131,072
points before testing was abandoned.


\subsubsection{Combined time}


\begin{figure}
	\centering
	\includegraphics[scale=0.66]{combined_time}
	\caption{Comparison of combined page load time for each technique (log-log scale).}
	\label{fig-combined-time}
\end{figure}


Combining the data generation and map display times yields little change
in the curves for the text-based techniques, because the data generation
times are very small compared to the map display times. There is a more
obvious impact on the image-based techniques, with both techniques
remaining more or less constant up to about 2,048 points, then slowing
as the number of points increases beyond that. However, the slowdown is
nowhere near as dramatic as for the text-based techniques; even the
largest data set only takes about nineteen seconds overall. The image
overlay technique does display a slight advantage of about half a second
over the image generation technique for the largest data set, but
further experiments will be required to determine whether this is a
statistically significant difference.


\subsection{Memory usage}

We measured both the real and virtual memory usage of the browser by
running the \texttt{top} utility after each test run and observing the
memory usage in each category. This told us the size of both the current
``working set'' and the total memory footprint of the browser process
after it had completed a test run. The real memory results are shown in
Figure~\ref{fig-real-memory} and the virtual memory results are shown in
Figure~\ref{fig-virtual-memory}.


\begin{figure}
	\centering
	\includegraphics[scale=0.66]{real_memory}
	\caption{Comparison of real memory usage for each technique (log-log scale).}
	\label{fig-real-memory}
\end{figure}


\begin{figure}
	\centering
	\includegraphics[scale=0.66]{virtual_memory}
	\caption{Comparison of virtual memory usage for each technique (log-log scale).}
	\label{fig-virtual-memory}
\end{figure}


While both sets of results display similar trends, the real memory data
proved somewhat difficult to interpret. Real memory usage was generally
consistent across test runs, but would also frequently fluctuate upwards
by a factor of nearly two for no readily apparent reason. This is
particularly apparent with the HTML overlay technique beyond 1,024
points. We can only assume that this was a result of other processes on
the test machine interacting with the browser process in unexpected
ways. We are therefore somewhat wary of the real memory data, but they
are at least broadly consistent with the virtual memory data. The
virtual memory data proved more consistent overall, as the virtual
memory footprint of a process is less likely to be impacted by other
running processes.

The results show that the two image-based techniques have essentially
constant memory usage regardless of the number of points plotted. This
is to be expected, given that the size of the source data is also
essentially constant. The text-based techniques, however, clearly begin
to diverge as the number of points increases. The HTML overlay technique
starts to visibly diverge somewhere between 2,048 and 4,096 points,
while Google Maps starts to visibly diverge somewhere between 64 and 128 points. This is in
line with our expectation for these techniques that memory usage would
increase in proportion to the number of points.


\section{Conclusion}

In this research, we tested the scalability of four techniques for
online geovisualization of web site hits, with respect to the number of
points to be plotted on the map. The four techniques tested were
server-side image generation, server-side image overlay, server-side
HTML overlay and Google Maps. The results clearly show that the
server-side image generation and server-side image overlay techniques
scale the best from small to large data sets. The HTML overlay and
Google Maps techniques work well for small data sets, but their
performance rapidly deteriorates as the size of the data set increases,
to the point where they are essentially unusable.

Our aim was to identify which was the best technique to use to plot hits
on the Otago School of Business repository. We are now left with a choice
between two very similarly performing techniques.


% The
% software extracts IP addresses from the web server logs, geolocates them
% using the free MaxMind GeoLite Country database\footnote{See
% \url{http://www.maxmind.com/app/ip-location}.}, then stores the
% resulting country information in a separate database.

% The Tasmania software, however, uses countries as its base unit of
% aggregation. We were interested in looking at the distribution on a finer
% level, down to individual cities if possible


\bibliography{Map_Visualisation}

\begin{received}
...
\end{received}
\end{document}
+\documentclass[acmtocl,acmnow]{acmtrans2m}
+\documentclass[acmnow]{acmtrans2m}
 \usepackage{graphicx}
 \category{H.3.5}{Information Storage and Retrieval}{Online Information Services}[web-based services]
 \terms{Experimentation, Measurement, Performance}
+\keywords{geolocation, geovisualization, scalability, GD, Google Maps}
+\keywords{downloads, geolocation, geovisualization, scalability, Google
+	Maps, distribution style, dynamic map generation}
 \begin{document}
 \cite{Stan-N-2006-running}, using the GNU
 EPrints\footnote{\url{http://www.eprints.org/}} repository management
 software. This repository quickly attracted interest from around the
 world and the number of abstract views and document downloads began to
+steadily increase. We were obviously very interested in tracking this
+increase, particularly with respect to where in the world the hits were
+coming from. The EPrints statistics management software developed at the
+steadily increase. We were very interested in tracking this increase,
+particularly with respect to where in the world the hits were coming
+from. The EPrints statistics management software developed at the
 University of Tasmania \cite{Sale-A-2006-stats} proved very useful in
 this regard, providing us with detailed per-eprint and per-country
 download statistics; an example of the latter is shown in
 Figure~\ref{fig-tas-stats}. However, while this display provides an
 city level for the US within a 25 mile radius''
 \cite{Maxm-G-2006-GeoLiteCity}. Their commercial \emph{GeoIP City}
 database claims 80\% accuracy for the same parameters.
+The techniques used by these systems can generally be divided into two
+classes. The first class of techniques generate a single bitmap image
+that contains both the map and the icons representing web hits. This can
+be achieved by programmatically plotting points onto a base map image;
+the composite image is then displayed at the client. We shall henceforth
+refer to this class of techniques as \emph{single layer} techniques.
+The second class of techniques separately return both a base map image
+and some kind of overlay containing the plotted points. The overlay is
+then combined with the base map at the client. We shall henceforth refer
+to this class of techniques as \emph{multi-layer} techniques.
+The techniques used by these prior systems can generally be divided into
+two classes. The first class of techniques generates a single bitmap
+image that contains both the map and the graphics representing web hits.
+This can be achieved by programmatically plotting points onto a base map
+image; the composite image is then displayed at the client. We shall
+henceforth refer to this class of techniques as \emph{single-layer}
+techniques. The second class of techniques separately returns both a base
+map image and some kind of overlay containing the plotted points. The
+overlay and the base map are then displayed as separate items at the
+client. We shall henceforth refer to this class of techniques as
+\emph{multi-layer} techniques.
 Both classes of techniques have been used in the aforementioned systems,
 but multi-layer techniques appear to have been particularly popular. For
 example, Palantir used a multi-layer technique, where a Java applet running
 dynamic multi-layer technique that has only become feasible relatively
 recently with the advent of widespread support for CSS positioning and
 Ajax technologies in many browsers.
+Multi-layer techniques enjoy a particular advantage over single layer
+Multi-layer techniques enjoy a particular advantage over single-layer
 techniques, in that they provide the potential for a more flexible
 GIS-like interaction with the map, with multiple layers that can be
 activated and deactivated as desired. This flexibility could explain why
+such techniques appear more prevalent in the literature. However,
+multi-layer techniques tend to rely on more recent web technologies such as
+CSS2 and Ajax, whereas single layer techniques generally do not. Single
+layer techniques should therefore be portable to a wider range of client
+and server environments.
+Each technique comprises a specific technology or collection of
+technologies (such as transparent bitmap overlays), implemented using a
+specific distribution style. For example, one single layer technique
+might be implemented completely server-side while another might use a
+mixture of server-side and client-side processing. Similarly, multi-layer
+such techniques appear more prevalent in the literature. As we shall see
+shortly, however, web-based multi-layer techniques tend to rely on more
+recent web technologies such as CSS and Ajax, whereas single-layer
+techniques generally do not. Single-layer techniques should therefore be
+portable to a wider range of client and server environments.
+Each map generation and display technique comprises a specific
+technology or collection of technologies (such as transparent bitmap
+overlays + CSS positioning), implemented using a specific distribution
+style. For example, a particular single-layer technique might be
+implemented completely server-side while another might use a mixture of
+server-side and client-side processing. Similarly, multi-layer
 techniques may adopt different distribution styles, and the overlays
 themselves might take the form of transparent images, absolutely
 positioned HTML elements, dynamically generated graphics, etc.
+Given the many possible techniques that were available, the next
+question was which techniques would be most suitable for our purposes?
+Scalability is a key issue for web applications in general \cite[p.\
+28]{Offu-J-2002-quality}, and online activity visualization in
+Given the wide variety of possible techniques that were available, the
+next question was which techniques would be most suitable for our
+purposes? Scalability is a key issue for web applications in general
+\cite[p.\ 28]{Offu-J-2002-quality}, and online activity visualization in
 particular \cite[p.\ 50]{Eick-SG-2001-sitevis}, so we were particularly
 interested in techniques that could scale to a large number of points.
 For example, at the time of writing the Otago EPrints repository had
 been accessed from over 10,000 distinct IP addresses, each potentially
 representing a distinct geographical location. Separating out the type
 of hit (abstract view versus document download) increased that figure to
+nearly 13,000.
+nearly 13,000. Early informal experiments with these data indicated that
+a single-layer composite map image would work quite well, whereas Google
+Maps would not.
 We first narrowed down the range of techniques to just four (server-side
 image generation, server-side image overlay, server-side HTML overlay
 and Google Maps); the selection process and details of the techniques
 experiments was conducted on each technique with progressively larger
 data sets, and the elapsed time and memory usage were measured. The
 experimental design is discussed in Section~\ref{sec-experiment}.
+Our initial intuition was that server-side image generation and
+Our initial intuition was that the server-side image generation and
 server-side image overlay techniques would scale best, and this was
 borne out by the results of the experiments, which show that both
 techniques scale reasonably well to very large numbers of points. The
 other two techniques proved to be reasonable for relatively small
 performance deteriorated rapidly beyond this. The results are discussed
 in more detail in Section~\ref{sec-results}.
 It should be noted that the intent of the experiments was not to
+identify statistically significant differences between techniques. It
+was expected that variations across techniques would be obvious, and the
+experiments were designed to test this expectation. However, the two
+best performing techniques, server-side image generation and server-side
+image overlay, produced very similar results, so a more formal
+statistical analysis of these techniques may be warranted. This and
+other possible future directions are discussed in
+Section~\ref{sec-future}.
+identify statistically significant differences in performance across the
+four techniques. It was expected that variations across techniques would
+be reasonably clear-cut, and the experiments were designed to test this
+expectation. However, the two best performing techniques, server-side
+image generation and server-side image overlay, produced very similar
+results, so a more formal statistical analysis of these techniques may
+be warranted. This and other possible future directions are discussed in
+Section~\ref{sec-conclusion}.
 \section{Technique selection}
 \label{sec-techniques}
 \emph{data server} style is where the server only supplies raw data, and
 all manipulation, display and analysis takes place at the client. In
 other words, this is primarily a client-side processing model, as
 illustrated in Figure~\ref{fig-distribution-styles}(a). For example,
+Palantir implemented a multi-layer technique using this distribution style
+\cite{Papa-N-1998-Palantir}, where the source data were generated at the
+server and the map was generated, displayed and manipulated by a Java
+applet running at the client. The data server distribution style can
+provide a very dynamic and interactive environment to the end user, but
+clearly requires support for executing application code within the web
+browser, typically using something like JavaScript, Java applets or
+Palantir implemented a multi-layer technique using this distribution
+style \cite{Papa-N-1998-Palantir}, where the source data were generated
+at the server and the map was generated, displayed and manipulated by a
+Java applet running at the client. The data server distribution style
+can provide a very dynamic and interactive environment to the end user,
+but clearly requires support for executing application code within the
+web browser, typically using something like JavaScript, Java applets or
 Flash. JavaScript is now tightly integrated into most browsers, but the
 same cannot be said for either Java or Flash. That is, we cannot
 necessarily guarantee the existence of a Java virtual machine or Flash
 plugin in every browser, which violates our requirement to avoid manual
 installation of additional client-side software. We can therefore
 eliminate Java- or Flash-based data server techniques from
+consideration, but JavaScript-based data server techniques may still be
+feasible.
+consideration, but JavaScript-based data server techniques are feasible.
+Indeed, as we will see in Section~\ref{sec-overlay}, Google Maps is an
+example of such a technique.
 \begin{figure}
 	\centering
 	\begin{tabular}{ccc}
+		\includegraphics[scale=1]{data_server}	&
+		\includegraphics[scale=0.9]{data_server}	&
 		\qquad	&
+		\includegraphics[scale=1]{image_server}	\\
+		\includegraphics[scale=0.9]{image_server}	\\
 		\footnotesize (a) Data server	&
 		\qquad	&
 		\footnotesize (b) Image server	\\
 		\\
 		\\
+		\includegraphics[scale=1]{model_interaction}	&
+		\includegraphics[scale=0.9]{model_interaction}	&
 		\qquad	&
+		\includegraphics[scale=1]{shared}	\\
+		\includegraphics[scale=0.9]{shared}	\\
 		\footnotesize (c) Model interaction environment	&
 		\qquad	&
 		\footnotesize (d) Shared environment	\\
 	\end{tabular}
 illustrated in Figure~\ref{fig-distribution-styles}(b). Consequently,
 techniques that use this style require no additional client-side
 software, and thus meet our requirements. The downside is that the
 resultant visualization can tend to be very static and non-interactive
+in nature, as it is just a simple bitmap image.
+in nature, as it is typically just a simple bitmap image.
 The \emph{model interaction environment} style is where a model created
 at the server can be explored at the client, as illustrated in
 Figure~\ref{fig-distribution-styles}(c). \citeN{Wood-J-1996-vis}
 interaction between the client and server after the model has been
 downloaded. This means that while the downloaded model can be very
 dynamic and interactive, changing the underlying data requires a new
 model to be generated at the server and downloaded to the client.
+Similar restrictions apply to techniques using this style as to the
+data server style, so Java- and Flash-based model interaction
+environment techniques can be eliminated from consideration. For similar
+reasons, we can also eliminate solutions that require browser plugins
+such as VRML or SVG (although native support for the latter is beginning
+to appear in some browsers). It may be possible to implement this
+distribution style using only client-side JavaScript, but it is presently
+unclear as to how effective this might be.
+% future work: implement model interaction using JavaScript?
+Similar restrictions apply to techniques using this style as to the data
+server style, so Java- and Flash-based model interaction environment
+techniques can be eliminated from consideration. For similar reasons, we
+can also eliminate solutions such as VRML or SVG that require external
+browser plugins (although native support for SVG is beginning to appear
+in some browsers). It may be possible to implement this distribution
+style using only client-side JavaScript, but it is presently unclear as
+to how effective this might be.
 Finally, the \emph{shared environment} style is where data manipulation
 is done at the server, but control of that manipulation, rendering, and
 display all occur at the client, as illustrated in
 interaction environment style, but with the addition of a feedback loop
 from the client to the server, thus enabling a more flexible and dynamic
 interaction. Ajax technologies \cite{Garr-JJ-2005-Ajax} can easily
 support this kind of distribution style. For example,
+\citeN{Saya-A-2006-GISWS} discuss the use of Ajax to integrate Google
+Maps with existing GIS visualization web services. We can eliminate
+\citeN{Saya-A-2006-GISWS} use Ajax to integrate Google Maps with
+existing GIS visualization web services. We can eliminate specific
 shared environment techniques from consideration based on the same
+criteria as were applied to the other three styles.
+\subsection{Single layer techniques}
+criteria as were applied to the other three styles (e.g., no Java- or
+Flash-based techniques).
+\subsection{Single-layer techniques}
 \label{sec-image-gen}
+As noted earlier, single layer techniques work by directly plotting
+As noted earlier, single-layer techniques work by directly plotting
 geolocated IP addresses onto a base map image, then displaying the
 composite image at the client. A typical example of the kind of output
 that might be produced is shown in Figure~\ref{fig-image}. Such
 techniques require two specific components: software to programmatically
 create and manipulate bitmap images (for example, the GD image
 library\footnote{\url{http://www.boutell.com/gd/}}); and software to
+transform raw latitude/longitude coordinates into projected map
+coordinates on the base map (for example, the PROJ.4 cartographic
+projections library\footnote{\url{http://www.remotesensing.org/proj/}}).
+\begin{figure}
+	\centering
+	\includegraphics[width=0.95\textwidth,keepaspectratio]{ImageGeneration-full}
+	\caption{Sample output from the server-side image generation technique.}
+transform latitude/longitude coordinates into projected map coordinates
+on the base map (for example, the PROJ.4 cartographic projections
+library\footnote{\url{http://www.remotesensing.org/proj/}}).
+\begin{figure}
+	\centering
+	\includegraphics[width=0.9\textwidth,keepaspectratio]{ImageGeneration-full}
+	\caption{Sample output from the (single-layer) server-side image
+		generation technique.}
 	\label{fig-image}
 \end{figure}
+Single layer techniques could use any of the distribution styles
+Single-layer techniques could use any of the distribution styles
 discussed in Section~\ref{sec-distribution}. However, all but the image
 server style would require the installation of additional client-side
 software for generating images and performing cartographic projection
+operations, so we will only consider single layer using the image
+server distribution style (or \textbf{server-side image generation})
+from this point on.
+operations, so we will only consider single-layer techniques that use
+the image server distribution style (or \textbf{server-side image
+generation}).
 The server-side image generation technique provides some distinct
 advantages. It is relatively simple to implement and is fast at
 producing the final image, mainly because it uses existing,
+well-established technologies. It is also bandwidth efficient: the size
+of the generated map image is determined by the total number of pixels
+and the compression method used, rather than by the number of points to
+be plotted. The amount of data to be sent to the client should therefore
+remain more or less constant, regardless of the number of points
+plotted.
+well-established technologies. It is also bandwidth efficient, because
+the size of the generated map image is determined by its pixel
+dimensions and the compression method used, rather than by the number of
+points to be plotted. The amount of data to be sent to the client should
+therefore remain more or less constant, regardless of the number of
+points plotted.
 This technique also has some disadvantages, however. First, a suitable
 base map image must be acquired. This could be generated from a GIS, but
 if this is not an option an appropriate image must be obtained from a
+third party. Care must be taken in the latter case to avoid potential
+copyright issues. Second, the compression method used to produce the
+final composite map image can have a significant impact on visual
+quality. For example, lossy compression methods such as JPEG can make
+the points plotted on the map appear distinctly fuzzy or ``muddy'', as
+shown in Figure~\ref{fig-image-quality}. A lossless compression method
+such as PNG will avoid this problem, but will tend to produce larger
+image files. Finally, it is harder to provide interactive map
+manipulation features with this technique, as the output is a simple
+static image. Anything that changes the content of the map (such as
+panning or changing the visibility of points) will require the entire
+image to be regenerated. Zooming could be achieved if a very high
+resolution base map image was available, but the number of possible zoom
+levels might be restricted.
+\begin{figure}
+	\centering
+	\includegraphics[scale=1.25]{jpeg_detail}\medskip
+	\includegraphics[scale=1.25]{overlay_detail}
+	\caption{Image quality of JPEG (Q=90) image generation (top) vs.\
+	PNG image overlay (bottom).}
+third party. Care must be taken in the latter case to avoid copyright
+issues. Second, the compression method used to produce the final
+composite map image can have a significant impact on visual quality. For
+example, lossy compression methods such as JPEG can make the points
+plotted on the map appear distinctly fuzzy or ``muddy'', as shown in
+Figure~\ref{fig-image-quality}. Lossless compression methods such as PNG
+avoid this problem, but may produce larger files for the same image.
+Finally, it is harder to provide interactive map manipulation features
+with this technique, as the output is a simple static image. Anything
+that changes the content of the map (such as panning or changing the
+visibility of certain points) will require the entire image to be
+regenerated. Zooming could be achieved if a very high resolution base
+map image was available, but the number of possible zoom levels might be
+restricted.
+\begin{figure}
+	\centering
+	\includegraphics[scale=0.98]{jpeg_detail}
+	\includegraphics[scale=0.98]{overlay_detail}
+	\caption{Image quality of JPEG (Q=90) image generation (left) vs.\
+	PNG image overlay (right).}
 	\label{fig-image-quality}
 \end{figure}
 \subsection{Multi-layer techniques}
 \label{sec-overlay}
+% Look for publications regarding the DataCrossing Ajax client.
+% See <http://datacrossing.crs4.it/en_Documentation_Overlay_Example.html>.
+% They use <IMG> rather than <DIV>, which has the advantage of the image
+% being loaded only once, but makes it harder to dynamically change the
+% appearance of markers. The amount of data generated will still be
+% proportional to the number of points (one <IMG> per point).
+Multi-layer techniques also involve plotting points onto a base map image,
+but they differ from single layer techniques in that the points are
+not composited directly onto the base map image. Rather, the points are
+Multi-layer techniques also involve plotting points onto a base map
+image, but they differ from single-layer techniques in that the points
+are not plotted directly onto the base map image. Rather, the points are
 displayed as an independent overlay on top of the base map image. This
+provides a significant advantage over single layer techniques, as it
+provides a significant advantage over single-layer techniques, as it
 enables the possibility of multiple independent layers that can be
 individually shown or hidden. This is very similar to the multi-layer
+functionality provide by GIS, and is an effective way to provided
+functionality provided by GIS, and is an effective way to provide
 interactive visualizations of geographic data
 \cite{Wood-J-1996-vis,MacE-AM-1998-GIS}. We still have the problem of
 finding a suitable base map image, however.
 points are plotted, which is then overlaid on the base map image (in our
 implementation, the output looks essentially identical to that shown in
 Figure~\ref{fig-image} on page~\pageref{fig-image}). This requires the
 overlay image to be in either PNG or GIF format, as JPEG does not
+support transparency. Fortunately the overlay image is likely to contain
+a lot of ``white space'', which compresses very well, so use of a
+support transparency. The overlay image is likely to contain
+considerable ``white space'', which compresses very well, so use of a
 lossless compression method should not be an issue. This also eliminates
 the ``fuzziness'' issue noted earlier (see
 Figure~\ref{fig-image-quality}). The size of the image overlay will
 generally be proportional to the number of points to be plotted, but the
 image compression should have a moderating effect on this.
+As noted earlier, generating images at the client would require
+additional software to be installed, so we will only consider the data
+server distribution style for image overlays (or \textbf{server-side
+image overlay}). That is, both the base map image and the overlay(s) are
+generated at the server.
+As noted in Section~\ref{sec-image-gen}, generating images at the client
+would require additional software to be installed, so we will only
+consider the image server distribution style for image overlays (or
+\textbf{server-side image overlay}). That is, both the base map image
+and the overlay(s) are generated at the server.
 An HTML overlay comprises a collection of HTML elements corresponding to
 the points to be plotted, which are positioned over the base map image
 using CSS absolute positioning. There is considerable flexibility as to
 the types of elements that could be used to construct the overlay. One
 possibility is to use \verb|<IMG>| elements to place icons on the base
+map; this appears to be the approach adopted by Google Maps (see
+map, which appears to be the approach adopted by Google Maps (see
 Figure~\ref{fig-google}). Another possibility is to use appropriately
 sized and colored \verb|<DIV>| elements, which then appear as colored
 blocks ``floating'' over the base map image (in our implementation, the
 output looks essentially identical to that shown in
 \begin{figure}
 	\centering
+	\includegraphics[width=0.95\textwidth,keepaspectratio]{GoogleMap-full.png}
+	\includegraphics[width=0.9\textwidth,keepaspectratio]{GoogleMap-full.png}
 	\caption{Sample output from the Google Maps technique.}
 	\label{fig-google}
 \end{figure}
 because only HTML (i.e., text) is being generated, not images. This can
 be easily achieved using client-side JavaScript, so HTML overlays can
 use any of the distribution styles discussed in
 Section~\ref{sec-distribution} without violating our requirements. We
+have therefore adopted two representative multi-layer techniques for our
+experiments: \textbf{server-side HTML overlays} (using the image server
+distribution style) and \textbf{Google Maps} (using the data server
+distribution style). Since Google Maps uses \verb|<IMG>| elements, we
+have used \verb|<DIV>| elements for the server-side HTML overlay.
+have therefore adopted two representative HTML overlay techniques for
+our experiments: \textbf{server-side HTML overlays} (using the image
+server distribution style) and \textbf{Google Maps} (using the data
+server distribution style). Since Google Maps uses \verb|<IMG>|
+elements, we have used \verb|<DIV>| elements for the server-side HTML
+overlay.
 Server-side HTML overlays are actually slightly simpler to implement
 than either server-side image generation or image overlays, because we
 do not need to write any code to generate or manipulate images (the base
 map image is static and thus requires no additional processing). All
 that is required is code to transform latitude/longitude coordinates
+into projected map coordinates and produce corresponding \verb|<DIV>|
+into projected map coordinates and generate corresponding \verb|<DIV>|
 elements.
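
As an illustration only, the two steps just described (projecting each coordinate pair and emitting one \verb|<DIV>| per point) might be sketched as follows. This is not the implementation used in the experiments: it assumes an equirectangular world base map whose top-left corner lies at 90\,N, 180\,W, whereas a real implementation would delegate the projection to a library such as PROJ.4. The function names, marker size and marker color are likewise hypothetical.

```python
from html import escape


def latlon_to_pixel(lat, lon, width, height):
    """Map latitude/longitude in degrees to pixel coordinates.

    Assumption: the base map is an equirectangular world map of
    width x height pixels with its top-left corner at (90N, 180W).
    """
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return round(x), round(y)


def div_marker(lat, lon, width, height, size=3, color="red"):
    """Return one absolutely positioned <div> centred on the point."""
    x, y = latlon_to_pixel(lat, lon, width, height)
    left = x - size // 2
    top = y - size // 2
    return (
        f'<div style="position:absolute; left:{left}px; top:{top}px; '
        f'width:{size}px; height:{size}px; '
        f'background-color:{escape(color)};"></div>'
    )


def html_overlay(points, width, height):
    """Build the overlay: one <div> per (lat, lon) pair."""
    return "\n".join(div_marker(lat, lon, width, height) for lat, lon in points)


if __name__ == "__main__":
    # Two hypothetical points plotted on a 1024 x 512 pixel base map.
    print(html_overlay([(0.0, 0.0), (-45.87, 170.5)], 1024, 512))
```

The resulting markup would be placed in a container that is itself positioned over the base map image, so that the pixel offsets are relative to the map's top-left corner.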
 Google Maps \cite{Goog-M-2006-maps} is a more complex proposition. This
 technique uses the data server distribution style, where JavaScript code
 running within the browser enables the client to manipulate the base map
 and its overlays. Data and map images are requested asynchronously from
+the server as required, using Ajax technologies, which seems to imply
+the server as required using Ajax technologies, which seems to imply
 that Google Maps in fact uses the shared environment distribution style.
 However, the server has no involvement beyond simply supplying data to
 the client. In the shared environment distribution style, the server is
 directly involved in manipulating the map, under the control of the
 client. This is clearly not the case with Google Maps.
 The primary advantage of Google Maps is the powerful functionality it
 provides for generating and interacting with the map. Users may pan the
+map in any direction and zoom in and out to many different levels. A
+map in any direction and zoom to many different levels of detail. A
 satellite imagery view is also available. In addition, further
+information about each point plotted (such as the name of the city, for
+example) can be displayed in a callout attached to the point, as shown
+in Figure~\ref{fig-google}.
+information about each point plotted (such as the name of the city) can
+be displayed in a callout attached to the point, as shown in
+Figure~\ref{fig-google}.
 However, there are also some significant disadvantages to the Google
 Maps technique\footnote{Interestingly, the Google Earth application
 addresses many of these issues, but since it is not a browser-based
+solution it falls outside the scope of our consideration. However, for
+interest's sake we did an informal comparison between Google Earth and
+the four techniques that we have tested, and this has been included in
+the results in Section~\ref{sec-results}.}. First, it is a distributed
+application, thus making it more complex to implement, test and debug
+\cite{Bates-PC-1995-distdebug,Ensl-PH-1978-distributed}. Second, the
+server must have a registered API key from Google, which is verified
+every time that a page attempts to use the API. Similarly, the client
+must connect to Google's servers in order to to download the API's
+JavaScript source. This means that the technique must have an active
+Internet connection in order to work. Finally, the Google Maps API does
+not currently provide any way to toggle the visibility of markers on the
+map, so it is not possible to implement the interactive ``layers''
+mentioned at the start of this section. (It is possible, of course, that
+Google will implement this feature in a later version of the API.)
+solution it falls outside the scope of our consideration.}. First, it is
+a distributed application, thus making it more complex to implement,
+test and debug \cite{Bates-PC-1995-distdebug,Ensl-PH-1978-distributed}.
+Second, the server must have a registered API key from Google, which is
+verified every time that a page attempts to use the API. Similarly, the
+client must connect to Google's servers in order to download the
+API's JavaScript source. This means that the technique requires an
+active Internet connection in order to work. Finally, the Google Maps
+API does not currently provide any way to toggle the visibility of
+markers on the map, so it is not possible to implement the interactive
+``layers'' mentioned at the start of this section. (It is possible, of
+course, that Google may implement this feature in a future version of
+the API.)
 The most significant disadvantage of all HTML overlay techniques,
 however, is that the size of the HTML overlay is directly proportional
 to the number of points to be plotted. There will be one overlay element
 (\verb|<DIV>| or \verb|<IMG>|) per point, so a very large number of
 points will result in an even larger amount of HTML source being
 generated. We expect that this will lead to excessive browser memory
 usage, and consequently that these techniques will not scale well at the
+high end. However, they may still be useful for smaller data sets that
+require interactive manipulation.
+high end. However, they may still be appropriate for smaller data sets
+that require interactive manipulation.
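
The linear growth described above can be illustrated with a small calculation. The sketch below is hypothetical: the marker markup is an assumed template rather than the markup generated in the experiments, and the coordinates are fixed placeholders so that every element has the same length. It shows only that doubling the number of points doubles the size of the overlay.

```python
# Assumed marker template; the markup used in the experiments may differ.
MARKER = (
    '<div style="position:absolute; left:{x}px; top:{y}px; '
    'width:3px; height:3px; background-color:red;"></div>\n'
)


def overlay_bytes(n_points):
    """Size in bytes of an overlay holding n_points marker elements."""
    # Fixed four-digit placeholder coordinates keep every element the same length.
    element = MARKER.format(x=1000, y=1000)
    return n_points * len(element.encode("utf-8"))


def size_table(max_exponent):
    """(points, bytes) pairs for data sets of 2**0 up to 2**max_exponent points."""
    return [(2 ** k, overlay_bytes(2 ** k)) for k in range(max_exponent + 1)]


if __name__ == "__main__":
    for points, size in size_table(14):  # 2**14 = 16,384 points
        print(f"{points:>6} points: {size:>9} bytes")
```

By contrast, the size of a composite bitmap depends on its pixel dimensions and compression method rather than on the number of points plotted.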
 \section{Experimental design}
 \label{sec-experiment}
 \begin{figure}
 	\centering
+	\includegraphics[width=0.95\textwidth,keepaspectratio]{16384_points}
+	\includegraphics[width=0.9\textwidth,keepaspectratio]{16384_points}
 	\caption{The 16,384-point data set plotted on the base map.}
 	\label{fig-grid-points}
 \end{figure}
 the local network was a potential confounding factor. We therefore
 decided to eliminate network performance from the equation by running
 both the server and the client on the same machine\footnote{A Power
 Macintosh G5 1.8\,GHz with 1\,GB RAM, running Mac OS X 10.4.7, Apache
+.0.55, PHP 4.4 and Perl 5.8.6.}. This in turn enabled us to measure the
+time taken for data generation and page display independently, thus
+simplifying the process of data collection and also ensuring that the
+client and server processes did not unduly interfere with each other,
+despite running on the same machine.
+.0.55, PHP 4.4 and Perl 5.8.6.}. This in turn enabled us to
+independently measure the time taken for data generation and page
+display, thus simplifying the process of data collection and also
+ensuring that the client and server processes did not unduly interfere
+with each other, despite running on the same machine.
 It could be argued that network performance would still have a
 confounding effect on the Google Maps technique, but this would only be
 likely for the initial download of the API (comprising about 235\,kB of
 server from proceeding.
 For each data set generated, we recorded its size, the time taken to
 generate it, the time taken to display the resultant map in the browser,
+and the amount of real and virtual memory used during the test by the
+browser. We also intended to measure the memory usage of the server, but
+and the amount of real and virtual memory used by the browser during the
+test. We also intended to measure the memory usage of the server, but
 this proved more difficult to isolate than expected, and was thus
 dropped from the experiments. The data set generation time and browser
 memory usage were measured using the \texttt{time} and \texttt{top}
 utilities respectively (the latter was run after each test run to avoid
 test'' debugging feature of Apple's Safari web browser, which can
 repetitively load a set of pages while recording various statistics, in
 particular the time taken to load the page. Tests were run up to twenty
 times each where feasible, in order to reduce the impact of random
+variations. Some tests were run fewer times because they took a very
+long time (several minutes for a single test run). We typically broke
+off further testing when a single test run took longer than about five
+minutes, as by this stage performance had already deteriorated well
+beyond usable levels.
+While it is somewhat beyond the scope of this work, out of interest some
+informal tests were also undertaken using the Google Earth application.
+A Perl script was used to generate a collection of KML files
+corresponding to the data sets described above. Each data set was then
+loaded into Google Earth, and a stopwatch was used to measure how long
+it took to load the data set, defined as the period during which the
+dialog box ``\textsf{Loading myplaces.kml, including enabled overlays}''
+was displayed on screen.
+variations. Some tests were run fewer times because they took an
+excessive amount of time to complete (i.e., several minutes for a single
+test run). We typically broke off further testing when a single test run
+took longer than about five minutes, as by this stage performance had
+already deteriorated well beyond usable levels.
 \subsection{Technique implementation}
 \begin{description}
 	\item[server-side image generation] The dispatcher page included a
 	standard \verb|<IMG>| element that called the Perl script. This
+	script then loaded a base map PNG image, plotted points directly
+	onto it, and returned the
+	composite map to the client as a JPEG image (with the ``quality''
+	parameter set to 90).
+	script loaded a base map PNG image, plotted points directly onto it,
+	and returned the composite map to the client as a JPEG image (with
+	the ``quality'' parameter set to 90).
 	\item[server-side image overlay] The dispatcher page included two
 	\verb|<IMG>| elements, the first for the base map and the second for
 	the overlay, both with identical CSS positioning attributes. The
 do a full analysis and statistical comparison of the performance of the
 different techniques, but rather to identify broad trends. We have not,
 therefore, carried out any statistical analysis on the results. We will
 now discuss the results for data size, page load time and memory usage.
+Because the data set size increases by powers of two, we have used
+log-log scales for all plots.
+Because the number of points in each data set increases in powers of
+two, we have used log-log scales for all plots.
 \subsection{Data size}
 During each test run, the data generated by the server was saved to a
 file and its size in bytes recorded. In the case of the server-side
 image generation and server-side image overlay techniques, the file
 comprised a bitmap image; whereas for the server-side HTML overlay and
+Google Maps techniques, the file comprised XML data. (The latter was
+also true of the KML files generated for use with Google Earth.)
+Google Maps techniques, the file comprised HTML or XML text,
+respectively.
 There was a certain amount of fixed overhead for each technique tested,
 as summarised in Table~\ref{tab-overhead}. This overhead comprised
 static files that were always downloaded to the client, regardless of
 	\centering
 	\begin{tabular}{lll}
 		Technique						&	Fixed overhead		&	Content	\\
 		\hline
+		Server-side image generation	&	629\,bytes			&	PHP (dispatcher)\smallskip	\\
+		Server-side image overlay		&	\(\approx\) 181\,kB	&	PHP (dispatcher) \\
+		Server-side image generation	&	629\,bytes			&	dispatcher (PHP)\smallskip	\\
+		Server-side image overlay		&	\(\approx\) 181\,kB	&	dispatcher (PHP) \\
 										&						&	base map image (JPEG)\smallskip	\\
+		Server-side HTML overlay		&	\(\approx\) 181\,kB	&	PHP (dispatcher) \\
+		Server-side HTML overlay		&	\(\approx\) 181\,kB	&	dispatcher (PHP) \\
 										&						&	base map image (JPEG)\smallskip	\\
+		Google Maps						&	\(\approx\) 235\,kB	&	PHP (dispatcher) \\
+										&						&	base map tiles \\
+										&						&	JavaScript (API) \\
+										&						&	various icons\smallskip	\\
+		(Google Earth)					&	unknown				&	\\
+		Google Maps						&	\(\approx\) 235\,kB	&	dispatcher (PHP) \\
+										&						&	base map image tiles (PNG) \\
+										&						&	API (JavaScript) \\
+										&						&	various icons (PNG)	\\
 	\end{tabular}
 	\caption{Fixed overhead for each technique.}
 	\label{tab-overhead}
 \end{acmtable}
 \begin{figure}
+	\includegraphics[scale=0.66]{data_size}
+	\centering
+	\includegraphics[scale=0.5]{data_size}
 	\caption{Comparison of generated data size for each technique (log-log scale).}
 	\label{fig-data-size}
 \end{figure}
 The amount of data generated for each technique, including fixed
 overhead, is shown in Figure~\ref{fig-data-size}. It is immediately
 apparent from these results that there is a divergence between the two
+techniques that generate bitmap images (server-side image generation and
+server-side image overlay), and the remaining techniques that generate
+either HTML or XML (i.e., text).
+techniques that generate images (server-side image generation and
+server-side image overlay), and the two techniques that generate text
+(server-side HTML overlay and Google Maps).
 Both the server-side image generation and server-side image overlay
 techniques scale particularly well with regard to the amount of data
 generated. Interestingly, the amount of data generated by the image
 scale well, and begin to visibly diverge from the other two techniques
 once the amount of data generated exceeds about 5\% of the fixed
 overhead. For the HTML overlay technique this occurs somewhere between
 64 and 128 points, whereas for Google Maps it occurs somewhere between 256
+and 512 points. The divergence increases rapidly beyond this point
+for both techniques, with the HTML overlay technique suffering the most.
+and 512 points. The divergence increases rapidly for both techniques
+beyond these points, with the HTML overlay technique suffering the most.
 The latter occurs because the HTML overlay technique needs to generate
+additional CSS attributes in order to correctly position the
+\verb|<DIV>| elements, whereas the Google Maps technique needs only to
+return a more compact list of latitude/longitude coordinates.
+For Google Earth, the amount of data generated is clearly proportional
+to the number of points, but the Google Earth results are otherwise not
+directly comparable with the other techniques, as we were unable to
+determine whether Google Earth had any fixed overhead.
+additional CSS attributes (i.e., more text) in order to correctly
+position the \verb|<DIV>| elements, whereas the Google Maps technique
+needs only to return a more compact list of latitude/longitude
+coordinates.
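The 5% threshold above can be turned into a rough per-point data budget. The sketch below is a back-of-envelope estimate, not a measurement: it uses the approximate fixed overheads from the overhead table and the divergence ranges from the divergence table, and it assumes 1 kB = 1,000 bytes, which the paper does not specify. The function and dictionary names are hypothetical.

```python
# Back-of-envelope check of the "5% of fixed overhead" divergence threshold.
# Assumption: 1 kB = 1000 bytes; the paper does not state the convention.

FIXED_OVERHEAD_BYTES = {
    "html_overlay": 181_000,  # approx. 181 kB fixed overhead
    "google_maps": 235_000,   # approx. 235 kB fixed overhead
}

# Point-count ranges in which generated data size visibly diverges.
DIVERGENCE_RANGE = {
    "html_overlay": (64, 128),
    "google_maps": (256, 512),
}

def implied_bytes_per_point(technique, fraction=0.05):
    """Return (low, high) bytes per point implied by the threshold."""
    threshold = FIXED_OVERHEAD_BYTES[technique] * fraction
    few, many = DIVERGENCE_RANGE[technique]
    # More points at the threshold means fewer bytes per point.
    return threshold / many, threshold / few

for name in FIXED_OVERHEAD_BYTES:
    low, high = implied_bytes_per_point(name)
    print(f"{name}: roughly {low:.0f} to {high:.0f} bytes per point")
```

Under these assumptions the HTML overlay works out at roughly 71 to 141 bytes per point and Google Maps at roughly 23 to 46, which is consistent with the explanation that the CSS positioning attributes add more text per point than a plain coordinate list.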
 \subsection{Page load time}
 \subsubsection{Data generation time}
 \begin{figure}
+	\includegraphics[scale=0.66]{data_generation_time}
+	\centering
+	\includegraphics[scale=0.5]{data_generation_time}
 	\caption{Comparison of data generation time for each technique (log-log scale).}
 	\label{fig-data-generation-time}
 \end{figure}
+The results show that the length of time taken to generate the source
+data increases in proportion with the amount of points to be plotted, as
+expected. It is interesting to note the differences in data generation
+time for each technique, however. Data generation for all of the
+``text-based'' techniques (HTML overlay, Google Maps and Google Earth)
+is consistently faster than for the ``image-based'' techniques (image
+generation and image overlay).
+Server-side image generation generally takes the longest to generate its
+data. This is because it not only has to map points from
+latitude/longitude into projected map coordinates, but also must plot
+these points onto the base map image, then compress the composite image
+as a JPEG. The image to be compressed is also moderately complex, which
+only adds to the data generation time. Server-side image generation
+performs slightly better because it uses a less complex compression
+method (PNG) and the image being compressed is much simpler (a
+collection of colored points on a blank background).
+The server-side HTML overlay techniques appears faster at generating
+data than either of the two image-based techniques at the low end, but
+is similar in performance at the high end. In this technique the server
+The results (see Figure~\ref{fig-data-generation-time}) show that the
+length of time taken to generate the source data increases in proportion
+to the number of points to be plotted, as expected. It is interesting to
+note the differences in data generation time for each technique,
+however. Data generation for both of the ``text-based'' techniques (HTML
+overlay and Google Maps) is consistently faster than for the
+``image-based'' techniques (image generation and image overlay).
+The results show that server-side image generation generally takes the
+longest to generate its data. This is because it not only has to map
+points from latitude/longitude into projected map coordinates, but also
+must plot these points onto the base map image, then compress the
+composite image as a JPEG. The image to be compressed is also moderately
+complex, which only adds to the data generation time. Server-side image
+overlay performs somewhat better because it uses a less complex
+compression method (PNG) and the image to be compressed is much simpler
+(a collection of colored points on a blank background).
+The server-side HTML overlay technique appears faster at generating data
+than either of the two image-based techniques at the low end, but is
+similar in performance at the high end. In this technique the server
 only needs to map latitude/longitude to projected map coordinates; no
+images need to be generated and there is no compression. At the high
+end, however, this advantage is clearly offset by the large volume of
+data that is generated. Google Maps is faster again, because all
+processing is carried out on the client; the server's only involvement
+is to generate a list of latitude/longitude coordinates. A similar
+argument also applies for Google Earth.
+images need to be generated and there is no compression to deal with. At
+the high end, however, this advantage is clearly offset by the
+significant volume of data being generated. Google Maps is faster again,
+because almost all processing is carried out on the client; the server's
+only involvement is to generate a simple list of latitude/longitude
+coordinates.
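The mapping from latitude/longitude to projected map coordinates mentioned above can be sketched as follows. The paper does not state which projection or base map dimensions were used, so this example assumes a simple equirectangular (plate carrée) world map; the function name and the 1024 by 512 image size are hypothetical.

```python
def latlon_to_pixel(lat, lon, width, height):
    """Map latitude/longitude in degrees to pixel (x, y) coordinates.

    Assumes an equirectangular world map whose top-left pixel corresponds
    to 90 degrees north, 180 degrees west (an assumption, not the paper's
    stated projection).
    """
    x = (lon + 180.0) / 360.0 * width   # longitude grows to the right
    y = (90.0 - lat) / 180.0 * height   # latitude grows upwards, pixels downwards
    return round(x), round(y)

# Example: the equator/prime meridian lands in the centre of the image.
print(latlon_to_pixel(0.0, 0.0, 1024, 512))
```

For the image-based techniques this step is followed by plotting and compression on the server, whereas for the HTML overlay the resulting pixel positions would simply be emitted as text.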
 In terms of data generation, it appears that all techniques tested scale
 reasonably well. The image-based techniques perform worse at the low end
 because they involve more complex processing than the text-based
 techniques, but this is offset at the high end by the relatively
 constant amount of data generated. Conversely, the text-based techniques
+perform better at the low end, but are negatively imapcted at the high
+perform better at the low end, but are negatively impacted at the high
 end by the sheer volume of data produced (tens or hundreds of megabytes
 vs.\ hundreds of kilobytes).
 \begin{figure}
 	\centering
+	\includegraphics[scale=0.66]{page_load_time}
+	\includegraphics[scale=0.5]{page_load_time}
 	\caption{Comparison of map display time for each technique (log-log scale).}
 	\label{fig-page-load-time}
 \end{figure}
+These results show quite a spectacular difference between the
+image-based and text-based techniques. The time taken to display the map
+is essentially constant for both of the image-based techniques,
+regardless of the number of points to be plotted. This is not surprising
+given that the size of the generated data is also essentially constant,
+and that the browser is simply loading and displaying static images. The
+image overlay technique appears slightly slower than the image
+generation technique. This is probably because the image overlay
+technique has to load two images from the server (the base map and the
+overlay), compared to one image for the image generation technique.
+These results (see Figure~\ref{fig-page-load-time}) reveal quite a
+spectacular difference between the image-based and text-based
+techniques. The time taken to display the map is essentially constant
+for both of the image-based techniques, regardless of the number of
+points to be plotted. This is not surprising given that the size of the
+generated data is also essentially constant, and that the browser is
+simply loading and displaying static images. The image overlay technique
+appears slightly slower than the image generation technique. This is
+probably because the image overlay technique has to load two images from
+the server (the base map and the overlay), compared to one image for the
+image generation technique.
 In contrast, the text-based techniques clearly do not scale well with
 regard to map display time. Google Maps suffers particularly in this
 regard, with display time exceeding ten seconds shortly past 512 points.
 Testing was abandoned at 4,096 points, with a single test run taking
 over seven minutes. The HTML overlay technique fares better, exceeding
 ten seconds somewhere between 4,096 and 8,192 points. Testing was
 abandoned at 32,768 points, with a single test run taking almost ten
+minutes. Interestingly, Google Earth performed worse at the low end but
+did better at the high end, presumably because it is specifically
+designed to handle these kinds of tasks. We were able to reach 131,072
+points before testing was abandoned.
+minutes.
 \subsubsection{Combined time}
 \begin{figure}
 	\centering
+	\includegraphics[scale=0.66]{combined_time}
+	\includegraphics[scale=0.5]{combined_time}
 	\caption{Comparison of combined page load time for each technique (log-log scale).}
 	\label{fig-combined-time}
 \end{figure}
+Combining the data generation and map display times yields little change
+in the curves for the text-based techniques, because the data generation
+times are very small compared to the map display times. There is a more
+obvious impact on the image-based techniques, with both techniques
+remaining more or less constant up to about 2,048 points, then slowing
+as the number of points increases beyond that. However, the slowdown is
+nowhere near as dramatic as for the text-based techniques; even the
+largest data set only takes about nineteen seconds overall. The image
+overlay technique does display a slight advantage of about half a second
+over the image generation technique for the largest data set, but
+further experiments will be required to determine whether this is a
+statistically significant difference.
+Combining the data generation and map display times (see
+Figure~\ref{fig-combined-time}) yields little change in the curves for
+the text-based techniques, because the data generation times are very
+small compared to the map display times. There is a more obvious impact
+on the image-based techniques, with both techniques remaining more or
+less constant up to about 2,048 points, then slowing as the number of
+points increases beyond that. However, the slowdown is nowhere near as
+dramatic as for the text-based techniques; even the largest data set
+only takes about nineteen seconds overall. The image overlay technique
+does display a slight advantage of about half a second over the image
+generation technique for the largest data set, but further experiments
+will be required to determine whether this is a statistically
+significant difference.
 \subsection{Memory usage}
 memory usage in each category. This told us the size of both the current
 ``working set'' and the total memory footprint of the browser process
 after it had completed a test run. The real memory results are shown in
 Figure~\ref{fig-real-memory} and the virtual memory results are shown in
+Figure~\ref{fig-virtual-memory}
+\begin{figure}
+	\centering
+	\includegraphics[scale=0.66]{real_memory}
+Figure~\ref{fig-virtual-memory}.
+\begin{figure}
+	\centering
+	\includegraphics[scale=0.5]{real_memory}
 	\caption{Comparison of real memory usage for each technique (log-log scale).}
 	\label{fig-real-memory}
 \end{figure}
 \begin{figure}
 	\centering
+	\includegraphics[scale=0.66]{virtual_memory}
+	\includegraphics[scale=0.5]{virtual_memory}
 	\caption{Comparison of virtual memory usage for each technique (log-log scale).}
 	\label{fig-virtual-memory}
 \end{figure}
 While both sets of results display similar trends, the real memory data
+proved somewhat difficult to interpret. Real memory usage was generally
+consistent across test runs, but would also frequently fluctuate upwards
+by a factor of nearly two for no readily apparent reason. This is
+proved somewhat problematic. Real memory usage was generally consistent
+across test runs, but would also frequently fluctuate upwards by a
+factor of nearly two for no readily apparent reason. This is
 particularly apparent with the HTML overlay technique beyond 1,024
 points. We can only assume that this was a result of other processes on
 the test machine interacting with the browser process in unexpected
 ways. We are therefore somewhat wary of the real memory data, but they
 memory footprint of a process is less likely to be impacted by other
 running processes.
 The results show that the two image-based techniques have essentially
+constaint memory usage regardless of the number of points plotted. This
+constant memory usage regardless of the number of points plotted. This
 is to be expected, given that the size of the source data is also
 essentially constant. The text-based techniques, however, clearly begin
 to diverge as the number of points increases. The HTML overlay technique
 starts to visibly diverge somewhere between 2,048 and 4,096 points,
+while Google Maps starts to visbly diverge 64 and 128 points. This is in
+line with our expectation for these techniques that memory usage would
+increase in proportion to the number of points.
+\section{Conclusion}
+while Google Maps starts to visibly diverge between 64 and 128 points. This is
+in line with our expectation for these techniques that memory usage
+would increase in proportion to the number of points. It is intriguing
+to note that for both techniques, there appears to be little consistency as to
+where the performance of each measure begins to diverge, as shown in
+Table~\ref{tab-divergence} (although Google Maps appears to exhibit
+greater consistency than HTML overlay in this regard).
+\begin{acmtable}{11cm}
+	\centering
+	\begin{tabular}{lccc}
+		Technique						&	Data size	&	Map display time	&	Virtual memory	\\
+		\hline
+		Server-side HTML overlay		&	64--128		&	128--256			&	2,048--4,096 \\
+		Google Maps						&	256--512	&	64--128				&	64--128	\\
+	\end{tabular}
+	\caption{Approximate number of points at which each measure begins to diverge,
+		for the HTML overlay and Google Maps techniques.}
+	\label{tab-divergence}
+\end{acmtable}
+\section{Conclusion and future work}
+\label{sec-conclusion}
 In this research, we tested the scalability of four techniques for
 online geovisualization of web site hits, with respect to the number of
 points to be plotted on the map. The four techniques tested were
 server-side image generation and server-side image overlay techniques
 scale the best from small to large data sets. The HTML overlay and
 Google Maps techniques work well for small data sets, but their
 performance rapidly deteriorates as the size of the data set increases,
+to the point where they are essentially unusable.
+Our aim was to identify which was the best technique to use to plot hits
+on the Otago School of Business repository. We are now left with a choice
+between two very similarly-performing techniques.
+% The
+% software extracts IP addresses from the web server logs, geolocates them
+% using the free MaxMind GeoLite Country database\footnote{See
+% \url{http://www.maxmind.com/app/ip-location}.}, then stores the
+% resulting country information in a separate database.
+% The Tasmania software, however, uses countries as its base unit of
+% aggregation. We were interested in looking at the distribution on a finer
+% level, down to individual cities if possible
+to the point where they become unusable.
+Despite this clear difference in scalability, we are still left with
+some interesting questions. We did not investigate the model interaction
+environment distribution style in this research, as it was unclear
+whether this could be achieved using only client-side JavaScript. This
+is clearly an avenue for further investigation. In addition, the
+appearance of native SVG support in some browsers means that this may
+also become a viable option in future.
+We were somewhat surprised that the server-side HTML overlay and Google
+Maps techniques exhibited no obvious consistency in where the different
+measures (data size, map display time and virtual memory usage)
+diverged. It seems logical that some form of correlation might exist, so
+further research will be required to investigate this. One possibility
+might be to implement an instrumented web browser and server in order to
+gather more precise data.
+Shortly after completing our experiments, we discovered \emph{msCross
+Webgis}\footnote{\url{http://datacrossing.crs4.it/en_Documentation_mscross.html}},
+which is an open source Google Maps clone. Its documentation implies
+that it may be possible to build a fully self-contained implementation
+that requires no external network access. This would enable us to test
+on an isolated network with the client and server running on different
+machines. We could then include measurements of network transfer time,
+and eliminate any problems caused by running the client and server on
+the same machine.
+Our overall aim was to identify which was the best technique to use to
+plot downloads and abstract views from the Otago School of Business
+digital repository. Based on our results, both the server-side HTML
+overlay and Google Maps techniques are clearly inappropriate for this
+task. This leaves us with a choice between two very similarly-performing
+techniques: server-side image generation and server-side image overlay.
+However, the practical advantages of multi-layer techniques over
+single-layer techniques, such as the ability to dynamically show and
+hide multiple overlays, mean that server-side image overlay is the clear
+winner in this case.
+\begin{acks}
+The author would like to acknowledge Dr.\ Antoni Moore and Prof.\ George
+Benwell for their input into this research.
+\end{acks}
 \bibliography{Map_Visualisation}
 \begin{received}
 ...
 \end{received}

          lineplot.plo

            #musthave infile
#musthave title
#musthave ytitle
#musthave ymin
#musthave ymax

#proc settings
	units: cm
#endproc

#proc getdata
	delim: comma
	fieldnameheader: yes
	file: @infile

#proc areadef
//	title: @title
//	titledetails: size=10 style=B align=C
	areaname: standard
	xscaletype: log
	yscaletype: log
	xrange: 1 2000000
	yrange: @ymin @ymax

#proc xaxis:
	selflocatingstubs: text
		#include chunk_logstubs
	stubdetails: size=10
	label: Number of points
	labeldetails: style=B size=10 adjust=0,-0.1
	ticlen: 0.25

#proc xaxis:
	axisline: none
	selflocatingstubs: text
		#include chunk_logtics
	ticlen: 0.1

#proc yaxis:
	selflocatingstubs: text
		#include chunk_logstubs
	stubformat: %2.1f
	stubdetails: size=10
	label: @ytitle
	labeldetails: style=B size=10 adjust=-0.3,0
	grid: color=gray(0.7)
	ticlen: 0.25

#proc yaxis:
	axisline: none
	selflocatingstubs: text
		#include chunk_logtics
	ticlen: 0.1

#proc lineplot
	xfield: points
	yfield: image_gen
	pointsymbol: shape=diamond linecolor=black fillcolor=white
	legendlabel: Image generation
	legendsampletype: line+symbol

#proc lineplot
	xfield: points
	yfield: image_overlay
	pointsymbol: shape=square style=spokes linecolor=black
	legendlabel: Image overlay
	legendsampletype: line+symbol

#proc lineplot
	xfield: points
	yfield: html_overlay
	pointsymbol: shape=square linecolor=black fillcolor=white
	legendlabel: HTML overlay
	legendsampletype: line+symbol

#proc lineplot
	xfield: points
	yfield: google_maps
	pointsymbol: shape=triangle linecolor=black fillcolor=white
	legendlabel: Google Maps
	legendsampletype: line+symbol

#proc legend
	format: multiline
	location: min+2 max-1
	textdetails: size=10
	seglen: 0.25
	frame: yes
	backcolor: white

            #musthave infile
#musthave title
#musthave ytitle
#musthave ymin
#musthave ymax

#proc settings
	units: cm
#endproc

#proc getdata
	delim: comma
	fieldnameheader: yes
	file: @infile

#proc areadef
//	title: @title
//	titledetails: size=10 style=B align=C
	areaname: standard
	xscaletype: log
	yscaletype: log
	xrange: 1 2000000
	yrange: @ymin @ymax
//	autowidth: 0.33
//	autoheight: 0.66

#proc xaxis:
	selflocatingstubs: text
		#include chunk_logstubs
	stubdetails: size=10
	label: Number of points
	labeldetails: style=B size=10 adjust=0,-0.1
	ticlen: 0.25

#proc xaxis:
	axisline: none
	selflocatingstubs: text
		#include chunk_logtics
	ticlen: 0.1

#proc yaxis:
	selflocatingstubs: text
		#include chunk_logstubs
	stubformat: %2.1f
	stubdetails: size=10
	label: @ytitle
	labeldetails: style=B size=10 adjust=-0.3,0
	grid: color=gray(0.7)
	ticlen: 0.25

#proc yaxis:
	axisline: none
	selflocatingstubs: text
		#include chunk_logtics
	ticlen: 0.1

#proc lineplot
	xfield: points
	yfield: image_gen
	pointsymbol: shape=diamond linecolor=black fillcolor=white
	legendlabel: Image generation
	legendsampletype: line+symbol

#proc lineplot
	xfield: points
	yfield: html_overlay
	pointsymbol: shape=square linecolor=black fillcolor=white
	legendlabel: HTML overlay
	legendsampletype: line+symbol

#proc lineplot
	xfield: points
	yfield: image_overlay
	pointsymbol: shape=square style=spokes linecolor=black
	legendlabel: Image overlay
	legendsampletype: line+symbol

#proc lineplot
	xfield: points
	yfield: google_maps
	pointsymbol: shape=triangle linecolor=black fillcolor=white
	legendlabel: Google Maps
	legendsampletype: line+symbol

#proc legend
	format: multiline
	location: min+2 max-1
	textdetails: size=10
	seglen: 0.25
	frame: yes
	backcolor: white
 	xscaletype: log
 	yscaletype: log
 	xrange: 1 2000000
 	yrange: @ymin @ymax
-//	autowidth: 0.33
-//	autoheight: 0.66
 #proc xaxis:
 	selflocatingstubs: text
 		#include chunk_logstubs
 	legendsampletype: line+symbol
 #proc lineplot
 	xfield: points
+	yfield: html_overlay
+	pointsymbol: shape=square linecolor=black fillcolor=white
+	legendlabel: HTML overlay
+	yfield: image_overlay
+	pointsymbol: shape=square style=spokes linecolor=black
+	legendlabel: Image overlay
 	legendsampletype: line+symbol
 #proc lineplot
 	xfield: points
+	yfield: image_overlay
+	pointsymbol: shape=square style=spokes linecolor=black
+	legendlabel: Image overlay
+	yfield: html_overlay
+	pointsymbol: shape=square linecolor=black fillcolor=white
+	legendlabel: HTML overlay
 	legendsampletype: line+symbol
 #proc lineplot
 	xfield: points
