diff --git a/Map_Visualisation.tex b/Map_Visualisation.tex index 862c15d..1869d8d 100755 --- a/Map_Visualisation.tex +++ b/Map_Visualisation.tex @@ -84,7 +84,7 @@ \begin{figure} \begin{center} - \includegraphics[scale=0.65]{tasmania-stats} + \includegraphics[scale=0.65]{tasmania_stats} \end{center} \caption{A portion of the by-country display for the Otago EPrints repository, generated by the Tasmania statistics software.} @@ -98,7 +98,8 @@ technique that could be used within a modern web browser without the need to manually install additional client software, thus providing us with the widest possible audience and reducing the impact of wide -variation in client hardware and software environments. +variation in client hardware and software environments \cite[pp.\ +27--28]{Offu-J-2002-quality}. There have been several prior efforts to plot web activity geographically. \citeN{Lamm-SE-1996-webvis} developed a sophisticated @@ -117,10 +118,10 @@ attempt to guess the country of origin, but these produced fairly crude results. Locations outside the United States were typically aggregated by country and mapped to the capital city -\cite{Lamm-SE-1996-webvis,Papa-N-1998-Palantir}. Reasonably accurate -databases were commercially available at the time \cite[p.\ -1466]{Lamm-SE-1996-webvis}, but were not available to the public at -large, thus limiting their utility. +\cite{Lamm-SE-1996-webvis,Papa-N-1998-Palantir,Jian-B-2000-cybermap}. +Reasonably accurate databases were commercially available at the time +\cite[p.\ 1466]{Lamm-SE-1996-webvis}, but were not available to the +public at large, thus limiting their utility. The situation has improved considerably in the last five years, however, with the advent of freely available and reasonably accurate geolocation @@ -144,12 +145,12 @@ it will be discussed further in Section~\ref{sec-imagegen}. 
However, there are alternative techniques that have become possible only -relatively recently, and are therefore unlikely to be in wide use (if at -all). One possible technique is to load a base image into the browser, -then overlay points onto the image using using absolutely positioned -HTML \verb|<div>| elements. This technique raises the potential for a -more GIS-like style of interaction with the map, with multiple layers -that can be activated and deactivated as necessary. We refer to this +relatively recently, and are therefore less likely to be in wide use. +One possible technique is to load a base image into the browser, then +overlay points onto the image using absolutely positioned HTML +\verb|<div>| elements. This technique raises the potential for a more +GIS-like style of interaction with the map, with multiple layers that +can be activated and deactivated as necessary. We refer to this technique as \emph{HTML overlay}; it will be discussed further in Section~\ref{sec-overlay}. @@ -161,55 +162,18 @@ customisability to the developer. This technique will be discussed further in Section~\ref{sec-google}. -% This technique -% requires a browser that supports the Cascading Style Sheet (CSS) -% positioning properties; such support has only appeared relatively -% recently. -% -% The former technique, which we shall henceforth refer to as \emph{base -% map + HTML overlay}, involves overlaying points onto a base map image on -% the client side, using HTML \verb|<div>| elements that are absolutely -% positioned via CSS. The latter technique -% -% -% -% We therefore -% considered options that did not require additional client software byond what -% was provided by the web browser. We identified three -% possible techniques for generating such a map: - -% \begin{description} -% -% \item[Image generation] An image (e.g., a JPEG or PNG) is generated -% at the server by plotting points directly onto a base map. The final -% image is then sent to the client. -% -% \item[Base map + HTML overlay] A base map image is sent to the -% client. Points are then overlaid on this map at either the client or -% the server using HTML \verb|<div>| elements that are absolutely -% positioned via CSS. -% -% \item[Google Maps] The Google Maps API is used at the client to -% generate a base map and plot points on the map. The data for this -% map are generated at the server. -% -% \end{description} - -% We will describe these techniques in more detail in -% Section~\ref{sec-techniques}. The first technique (image generation) -% appears to be fairly widespread and has been in use for some time, -% whereas the latter two do not appear to have been widely used (we will -% examine possible reasons for this shortly). - The identification of these three techniques immediately raised the -question of which was the best for our purposes. The greatest concern -was whether these techniques could scale to a large number of points. -For example, at the time of writing the Otago EPrints repository had -been accessed from over 10,000 distinct IP addresses, each potentially -representing a distinct geographical location. Taking into consideration -the type of hit (abstract view versus document download) increased that -figure to nearly 13,000. Ideally we wanted a technique that could plot a -large number of points as quickly as possible. +question of which was the best for our purposes. Of greatest concern was +whether these techniques could scale to a large number of points, as +scalability is a key issue for web applications in general \cite[p.\ +28]{Offu-J-2002-quality}, and online activity visualization in +particular \cite[p.\ 50]{Eick-SG-2001-sitevis}. For example, at the time +of writing the Otago EPrints repository had been accessed from over +10,000 distinct IP addresses, each potentially representing a distinct +geographical location. Taking into consideration the type of hit +(abstract view versus document download) increased that figure to nearly +13,000. Ideally we wanted a technique that could plot a large number of +points as quickly as possible. 
We therefore set about testing the scalability of the three techniques to determine how well each technique handled large numbers of points. A @@ -255,7 +219,7 @@ \begin{center} \includegraphics[width=0.95\textwidth,keepaspectratio]{gd_map} \end{center} - \caption{Example output from the image generation technique.} + \caption{Sample output from the image generation technique.} \label{fig-image} \end{figure} @@ -308,6 +272,13 @@ \subsection{HTML overlay} \label{sec-overlay} +% Look for publications regarding the DataCrossing Ajax client. +% See . +% They use <img> rather than <div>, which has the advantage of the image +% being loaded only once, but makes it harder to dynamically change the +% appearance of markers. The amount of data generated will still be +% proportional to the number of points (one per point). + This technique also involves plotting points onto a base map image, but it differs from the image generation technique in that the points are not plotted directly onto the base map image. Rather, the points are @@ -383,40 +354,39 @@ This technique uses the client-side Google Maps API \cite{Goog-M-2006-maps} to both generate the base map and plot points on -it, as shown in Figure~\ref{fig-google}. The output and interaction is -therefore significantly different in nature from that provided by the -other two techniques. Google Maps requires JavaScript support at the -client and the Google Maps software must be installed on the client. -However, since the latter happens automatically when the corresponding -web page is loaded, this technique meets our requirements. +it; an example of the output is shown in Figure~\ref{fig-google}. The +output and interaction are significantly different in nature from that +provided by the other two techniques. Google Maps requires JavaScript +support at the client and the Google Maps software must be installed on +the client. However, since the latter happens automatically when the +corresponding web page is loaded, this technique meets our requirements. +Google Maps inherently uses a distributed architecture, as shown in +Figure~\ref{fig-google-architecture}. Data are generated at the server, +while all map display and manipulation occurs at the client. 
\begin{figure} \begin{center} \includegraphics[width=0.95\textwidth,keepaspectratio]{google_map} \end{center} - \caption{Example output from the Google Maps technique.} + \caption{Sample output from the Google Maps technique.} \label{fig-google} \end{figure} -Google Maps by definition uses a distributed architecture, as shown in -Figure~\ref{fig-google-architecture}. Data are generated at the server, -while all map display and manipulation occurs at the client. - - \begin{figure} \caption{Distributed architecture of the Google Maps technique.} \label{fig-google-architecture} \end{figure} -The primary advantage of this technique is that it provides an appealing -visual display and powerful functionality for interacting with the map. -Users may pan the map in any direction and zoom in and out to many -different levels. A satellite imagery view is also available. In -addition, further information about each point plotted (such as the name -of the city, for example) can be displayed in a ``speech bubble'' next -to the point, as shown in Figure~\ref{fig-google}. + +The primary advantage of this technique is the powerful functionality it +provides for generating and interacting with the map. Users may pan the +map in any direction and zoom in and out to many different levels. A +satellite imagery view is also available. In addition, further +information about each point plotted (such as the name of the city, for +example) can be displayed in a ``speech bubble'' next to the point, as +shown in Figure~\ref{fig-google}. The display is also visually appealing. However, there are also some significant disadvantages compared to the previous two techniques. As a distributed application, it is more @@ -432,16 +402,92 @@ Google will implement this feature in a later version of the API). 
Interestingly, the Google Earth application addresses several of these -issues, but this is clearly outside the scope of our work, as it -requires the manual installation of extra software and runs outside the -web browser entirely. (Just for fun, however, we will do an informal -comparison in Section~\ref{sec-results} between Google Earth and the -three techniques discussed here.) +issues, but falls outside the scope of this work, as it requires the +manual installation of extra software and runs outside the web browser +entirely. (Just for fun, however, we will do an informal comparison in +Section~\ref{sec-results} between Google Earth and the three techniques +discussed here.) \section{Experimental design} \label{sec-experiment} +After some preliminary experimentation and testing with live data from +the Otago School of Business repository, we proceeded with a more formal +series of experiments to test the scalability of the three techniques. +Each technique was tested using progressively larger sets of synthetic +data. The first data set comprised one point at the South Pole (latitude +\(-90^{\circ}\), longitude \(0^{\circ}\)). Each successive data set was twice +the size of its predecessor, and comprised a regular grid of +latitude/longitude points at one degree intervals. A total of twenty-one +data sets were created in this way, with the number of points ranging +from one to 1,048,576 (\(=2^{20}\)). + +Because of the focus on scalability, we were primarily interested in +measuring page load times, memory usage, and the amount of data +generated (which impacts on both storage and network bandwidth). The +page load time can be further broken down into the time taken to +generate the map data, the time taken to transfer the map data to the +client across the network, and the time taken by the client to display +the map. 
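The construction of the synthetic data sets described above can be sketched roughly as follows. This is only an illustration in Python, not the code used for the experiments. The function names (grid_point, data_set), the west-to-east then south-to-north ordering, the longitude range of 0 to 359 degrees, and the wrap-around once the 65,160 distinct positions of a one-degree grid are used up are all assumptions made for the sketch.

```python
# Hypothetical sketch of the synthetic data sets: names, point ordering
# and wrap-around behaviour are assumptions, not taken from the paper.

LATITUDES = 181    # -90 .. +90 in one-degree steps
LONGITUDES = 360   # 0 .. 359 in one-degree steps (assumed range)


def grid_point(i):
    """Return the i-th (latitude, longitude) pair of a one-degree grid.

    Point 0 is the South Pole (-90, 0). Points run west to east along a
    row of latitude, then move one degree north. Past the last distinct
    grid position the sequence wraps around (an assumption).
    """
    lat = -90 + (i // LONGITUDES) % LATITUDES
    lon = i % LONGITUDES
    return lat, lon


def data_set(k):
    """Return the k-th data set, containing 2**k grid points."""
    return [grid_point(i) for i in range(2 ** k)]


# Twenty-one data set sizes, doubling from 1 up to 2**20 = 1,048,576.
sizes = [2 ** k for k in range(21)]
```

Under these assumptions, data_set(0) contains only the South Pole, and data_set(3) contains the eight points from (-90, 0) to (-90, 7).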
+ +Unfortunately, the Google Maps technique requires an active Internet +connection, so we were unable to run the experiments on an isolated +network. This meant that traffic on the local network could be a +confounding factor. We therefore decided to eliminate network +performance from the equation by running both the server and the client +on the same machine\footnote{A Power Macintosh G5 1.8\,GHz with 1\,GiB +RAM, running Mac OS X 10.4.7, Apache 2.0.55, PHP 4.4 and Perl 5.8.6.}. +This in turn enabled us to measure the time taken for data generation +and page display independently of each other, thus simplifying the +process of data collection and also reducing the impact that the client +and server processes would have on each other. + +It could be argued that network performance would still have a +confounding effect on the Google Maps technique, but this would only be +likely for the initial download of the API (which comprises about +155\,KiB of JavaScript source), as the API will be locally cached +thereafter. The API key verification occurs every time the map is +loaded, but the amount of data involved is very small, so it seems +unlikely that this would be significantly affected by network +performance. + +For each data set, we recorded the size of the data set, the time taken +to generate it, the time taken to display the resultant map in the +browser, and the amount of memory used during the test by both the +browser and the web server. The data set generation time and memory +usage were measured using the \texttt{time} and \texttt{top} utilities +respectively. The map display time was measured using the ``page load +test'' debugging feature of Apple's Safari web browser, which can +repetitively load a set of pages while recording various statistics, in +particular the time taken to load the page. Tests were run up to twenty +times each, where feasible, in order to reduce the impact of random +variations. + +%%!! confused! 
+ +The image generation technique was implemented as a server-side +architecture. A dispatcher page written in PHP called a Perl script, +which generated a JPEG-compressed map image and returned this to the +browser. + +The HTML overlay technique was implemented in two ways: +\begin{itemize} + + \item as a server-side architecture that worked in much the same way + as the image generation technique, except that the Perl script + returned an HTML file containing the \verb|<div>| elements for the + overlay, and an \verb|<img>| element to load the base map image; and + + \item as a distributed architecture, where client-side JavaScript + code made an asynchronous call to the server-side Perl script, which + returned + +and Google Maps techniques were implemented as a server-side +and a distributed architecture respectively. The HTML overlay technique +was implemented twice; once as a server-side architecture and once as a +distributed architecture. + \section{Results} \label{sec-results}