nigel.stanger / Publications
Commit d68e502fdb852bcc9d4f764f334ff117432da8b8 (branch TOIT_2006, 1 parent: f240247), authored by nstanger on 7 Aug 2006:

- Added CIA World Factbook.
- Revamped details of how the techniques were implemented.
- Restored real memory chart.
Showing 2 changed files: Map_Visualisation.bib, Map_Visualisation.tex
Map_Visualisation.bib
% This file was created with JabRef 2.1 beta.
% Encoding: ISO8859_1

@STRING{acj = {Australian Computer Journal}}
@STRING{adt = {Application Development Trends}}
@STRING{ai = {Artificial Intelligence}}
@STRING{ajis = {Australian Journal of Information Systems}}
@STRING{cacm = {Commun.\ ACM}}
@STRING{cj = {Comput.\ J.}}
@STRING{database = {ACM SIGMIS Database}}
@STRING{dbms = {DBMS Magazine}}
@STRING{dbpd = {Database Programming {\&} Design}}
@STRING{develop = {{d}evelop, The Apple Technical Journal}}
@STRING{directions = {Apple Directions}}
@STRING{dke = {Data {\&} Knowledge Engineering}}
@STRING{ejis = {European Journal of Information Systems}}
@STRING{idt = {Internet Development Trends}}
@STRING{ieees = {IEEE Softw.}}
@STRING{ijast = {International Journal of Applied Software Technology}}
@STRING{ijgis = {International Journal of Geographical Information Systems}}
@STRING{ijhcs = {International Journal of Human-Computer Studies}}
@STRING{ijmms = {International Journal of Man-Machine Studies}}
@STRING{ijseke = {International Journal of Software Engineering and Knowledge Engineering}}
@STRING{is = {Information Systems}}
@STRING{isj = {Information Systems Journal}}
@STRING{ist = {Inf.\ Softw.\ Tech.}}
@STRING{jacm = {Journal of the ACM}}
@STRING{jlp = {Journal of Logic Programming}}
@STRING{jot = {Journal of Object Technology}}
@STRING{jss = {J.\ Syst.\ Softw.}}
@STRING{lncs = {Lecture Notes in Computer Science}}
@STRING{misq = {MIS Quarterly}}
@STRING{nzjc = {New Zealand Journal of Computing}}
@STRING{nzjis = {New Zealand Journal of Information Systems}}
@STRING{oracle = {Oracle Magazine}}
@STRING{reg = {\scriptsize\textsuperscript{\textregistered}}}
@STRING{sej = {Software Engineering Journal}}
@STRING{sigmod = {ACM SIGMOD Record}}
@STRING{spe = {Software---Practice and Experience}}
@STRING{surveys = {ACM Comput.\ Surv.}}
@STRING{tkde = {IEEE Trans.\ Knowl.\ Data Eng.}}
@STRING{tm = {\scriptsize\texttrademark}}
@STRING{tods = {ACM Trans.\ Database Syst.}}
@STRING{toit = {ACM Trans.\ Internet Tech.}}
@STRING{tose = {IEEE Trans.\ Softw.\ Eng.}}
@STRING{tweb = {ACM Trans.\ Web}}
@STRING{vldb = {The VLDB Journal}}

@INCOLLECTION{Beau-JR-1991-GIS, author = {J.\ R.\ Beaumont}, title = {{GIS} and market analysis}, booktitle = {Geographical Information Systems, Volume 2: Applications}, publisher = {Longman}, year = {1991}, editor = {David J.\ Maguire and Michael F.\ Goodchild and David W.\ Rhind}, chapter = {45}, pages = {139--151}, address = {Harlow, UK}, timestamp = {2006.07.11} }

@BOOK{Dodg-M-2001-cybermap, title = {Mapping cyberspace}, publisher = {Routledge}, year = {2001}, author = {Martin Dodge and Rob Kitchin}, address = {London, UK}, timestamp = {2006.07.04} }

@MISC{IP2L-C-2006-GeoIP, author = {IP2Location}, title = {{IP2L}ocation} # tm # {: {B}ringing geography to the {I}nternet}, howpublished = {\url{http://www.ip2location.com/}, IP2Location.com}, month = jul, year = {2006}, note = {Accessed on 18 July 2006.}, timestamp = {2006.07.18}, url = {http://www.ip2location.com/} }

@ARTICLE{Lamm-SE-1996-webvis, author = {Stephen E.\ Lamm and Daniel A.\ Reed and Will H.\ Scullin}, title = {Real-time geographic visualization of {W}orld {W}ide {W}eb traffic}, journal = {Computer Networks and ISDN Systems}, year = {1996}, volume = {28}, pages = {1457--1468}, number = {7--11}, month = may, doi = {10.1016/0169-7552(96)00055-4}, pdf = {L/Lamm-SE-1996-webvis.pdf}, timestamp = {2006.07.18} }

@MISC{Maxm-G-2006-GeoIP, author = {Maxmind}, title = {Geo{IP}: {IP} address location technology}, howpublished = {\url{http://www.maxmind.com/app/ip-location}, Maxmind LLC}, month = jul, year = {2006}, note = {Accessed on 18 July 2006.}, timestamp = {2006.07.18}, url = {http://www.maxmind.com/app/ip-location} }

@MISC{Maxm-G-2006-GeoLiteCity, author = {Maxmind}, title = {Geo{L}ite {C}ity: Free {IP} address to city database}, howpublished = {\url{http://www.maxmind.com/app/geolitecity}, Maxmind LLC}, month = jul, year = {2006}, note = {Accessed on 18 July 2006.}, timestamp = {2006.07.18}, url = {http://www.maxmind.com/app/geolitecity} }

@INPROCEEDINGS{Papa-N-1998-Palantir, author = {Nektarios Papadakakis and Evangelos P.\ Markatos and Athanasios E.\ Papathanasiou}, title = {Palantir: {A} visualization tool for the {W}orld {W}ide {W}eb}, booktitle = {Proceedings of the INET'98 Conference}, year = {1998}, address = {Geneva, Switzerland}, month = {21--24~} # jul, citeseerurl = {http://citeseer.ist.psu.edu/68932.html}, pdf = {P/Papa-N-1998-Palantir.pdf}, timestamp = {2006.07.18}, url = {http://www.isoc.org/inet98/proceedings/1e/1e_1.htm} }

@MISC{Sale-A-2006-stats, author = {Arthur Sale and Christian McGee}, title = {Tasmania {S}tatistics {S}oftware}, howpublished = {\url{http://eprints.comp.utas.edu.au:81/archive/00000262/}, University of Tasmania, Hobart, Australia}, month = {12~} # feb, year = {2006}, note = {Accessed on 7 March 2006.}, timestamp = {2006.07.11}, url = {http://eprints.comp.utas.edu.au:81/archive/00000262/} }

@TECHREPORT{Stan-N-2006-running, author = {Nigel Stanger and Graham McGregor}, title = {Hitting the ground running: {B}uilding {N}ew {Z}ealand's first publicly available institutional repository}, institution = {Department of Information Science, University of Otago}, year = {2006}, type = {Discussion Paper}, number = {2006/07}, address = {Dunedin, New Zealand}, month = mar, pdf = {S/Stan-N-2006-running.pdf}, timestamp = {2006.07.11}, url = {http://www.business.otago.ac.nz/infosci/pubs/papers/papers/dp2006-07.pdf} }

@ARTICLE{MacE-AM-1998-GIS, author = {Alan M.\ MacEachren}, title = {Cartography, {GIS} and the {W}orld {W}ide {W}eb}, journal = {Progress in Human Geography}, year = {1998}, volume = {22}, pages = {575--585}, number = {4}, month = dec, doi = {10.1191/030913298670626440}, pdf = {MacE-AM-1998-GIS.pdf}, timestamp = {2006.07.18} }

@MISC{Goog-M-2006-maps, author = {Google}, title = {Google {M}aps {API}}, howpublished = {\url{http://maps.google.com/apis/maps/}}, year = {2006}, note = {Accessed on 18 July 2006.}, timestamp = {2006.07.18}, url = {http://maps.google.com/apis/maps/} }

@MISC{Bout-T-2004-GD, author = {Thomas Boutell}, title = {{GD} {G}raphics {L}ibrary}, howpublished = {\url{http://www.boutell.com/gd/}}, year = {2004}, note = {Updated on 3 November 2004; accessed on 18 July 2006.}, timestamp = {2006.07.18}, url = {http://www.boutell.com/gd/} }

@ARTICLE{Offu-J-2002-quality, author = {Jeff Offutt}, title = {Quality attributes of {W}eb software applications}, journal = ieees, year = {2002}, volume = {19}, pages = {25--32}, number = {2}, month = mar # {/} # apr, doi = {10.1109/52.991329}, pdf = {O/Offu-J-2002-quality.pdf}, timestamp = {2006.07.04} }

@ARTICLE{Jian-B-2000-cybermap, author = {Bin Jiang and Ferjan Ormeling}, title = {Mapping cyberspace: {V}isualizing, analysing and exploring virtual worlds}, journal = {The Cartographic Journal}, year = {2000}, volume = {37}, pages = {117--122}, number = {2}, month = dec, pdf = {J/Jian-B-2000-cybermap.pdf}, timestamp = {2006.07.04}, url = {http://www.hig.se/~bjg/cybermap2000.pdf} }

@ARTICLE{Eick-SG-2001-sitevis, author = {Stephen G.\ Eick}, title = {Visualizing online activity}, journal = cacm, year = {2001}, volume = {44}, pages = {45--50}, number = {8}, month = aug, doi = {10.1145/381641.381710}, pdf = {E/Eick-SG-2001-sitevis.pdf}, timestamp = {2006.07.04} }

@INPROCEEDINGS{Wood-J-1996-vis, author = {Jason Wood and Ken Brodlie and Helen Wright}, title = {Visualization over the {W}orld {W}ide {W}eb and its application to environmental data}, booktitle = {Proceedings of IEEE Visualization '96}, year = {1996}, editor = {Roni Yagel and Gregory M.\ Nielson}, pages = {81--86}, address = {San Francisco, California}, month = oct # {~27--} # nov # {~1}, organization = {IEEE Computer Society and ACM}, pdf = {W/Wood-J-1996-vis.pdf}, timestamp = {2006.07.24}, url = {http://portal.acm.org/citation.cfm?id=245010&dl=ACM&coll=GUIDE&CFID=15151515&CFTOKEN=6184618} }

@MISC{CIA-WFB-2006, author = {{CIA}}, title = {The {W}orld {F}actbook}, howpublished = {\url{https://www.cia.gov/cia/publications/factbook/}, Central Intelligence Agency, Washington, DC, USA}, year = {2006}, timestamp = {2006.08.07}, url = {https://www.cia.gov/cia/publications/factbook/} }

@comment{jabref-meta: selector_journal:}
@comment{jabref-meta: selector_author:}
@comment{jabref-meta: selector_keywords:}
@comment{jabref-meta: selector_publisher:}
Map_Visualisation.tex
\documentclass[acmtocl,acmnow]{acmtrans2m}
\usepackage{graphicx}

\newtheorem{theorem}{Theorem}[section]
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{lemma}[theorem]{Lemma}
\newdef{definition}[theorem]{Definition}
\newdef{remark}[theorem]{Remark}

\markboth{Nigel Stanger}{...}

\title{Scalability of Techniques for Online Geovisualization of Web Site Hits}
\author{NIGEL STANGER \\ University of Otago}

\begin{abstract}
A useful approach to visualizing the geographical distribution of web site hits is to geolocate the IP addresses of hits and plot them on a world map. This can be achieved by dynamic generation and display of map images at the server and/or the client. In this paper we compare the scalability with respect to source data size of four techniques for dynamic map generation and display: generating a single composite map image, overlaying transparent images on an underlying base map, overlaying CSS-enabled HTML on an underlying base map and generating a map using Google Maps. These four techniques embody a mixture of different display technologies and distribution styles. The results show that all four techniques are suitable for small data sets, but that the latter two techniques scale poorly to larger data sets.
\end{abstract}

\category{C.4}{Performance of Systems}{Performance attributes}
\category{C.2.4}{Computer-Communication Networks}{Distributed Systems}[distributed applications]
\category{H.3.5}{Information Storage and Retrieval}{Online Information Services}[web-based services]

\terms{Experimentation, Measurement, Performance}
\keywords{geolocation, geovisualization, scalability, GD, Google Maps}

\begin{document}

\bibliographystyle{acmtrans}

\begin{bottomstuff}
Author's address: N. Stanger, Department of Information Science, University of Otago, PO Box 56, Dunedin 9054, New Zealand.
\end{bottomstuff}

\maketitle

\section{Introduction}
\label{sec-introduction}

When administering a web site, it is quite reasonable to want information on the nature of traffic to the site. Information on the geographic sources of traffic can be particularly useful in the right context. For example, an e-commerce site might wish to determine the geographical distribution of visitors to its site, so that it can decide where best to target its marketing resources. One approach to doing so is to plot the geographical location of web site hits on a map. Geographical information systems (GIS) were already being used for these kinds of purposes prior to the advent of the World Wide Web \cite{Beau-JR-1991-GIS}, and it is a natural extension to apply these ideas to online visualization of web site hits.

Our interest in this area derives from implementing a pilot digital institutional repository at the University of Otago\footnote{\url{http://eprints.otago.ac.nz/}} in November 2005 \cite{Stan-N-2006-running}, using the GNU EPrints\footnote{\url{http://www.eprints.org/}} repository management software. This repository quickly attracted interest from around the world, and the number of abstract views and document downloads began to increase steadily. We were obviously very interested in tracking this increase, particularly with respect to where in the world the hits were coming from. The EPrints statistics management software developed at the University of Tasmania \cite{Sale-A-2006-stats} proved very useful in this regard, providing us with detailed per-eprint and per-country download statistics; an example of the latter is shown in Figure~\ref{fig-tas-stats}. However, while this display provides an ordered ranking of the number of hits from each country, it provides no detail below the country level, nor any visual clue as to the distribution of hit sources around the globe.

\begin{figure}
\begin{center}
\includegraphics[scale=0.65]{tasmania_stats}
\end{center}
\caption{A portion of the by-country display for the Otago EPrints repository, generated by the Tasmania statistics software.}
\label{fig-tas-stats}
\end{figure}

We therefore began to explore possible techniques for plotting our repository hit data onto a world map, with the aim of adding this capability to the Tasmania statistics package. Our preference was for a technique that could be used within a modern web browser without the need to manually install additional client software, so as to make the new feature available to the widest possible audience and reduce the impact of wide variation in client hardware and software environments \cite[pp.\ 27--28]{Offu-J-2002-quality}.

There have been several prior efforts to geovisualize web activity. \citeN{Lamm-SE-1996-webvis} developed a sophisticated system for real-time visualization of web traffic on a 3D globe, but this was intended for use within a virtual reality environment, thus limiting its general applicability. \citeN{Papa-N-1998-Palantir} described a similar system (Palantir), which was written as a Java applet and thus able to run within a web browser, assuming that a Java virtual machine was available. \citeN[pp.\ 100--103]{Dodg-M-2001-cybermap} describe these and several other related systems for mapping Web and Internet traffic.

These early systems suffered from a distinct limitation: there was no public infrastructure in place for geolocating IP addresses (that is, translating them into latitude/longitude coordinates). They generally used \texttt{whois} lookups or parsed the domain name in an attempt to guess the country of origin, with fairly crude results \cite{Lamm-SE-1996-webvis}. Locations outside the United States were typically aggregated by country and mapped to the capital city \cite{Lamm-SE-1996-webvis,Papa-N-1998-Palantir,Jian-B-2000-cybermap}. Reasonably accurate and detailed databases were commercially available at the time \cite[p.\ 1466]{Lamm-SE-1996-webvis}, but were not generally available to the public at large, thus limiting their utility. The situation has improved considerably in the last five years, however, with the advent of freely available and reasonably accurate geolocation services\footnote{Such as \url{http://www.maxmind.com/} or \url{http://www.ip2location.com/}.} with worldwide coverage and city-level resolution. For example, Maxmind's \emph{GeoLite City} database is freely available and claims to provide ``60\% accuracy on a city level for the US within a 25 mile radius'' \cite{Maxm-G-2006-GeoLiteCity}. Their commercial \emph{GeoIP City} database claims 80\% accuracy for the same parameters.

The techniques used by these systems can generally be divided into two classes. The first class of techniques generates a single bitmap image that contains both the map and the icons representing web hits. This can be achieved by programmatically plotting points onto a base map image; the composite image is displayed at the client. We shall henceforth refer to this class of techniques as \emph{image generation} techniques. The second class of techniques separately returns both a base map image and some kind of overlay containing the plotted points. The overlay is then combined with the base map at the client. We shall henceforth refer to this class of techniques as \emph{overlay} techniques.

Both classes of techniques have been used in the aforementioned systems, but the overlay technique appears to have been particularly popular. For example, Palantir used an overlay technique, where a Java applet running at the client overlaid graphic elements onto a base map image retrieved from the now-defunct Xerox online map server \cite{Papa-N-1998-Palantir}.
A more recent example is the Google Maps API \cite{Goog-M-2006-maps}, which enables web developers to easily embed dynamic, interactive maps within web pages. Google Maps is a dynamic overlay technique that has only become feasible relatively recently with the advent of widespread support for CSS positioning and Ajax technologies in many browsers. Overlay techniques enjoy a particular advantage over image generation techniques, in that they provide the potential for a more flexible GIS-like interaction with the map, with multiple layers that can be activated and deactivated as desired. This flexibility could explain why such techniques appear more prevalent in the literature. However, overlay techniques tend to rely on more recent web technologies such as CSS2 and Ajax, whereas image generation techniques generally do not. Image generation techniques should therefore be portable to a wider range of client and server environments. Each technique comprises a specific technology or collection of technologies (such as transparent bitmap overlays), implemented using a specific distribution style. For example, one image generation technique might be implemented completely server-side while another might use a mixture of server-side and client-side processing. Similarly, overlay techniques may adopt different distribution styles, and the overlays themselves might take the form of transparent images, absolutely positioned HTML elements, dynamically generated graphics, etc. Given the many possible techniques that were available, the next question was which techniques would be most suitable for our purposes? Scalability is a key issue for web applications in general \cite[p.\ 28]{Offu-J-2002-quality}, and online activity visualization in particular \cite[p.\ 50]{Eick-SG-2001-sitevis}, so we were particularly interested in techniques that could scale to a large number of points. 
For example, at the time of writing the Otago EPrints repository had been accessed from over 10,000 distinct IP addresses, each potentially representing a distinct geographical location. Separating out the type of hit (abstract view versus document download) increased that figure to nearly 13,000. We first narrowed down the range of techniques to just four (server-side image generation, server-side image overlay, server-side HTML overlay and Google Maps); the selection process and details of the techniques chosen are discussed in Section~\ref{sec-techniques}. We then set about testing the scalability of these four techniques, in order to determine how well each technique handled large numbers of points. A series of experiments was conducted on each technique with progressively larger data sets, and the elapsed time and memory usage were measured. The experimental design is discussed in Section~\ref{sec-experiment}. Our initial intuition was that server-side image generation and server-side image overlay techniques would scale best, and this was borne out by the results of the experiments, which show that both techniques scale reasonably well to very large numbers of points. The other two techniques proved to be reasonable for relatively small numbers of points (generally less than about 500--1,000), but their performance deteriorated rapidly beyond this. The results are discussed in more detail in Section~\ref{sec-results}. It should be noted that the intent of the experiments was not to identify statistically significant differences between techniques. It was expected that variations across techniques would be obvious, and the experiments were designed to test this expectation. However, the two best performing techniques, server-side image generation and server-side image overlay, produced very similar results, so a more formal statistical analysis of these techniques may be warranted. 
This and other possible future directions are discussed in Section~\ref{sec-future}. \section{Technique selection} \label{sec-techniques} In this section we discuss in more detail the four techniques that we chose for testing, and how we decided upon these particular techniques. First, we discuss the impact of distribution style on the choice of technique. Then, for each of the four chosen techniques, we examine how the technique works in practice, its implementation requirements, its relative advantages and disadvantages, and any other issues peculiar to the technique. \subsection{Distribution style} \label{sec-distribution} \citeN{Wood-J-1996-vis} and \citeN{MacE-AM-1998-GIS} identified four distribution styles for web-based geographic visualization software. The \emph{data server} style is where the server only supplies raw data, and all manipulation, display and analysis takes place at the client. In other words, this is primarily a client-side processing model, as illustrated in Figure~\ref{fig-distribution-styles}(a). For example, Palantir implemented an overlay technique using this distribution style \cite{Papa-N-1998-Palantir}, where the source data were generated at the server and the map was generated, displayed and manipulated by a Java applet running at the client. The data server distribution style can provide a very dynamic and interactive environment to the end user, but clearly requires support for executing application code within the web browser, typically using something like JavaScript, Java applets or Flash. JavaScript is now tightly integrated into most browsers, but the same cannot be said for either Java or Flash. That is, we cannot necessarily guarantee the existence of a Java virtual machine or Flash plugin in every browser, which violates our requirement to avoid manual installation of additional client-side software. 
We can therefore eliminate Java- or Flash-based data server techniques from consideration, but JavaScript-based data server techniques may still be feasible. \begin{figure} \begin{center} \begin{tabular}{ccc} \includegraphics[scale=1]{data_server} & \qquad & \includegraphics[scale=1]{image_server} \\ \footnotesize (a) Data server & \qquad & \footnotesize (b) Image server \\ \\ \\ \includegraphics[scale=1]{model_interaction} & \qquad & \includegraphics[scale=1]{shared} \\ \footnotesize (c) Model interaction environment & \qquad & \footnotesize (d) Shared environment \\ \end{tabular} \end{center} \caption{Distribution styles for web-based geographic visualization \protect\cite{Wood-J-1996-vis}. (F = filtering, M = mapping, R = rendering.)} \label{fig-distribution-styles} \end{figure} In contrast, the \emph{image server} style is where the display is created entirely at the server and is only viewed at the client. In other words, this is primarily a server-side processing model, as illustrated in Figure~\ref{fig-distribution-styles}(b). Consequently, techniques that use this style require no additional client-side software, and thus meet our requirements. The downside is that the resultant visualization can tend to be very static and non-interactive in nature, as it is just a simple bitmap image. The \emph{model interaction environment} style is where a model created at the server can be explored at the client, as illustrated in Figure~\ref{fig-distribution-styles}(c). \citeN{Wood-J-1996-vis} originally referred to this as the ``3D model interaction'' style, but this seems slightly out of place in the current context. They originally intended this distribution style to apply to VRML models for GIS applications, but it could be equally applied to any situation where an interactive model is generated at the server, then downloaded to and manipulated at the client. This is very similar to what happens with many Flash-based applications, for example. 
``Model interaction environment'' therefore seems a more appropriate name for this style. The key distinguishing feature of this style is that there is no further interaction between the client and server after the model has been downloaded. This means that while the downloaded model can be very dynamic and interactive, changing the underlying data requires a new model to be generated at the server and downloaded to the client. Similar restrictions apply to techniques using this style as to the data server style, so Java- and Flash-based model interaction environment techniques can be eliminated from consideration. For similar reasons, we can also eliminate solutions that require browser plugins such as VRML or SVG (although native support for the latter is beginning to appear in some browsers). It may be possible to implement this distribution style using only client-side JavaScript, but it is presently unclear as to how effective this might be. % future work: implement model interaction using JavaScript? Finally, the \emph{shared environment} style is where data manipulation is done at the server, but control of that manipulation, rendering, and display all occur at the client, as illustrated in Figure~\ref{fig-distribution-styles}(d). This is similar to the model interaction environment style, but with the addition of a feedback loop from the client to the server, thus enabling a more flexible and dynamic interaction. This is essentially the distribution style provided by Ajax technologies [REF]. We can eliminate techniques based on the same criteria as applied to the other three styles. \subsection{Image generation techniques} \label{sec-image-gen} As noted earlier, image generation techniques work by directly plotting geolocated IP addresses onto a base map image, then displaying the composite image at the client. A typical example of the kind of output that might be produced is shown in Figure~\ref{fig-image}. 
Such techniques require two specific components: software to programmatically create and manipulate bitmap images (for example, the GD image library\footnote{\url{http://www.boutell.com/gd/}}); and software to transform raw latitude/longitude coordinates into projected map coordinates on the base map (for example, the PROJ.4 cartographic projections library\footnote{\url{http://www.remotesensing.org/proj/}}). \begin{figure} \begin{center} \includegraphics[width=0.95\textwidth,keepaspectratio]{ImageGeneration-full} \end{center} \caption{Sample output from the server-side image generation technique.} \label{fig-image} \end{figure} Image generation techniques could use any of the distribution styles discussed in Section~\ref{sec-distribution}. However, all but the image server style would require the installation of additional client-side software for generating images and performing cartographic projection operations, so we will only consider image generation using an image server distribution style (or ``server-side image generation'') from this point on. The server-side image generation technique provides some distinct advantages. It is relatively simple to implement and is fast at producing the final image, mainly because it uses existing, well-established technologies. It is also bandwidth efficient: the size of the generated map image is determined by the total number of pixels and the compression method used, rather than by the number of points to be plotted. The amount of data to be sent to the client should therefore remain more or less constant, regardless of the number of points plotted. This technique also has some disadvantages, however. First, a suitable base map image must be acquired. This could be generated from a GIS, but if this is not an option an appropriate image must be obtained from a third party. Care must be taken in the latter case to avoid potential copyright issues. 
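The coordinate transform at the heart of this technique can be made concrete with a short sketch. The following is a hypothetical Python illustration, not the paper's actual implementation (which used the PROJ.4 library and was written in Perl); it assumes a simple equirectangular base map, and the default 1,024 by 520 pixel dimensions merely echo the base map size used in our experiments.

```python
# Hypothetical sketch of the latitude/longitude to pixel transform,
# assuming an equirectangular (plate carree) base map. The actual
# implementation used PROJ.4, which supports arbitrary projections.

def latlon_to_pixel(lat, lon, width=1024, height=520):
    """Map a (lat, lon) pair in degrees to (x, y) pixel coordinates,
    with (0, 0) at the top-left corner of the base map image."""
    x = (lon + 180.0) / 360.0 * width    # -180..180 degrees -> 0..width
    y = (90.0 - lat) / 180.0 * height    # 90..-90 degrees -> 0..height
    return int(round(x)), int(round(y))

# The geographic origin (0, 0) lands at the centre of the image.
print(latlon_to_pixel(0.0, 0.0))  # (512, 260)
```

Each plotted point is then drawn into the bitmap at the returned coordinates before the composite image is sent to the client.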
Second, the compression method used to produce the final composite map image can have a significant impact on visual quality. For example, lossy compression methods such as JPEG can make the points plotted on the map appear distinctly fuzzy, as shown in Figure~\ref{fig-image-quality}. A lossless compression method such as PNG will avoid this problem, but will tend to produce larger image files. Finally, it is harder to provide interactive map manipulation features with this technique, as the output is a simple static image. Anything that changes the content of the map (such as panning or changing the visibility of points) will require the entire image to be regenerated. Zooming could be achieved if a very high resolution base map image was available, but the number of possible zoom levels might be restricted. \begin{figure} \begin{center} \includegraphics[scale=1.25]{jpeg_detail}\medskip \includegraphics[scale=1.25]{overlay_detail} \end{center} \caption{Image quality of JPEG (Q=90) image generation (top) vs.\ PNG image overlay (bottom).} \label{fig-image-quality} \end{figure} \subsection{Overlay techniques} \label{sec-overlay} % Look for publications regarding the DataCrossing Ajax client. % See <http://datacrossing.crs4.it/en_Documentation_Overlay_Example.html>. % They use <IMG> rather than <DIV>, which has the advantage of the image % being loaded only once, but makes it harder to dynamically change the % appearance of markers. The amount of data generated will still be % proportional to the number of points (one <IMG> per point). Overlay techniques also involve plotting points onto a base map image, but they differ from image generation techniques in that the points are not composited directly onto the base map image. Rather, the points are displayed as an independent overlay on top of the base map image. 
This provides a significant advantage over image generation techniques, as it enables the possibility of multiple independent overlays that can be individually shown or hidden. This is very similar to the multi-layer functionality provided by GIS, and is an effective way to provide interactive visualizations of geographic data \cite{Wood-J-1996-vis,MacE-AM-1998-GIS}. We still have the problem of finding a suitable base map image, however.

Until relatively recently, implementing overlay techniques would likely have required additional software at the client, but most modern browsers now support absolute positioning of elements using CSS. This enables us to create a map overlay using nothing more than HTML, CSS and a few bitmap images. We have identified two main alternatives for producing such an overlay, which we have termed \emph{image overlay} and \emph{HTML overlay}.

An image overlay comprises a transparent bitmap image into which the points are plotted, which is then overlaid on the base map image (in our implementation, the output looks essentially identical to that shown in Figure~\ref{fig-image}). This requires the overlay image to be in either PNG or GIF format, as JPEG does not support transparency. Fortunately the overlay image is likely to contain a lot of ``white space'', which compresses very well, so use of a lossless compression method should not be an issue. This also eliminates the ``fuzziness'' issue noted earlier (see Figure~\ref{fig-image-quality}). The size of the image overlay will generally be proportional to the number of points to be plotted, but the image compression should have a moderating effect on this. As noted earlier, generating images at the client would require additional software to be installed, so we will only consider the image server distribution style for image overlays. That is, both the base map image and the overlay(s) are generated at the server.
An HTML overlay comprises a collection of HTML elements corresponding to the points to be plotted, which are positioned over the base map image using CSS absolute positioning. There is considerable flexibility as to the types of elements that could be used to construct the overlay. One possibility is to use \verb|<IMG>| elements to place icons on the base map; this appears to be the approach adopted by Google Maps (see Figure~\ref{fig-google}). Another possibility is to use appropriately sized and colored \verb|<DIV>| elements, which then appear as colored blocks ``floating'' over the base map image (in our implementation, the output looks essentially identical to that shown in Figure~\ref{fig-image}). \begin{figure} \begin{center} \includegraphics[width=0.95\textwidth,keepaspectratio]{GoogleMap-full.png} \end{center} \caption{Sample output from the Google Maps technique.} \label{fig-google} \end{figure} HTML overlays may be generated at either the server or the client. Unlike the techniques discussed previously, however, HTML overlays can be generated at the client without the need for additional software, because only HTML (i.e., text) is being generated, not images. This can be easily achieved using client-side JavaScript, so HTML overlays can use any of the distribution styles discussed in Section~\ref{sec-distribution} without violating our requirements. We have therefore adopted two representative overlay techniques for our experiments: server-side HTML overlays (using the image server distribution style) and Google Maps (using the data server distribution style). Since Google Maps uses \verb|<IMG>| elements, we have used \verb|<DIV>| elements for the server-side HTML overlay. Server-side HTML overlays are actually slightly simpler to implement than either server-side image generation or image overlays, because we do not need to write any code to generate or manipulate images (the base map image is static and thus requires no additional processing). 
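The server-side \verb|<DIV>| variant can be sketched as follows. This is a hypothetical Python illustration only; the marker size and colour are assumptions, and our actual generator was a Perl script invoked from a PHP dispatcher page, as described in Section~\ref{sec-experiment}.

```python
# Hypothetical sketch of server-side HTML overlay generation: one
# absolutely positioned <div> per projected point. Marker size and
# colour are illustrative assumptions.

def html_overlay(pixel_points):
    """pixel_points: (x, y) map coordinates already projected from
    latitude/longitude (e.g. via PROJ.4, as in our implementation)."""
    divs = ['<div style="position:absolute;left:%dpx;top:%dpx;'
            'width:3px;height:3px;background:#f00;"></div>' % (x, y)
            for x, y in pixel_points]
    # A relatively positioned top-level <div> anchors the children,
    # so the whole overlay can be placed over the base map image.
    return '<div style="position:relative;">%s</div>' % ''.join(divs)

print(html_overlay([(512, 260)]))
```

The returned fragment is simply embedded in the page over the base map \verb|<IMG>| element; no image processing is required at either end.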
All that is required is code to transform latitude/longitude coordinates into projected map coordinates and produce corresponding \verb|<DIV>| elements. Google Maps \cite{Goog-M-2006-maps} is a more complex proposition. This technique uses the data server distribution style, where JavaScript code running within the browser enables the client to manipulate the base map and its overlays. Data and map images are requested asynchronously from the server as required, using Ajax technologies, which seems to imply that Google Maps in fact uses the shared environment distribution style. However, the server has no involvement beyond simply supplying data to the client. In the shared environment distribution style, the server is directly involved in manipulating the map, under the control of the client. This is clearly not the case with Google Maps. The primary advantage of Google Maps is the powerful functionality it provides for generating and interacting with the map. Users may pan the map in any direction and zoom in and out to many different levels. A satellite imagery view is also available. In addition, further information about each point plotted (such as the name of the city, for example) can be displayed in a callout attached to the point, as shown in Figure~\ref{fig-google}. However, there are also some significant disadvantages to the Google Maps technique\footnote{Interestingly, the Google Earth application addresses many of these issues, but since it is not a browser-based solution it falls outside the scope of our consideration. However, for interest's sake we did an informal comparison between Google Earth and the four techniques that we have tested, and this has been included in the results in Section~\ref{sec-results}.}. First, it is a distributed application, thus making it more complex to implement, test and debug [REF]. Second, the server must have a registered API key from Google, which is verified every time that a page attempts to use the API. 
Similarly, the client must connect to Google's servers in order to download the API's JavaScript source. This means that the technique must have an active Internet connection in order to work. Finally, the Google Maps API does not currently provide any way to toggle the visibility of markers on the map, so it is not possible to implement the interactive ``layers'' mentioned at the start of this section. (It is possible, of course, that Google will implement this feature in a later version of the API.)

The most significant disadvantage of all HTML overlay techniques, however, is that the size of the HTML overlay is directly proportional to the number of points to be plotted. There will be one overlay element (\verb|<DIV>| or \verb|<IMG>|) per point, so a very large number of points will result in an even larger amount of HTML source being generated. We expect that this will lead to excessive browser memory usage, and consequently that these techniques will not scale well at the high end. However, they may still be useful for smaller data sets that require interactive manipulation.

\section{Experimental design}
\label{sec-experiment}

After some preliminary testing with live data from the Otago School of Business repository, we proceeded with a series of experiments to test the scalability of the four techniques. Each technique was tested using progressively larger synthetic data sets. The first data set comprised one point at the South Pole (latitude \(-90^{\circ}\), longitude \(-180^{\circ}\)). Each successive data set was twice the size of its predecessor, building up a regular grid of latitude/longitude points at one degree intervals\footnote{The entire grid has 64,800 points, so the five largest data sets have many duplicate points.}. A total of twenty-one data sets were created in this way, with the number of points ranging from one to 1,048,576 (\(=2^{20}\)). The result of plotting the 16,384-point data set is shown in Figure~\ref{fig-grid-points}.
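The construction of these data sets can be sketched as follows. This is a hypothetical Python reconstruction: the row-major ordering of the grid and the wrap-around past the 64,800th point are assumptions, as the internals of our generator script are not described here.

```python
# Hypothetical reconstruction of the synthetic data set generator.
# Grid ordering and wrap-around behaviour are assumptions.

GRID = 64800  # 360 longitudes x 180 latitudes at one-degree intervals

def grid_point(i):
    """Return the (lat, lon) of the i-th point of the one-degree grid."""
    i %= GRID               # sets larger than the grid wrap around,
    lat = -90 + (i // 360)  # producing the duplicate points noted above
    lon = -180 + (i % 360)
    return lat, lon

def data_set(n):
    """The n-point data set, for n = 1, 2, 4, ..., 1048576."""
    return [grid_point(i) for i in range(n)]

print(data_set(2))  # [(-90, -180), (-90, -179)]
```

The first data set, \texttt{data\_set(1)}, contains only the South Pole point; the five sets from \(2^{16}\) points upwards exceed the 64,800-point grid and so repeat earlier points.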
\begin{figure} \begin{center} \includegraphics[width=0.95\textwidth,keepaspectratio]{ImageGeneration-full} \end{center} \caption{The 16,384-point data set plotted on the base map.} \label{fig-grid-points} \end{figure} The focus on scalability meant that we were primarily interested in measuring page load times, memory usage and the amount of data generated (which impacts on both storage and network bandwidth). Page load time can be further broken down into the time taken to generate the map data, the time taken to transfer the map data to the client across the network, and the time taken by the client to display the map. Unfortunately, the Google Maps technique requires an active Internet connection (as noted in Section~\ref{sec-overlay}), so we were unable to run the experiments on an isolated network. This meant that traffic on the local network was a potential confounding factor. We therefore decided to eliminate network performance from the equation by running both the server and the client on the same machine\footnote{A Power Macintosh G5 1.8\,GHz with 1\,GB RAM, running Mac OS X 10.4.7, Apache 2.0.55, PHP 4.4 and Perl 5.8.6.}. This in turn enabled us to measure the time taken for data generation and page display independently, thus simplifying the process of data collection and also ensuring that the client and server processes did not unduly interfere with each other, despite running on the same machine. It could be argued that network performance would still have a confounding effect on the Google Maps technique, but this would only be likely for the initial download of the API (comprising about 235\,kB of JavaScript source and images), which would be locally cached thereafter. The API key verification does occur every time the map is loaded, but the amount of data involved is very small, so it is less likely that this would be significantly affected by network performance. 
Any such effect would also be immediately obvious as it would simply block the server from proceeding. For each data set generated, we recorded its size, the time taken to generate it, the time taken to display the resultant map in the browser, and the amount of memory used during the test by the browser. We also intended to measure the memory usage of the server, but this proved more difficult to isolate than we expected, and was thus dropped from the experiments. The data set generation time and browser memory usage were measured using the \texttt{time} and \texttt{top} utilities respectively (the latter was run after each test run to avoid interference). The map display time was measured using the ``page load test'' debugging feature of Apple's Safari web browser, which can repetitively load a set of pages while recording various statistics, in particular the time taken to load the page. Tests were run up to twenty times each where feasible, in order to reduce the impact of random variations. Some tests were run fewer times because they took a very long time (several minutes for a single test run). We typically broke off further testing when a single test run took longer than about five minutes, as by this stage performance had already deteriorated well beyond usable levels. While it is really beyond the scope of this work, out of interest some informal tests were also undertaken using the Google Earth application. A Perl script was used to generate a collection of KML files corresponding to the data sets described above. Each data set was then loaded into Google Earth, and a stopwatch was used to measure how long it took to load the data set, defined as the period during which the dialog box ``\textsf{Loading myplaces.kml, including enabled overlays}'' was displayed on screen. 
\subsection{Technique implementation} As noted in Sections~\ref{sec-image-gen} and \ref{sec-overlay}, the server-side image generation, server-side image overlay and server-side HTML overlay techniques were all implemented using the image server distribution style. A separate dispatcher page was written in PHP for each technique, which enabled arguments---such as the number of points to be plotted---to be passed from the client to a corresponding Perl script for each technique. The final page was then constructed as follows: \begin{description} \item[server-side image generation] The dispatcher page included a standard \verb|<IMG>| element that called the Perl script. This script then loaded a base map PNG image, plotted points directly onto it, and returned the composite map to the client as a JPEG image (with the ``quality'' parameter set to 90). \item[server-side image overlay] The dispatcher page included two \verb|<IMG>| elements, the first for the base map and the second for the overlay, both with identical CSS positioning attributes. The first \verb|<IMG>| simply loaded a static JPEG image representing the base map. The second \verb|<IMG>| called the Perl script, which generated and returned the overlay as a transparent PNG image. \item[server-side HTML overlay] The dispatcher page included a \verb|<IMG>| element for the base map and a \verb|<DIV>| element for the overlay, both with identical CSS positioning attributes. As with the previous technique, the \verb|<IMG>| simply loaded a static JPEG image representing the base map. The \verb|<DIV>| contained inline PHP code that called the Perl script. This in turn generated and returned the overlay as a collection of CSS-positioned \verb|<DIV>| elements, nested within the top-level \verb|<DIV>| element. \end{description} For all three techniques, the base map image was 1,024 by 520 pixels. 
In PNG format it occupied approximately 1.2\,MB (but this version was never returned to the client), while in JPEG format (Q=90) it occupied approximately 180\,kB. The base map image was derived from an original 3,599 by 1,826 pixel image, which was part of a collection of maps released into the public domain by the \citeN{CIA-WFB-2006}. All three techniques used the PROJ.4 cartographic projections library to convert latitude/longitude pairs into projected map coordinates, while the first two techniques also used the GD graphics library to programmatically generate and manipulate images. The Google Maps technique was implemented using the data server distribution style. Once again, a PHP dispatcher page was used. This time, however, the page included client-side JavaScript code to load and initialise the Google Maps API, create the base map, and build the map overlay. The first two steps were achieved using standard Google Maps API calls. For the last step, the client used an \texttt{XMLHttpRequest} object to call a server-side Perl script. This script generated and returned to the client an XML data set containing the points to be plotted. The client then looped through this data set and used the Google Maps API calls to create a marker on the base map corresponding to each point. \section{Results} \label{sec-results} As noted in the introduction, the intent of these experiments was not to do a full analysis and statistical comparison of the performance of the different techniques, but rather to identify broad trends. We have not, therefore, carried out any statistical analysis on the results. We will now discuss the results for data size, page load time and memory usage. Because the data set size increases by powers of two, we have used log-log scales for all plots. \subsection{Data size} The size of the data generated for each technique is shown in Figure~\ref{fig-data-size}. 
\begin{figure}
\begin{center}
\includegraphics[scale=0.66]{data_size}
\end{center}
\caption{Comparison of generated data size for each technique (log-log scale).}
\label{fig-data-size}
\end{figure}

\subsection{Page load time}

The data generation times, map display times and combined page load times for each technique are shown in Figures~\ref{fig-data-generation-time}, \ref{fig-page-load-time} and \ref{fig-combined-time} respectively.

\begin{figure}
\begin{center}
\includegraphics[scale=0.66]{data_generation_time}
\end{center}
\caption{Comparison of data generation time for each technique (log-log scale).}
\label{fig-data-generation-time}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[scale=0.66]{page_load_time}
\end{center}
\caption{Comparison of map display time for each technique (log-log scale).}
\label{fig-page-load-time}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[scale=0.66]{combined_time}
\end{center}
\caption{Comparison of combined page load time for each technique (log-log scale).}
\label{fig-combined-time}
\end{figure}

\subsection{Memory usage}

We measured both the real and virtual memory usage of the browser by running the \texttt{top} utility after each test run and observing the memory usage in each category. This told us the size of both the current ``working set'' and the total memory footprint of the browser process after it had completed a test run.
% maybe put them in anyway??
The real memory data proved somewhat unreliable, however. Real memory usage was generally consistent across several test runs, but would also frequently fluctuate upwards by a factor of nearly two for no readily apparent reason. We can only assume that this was a result of other processes running on the test machine interacting with the browser process in unexpected ways. We therefore disregarded the real memory data. They did, however, display similar trends to the virtual memory data. The virtual memory data proved more consistent, as the virtual memory footprint of a process is less likely to be impacted by other running processes.
The amount of virtual memory used by the browser for each technique is shown in Figure~\ref{fig-virtual-memory}.

\begin{figure}
\begin{center}
\includegraphics[scale=0.66]{real_memory}
\end{center}
\caption{Comparison of real memory usage for each technique (log-log scale).}
\label{fig-real-memory}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[scale=0.66]{virtual_memory}
\end{center}
\caption{Comparison of virtual memory usage for each technique (log-log scale).}
\label{fig-virtual-memory}
\end{figure}

\section{Conclusion}

% The software extracts IP addresses from the web server logs, geolocates
% them using the free MaxMind GeoLite Country database\footnote{See
% \url{http://www.maxmind.com/app/ip-location}.}, then stores the
% resulting country information in a separate database.
% The Tasmania software, however, uses countries as its base unit of
% aggregation. We were interested in looking at the distribution on a finer
% level, down to individual cities if possible

\bibliography{Map_Visualisation}

\begin{received}
...
\end{received}

\end{document}
\documentclass[acmtocl,acmnow]{acmtrans2m} \usepackage{graphicx} \newtheorem{theorem}{Theorem}[section] \newtheorem{conjecture}[theorem]{Conjecture} \newtheorem{corollary}[theorem]{Corollary} \newtheorem{proposition}[theorem]{Proposition} \newtheorem{lemma}[theorem]{Lemma} \newdef{definition}[theorem]{Definition} \newdef{remark}[theorem]{Remark} \markboth{Nigel Stanger}{...} \title{Scalability of Techniques for Online Geovisualization of Web Site Hits} \author{NIGEL STANGER \\ University of Otago} \begin{abstract} A useful approach to visualising the geographical distribution of web site hits is to geolocate the IP addresses of hits and plot them on a world map. This can be achieved by dynamic generation and display of map images at the server and/or the client. In this paper we compare the scalability with respect to source data size of four techniques for dynamic map generation and display: generating a single composite map image, overlaying transparent images on an underlying base map, overlaying CSS-enabled HTML on an underlying base map and generating a map using Google Maps. These four techniques embody a mixture of different display technologies and distribution styles. The results show that all four techniques are suitable for small data sets, but that the latter two techniques scale poorly to larger data sets. \end{abstract} \category{C.4}{Performance of Systems}{Performance attributes} \category{C.2.4}{Computer-Communication Networks}{Distributed Systems}[distributed applications] \category{H.3.5}{Information Storage and Retrieval}{Online Information Services}[web-based services] \terms{Experimentation, Measurement, Performance} \keywords{geolocation, geovisualization, scalability, GD, Google Maps} \begin{document} \bibliographystyle{acmtrans} \begin{bottomstuff} Author's address: N. Stanger, Department of Information Science, University of Otago, PO Box 56, Dunedin 9054, New Zealand. 
\end{bottomstuff} \maketitle \section{Introduction} \label{sec-introduction} When administering a web site, it is quite reasonable to want information on the nature of traffic to the site. Information on the geographic sources of traffic can be particularly useful in the right context. For example, an e-commerce site might wish to determine the geographical distribution of visitors to its site, so that it can decide where best to target its marketing resources. One approach to doing so is to plot the geographical location of web site hits on a map. Geographical information systems (GIS) were already being used for these kinds of purposes prior to the advent of the World Wide Web \cite{Beau-JR-1991-GIS}, and it is a natural extension to apply these ideas to online visualization of web site hits. Our interest in this area derives from implementing a pilot digital institutional repository at the University of Otago\footnote{\url{http://eprints.otago.ac.nz/}} in November 2005 \cite{Stan-N-2006-running}, using the GNU EPrints\footnote{\url{http://www.eprints.org/}} repository management software. This repository quickly attracted interest from around the world and the number of abstract views and document downloads began to steadily increase. We were obviously very interested in tracking this increase, particularly with respect to where in the world the hits were coming from. The EPrints statistics management software developed at the University of Tasmania \cite{Sale-A-2006-stats} proved very useful in this regard, providing us with detailed per-eprint and per-country download statistics; an example of the latter is shown in Figure~\ref{fig-tas-stats}. However, while this display provides an ordered ranking of the number of hits from each country, it does not provide any greater detail than to the country level, nor does it provide any visual clues as to the distribution of hit sources around the globe. 
\begin{figure} \begin{center} \includegraphics[scale=0.65]{tasmania_stats} \end{center} \caption{A portion of the by-country display for the Otago EPrints repository, generated by the Tasmania statistics software.} \label{fig-tas-stats} \end{figure} We therefore began to explore possible techniques for plotting our repository hit data onto a world map, with the aim of adding this capability to the Tasmania statistics package. Our preference was for a technique that could be used within a modern web browser without the need to manually install additional client software, so as to make the new feature available to the widest possible audience and reduce the impact of wide variation in client hardware and software environments \cite[pp.\ 27--28]{Offu-J-2002-quality}. There have been several prior efforts to geovisualize web activity. \citeN{Lamm-SE-1996-webvis} developed a sophisticated system for real-time visualization of web traffic on a 3D globe, but this was intended for use within a virtual reality environment, thus limiting its general applicability. \citeN{Papa-N-1998-Palantir} described a similar system (Palantir), which was written as a Java applet and thus able to be run within a web browser, assuming that a Java virtual machine was available. \citeN[pp.\ 100--103]{Dodg-M-2001-cybermap} describe these and several other related systems for mapping Web and Internet traffic. These early systems suffered from a distinct limitation in that there was no public infrastructure in place for geolocating IP addresses (that is, translating them into latitude/longitude coordinates). They generally used \texttt{whois} lookups or parsed the domain name in an attempt to guess the country of origin, with fairly crude results \cite{Lamm-SE-1996-webvis}. Locations outside the United States were typically aggregated by country and mapped to the capital city \cite{Lamm-SE-1996-webvis,Papa-N-1998-Palantir,Jian-B-2000-cybermap}. 
Reasonably accurate and detailed databases were commercially available at the time \cite[p.\ 1466]{Lamm-SE-1996-webvis}, but were not generally available to the public at large, thus limiting their utility. The situation has improved considerably in the last five years, however, with the advent of freely available and reasonably accurate geolocation services\footnote{Such as \url{http://www.maxmind.com/} or \url{http://www.ip2location.com/}.} with worldwide coverage and city-level resolution. For example, Maxmind's \emph{GeoLite City} database is freely available and claims to provide ``60\% accuracy on a city level for the US within a 25 mile radius'' \cite{Maxm-G-2006-GeoLiteCity}. Their commercial \emph{GeoIP City} database claims 80\% accuracy for the same parameters.

The techniques used by these systems can generally be divided into two classes. The first class of techniques generates a single bitmap image that contains both the map and the icons representing web hits. This can be achieved by programmatically plotting points onto a base map image; the composite image is displayed at the client. We shall henceforth refer to this class of techniques as \emph{image generation} techniques. The second class of techniques separately returns both a base map image and some kind of overlay containing the plotted points. The overlay is then combined with the base map at the client. We shall henceforth refer to this class of techniques as \emph{overlay} techniques. Both classes of techniques have been used in the aforementioned systems, but the overlay technique appears to have been particularly popular. For example, Palantir used an overlay technique, where a Java applet running at the client overlaid graphic elements onto a base map image retrieved from the now-defunct Xerox online map server \cite{Papa-N-1998-Palantir}.
A more recent example is the Google Maps API \cite{Goog-M-2006-maps}, which enables web developers to easily embed dynamic, interactive maps within web pages. Google Maps is a dynamic overlay technique that has only become feasible relatively recently with the advent of widespread support for CSS positioning and Ajax technologies in many browsers. Overlay techniques enjoy a particular advantage over image generation techniques, in that they provide the potential for a more flexible GIS-like interaction with the map, with multiple layers that can be activated and deactivated as desired. This flexibility could explain why such techniques appear more prevalent in the literature. However, overlay techniques tend to rely on more recent web technologies such as CSS2 and Ajax, whereas image generation techniques generally do not. Image generation techniques should therefore be portable to a wider range of client and server environments.

Each technique comprises a specific technology or collection of technologies (such as transparent bitmap overlays), implemented using a specific distribution style. For example, one image generation technique might be implemented completely server-side while another might use a mixture of server-side and client-side processing. Similarly, overlay techniques may adopt different distribution styles, and the overlays themselves might take the form of transparent images, absolutely positioned HTML elements, dynamically generated graphics, etc. Given the many possible techniques that were available, the next question was which techniques would be most suitable for our purposes. Scalability is a key issue for web applications in general \cite[p.\ 28]{Offu-J-2002-quality}, and online activity visualization in particular \cite[p.\ 50]{Eick-SG-2001-sitevis}, so we were particularly interested in techniques that could scale to a large number of points.
For example, at the time of writing the Otago EPrints repository had been accessed from over 10,000 distinct IP addresses, each potentially representing a distinct geographical location. Separating out the type of hit (abstract view versus document download) increased that figure to nearly 13,000. We first narrowed down the range of techniques to just four (server-side image generation, server-side image overlay, server-side HTML overlay and Google Maps); the selection process and details of the techniques chosen are discussed in Section~\ref{sec-techniques}. We then set about testing the scalability of these four techniques, in order to determine how well each technique handled large numbers of points. A series of experiments was conducted on each technique with progressively larger data sets, and the elapsed time and memory usage were measured. The experimental design is discussed in Section~\ref{sec-experiment}. Our initial intuition was that server-side image generation and server-side image overlay techniques would scale best, and this was borne out by the results of the experiments, which show that both techniques scale reasonably well to very large numbers of points. The other two techniques proved to be reasonable for relatively small numbers of points (generally less than about 500--1,000), but their performance deteriorated rapidly beyond this. The results are discussed in more detail in Section~\ref{sec-results}. It should be noted that the intent of the experiments was not to identify statistically significant differences between techniques. It was expected that variations across techniques would be obvious, and the experiments were designed to test this expectation. However, the two best performing techniques, server-side image generation and server-side image overlay, produced very similar results, so a more formal statistical analysis of these techniques may be warranted. 
This and other possible future directions are discussed in Section~\ref{sec-future}. \section{Technique selection} \label{sec-techniques} In this section we discuss in more detail the four techniques that we chose for testing, and how we decided upon these particular techniques. First, we discuss the impact of distribution style on the choice of technique. Then, for each of the four chosen techniques, we examine how the technique works in practice, its implementation requirements, its relative advantages and disadvantages, and any other issues peculiar to the technique. \subsection{Distribution style} \label{sec-distribution} \citeN{Wood-J-1996-vis} and \citeN{MacE-AM-1998-GIS} identified four distribution styles for web-based geographic visualization software. The \emph{data server} style is where the server only supplies raw data, and all manipulation, display and analysis takes place at the client. In other words, this is primarily a client-side processing model, as illustrated in Figure~\ref{fig-distribution-styles}(a). For example, Palantir implemented an overlay technique using this distribution style \cite{Papa-N-1998-Palantir}, where the source data were generated at the server and the map was generated, displayed and manipulated by a Java applet running at the client. The data server distribution style can provide a very dynamic and interactive environment to the end user, but clearly requires support for executing application code within the web browser, typically using something like JavaScript, Java applets or Flash. JavaScript is now tightly integrated into most browsers, but the same cannot be said for either Java or Flash. That is, we cannot necessarily guarantee the existence of a Java virtual machine or Flash plugin in every browser, which violates our requirement to avoid manual installation of additional client-side software. 
We can therefore eliminate Java- or Flash-based data server techniques from consideration, but JavaScript-based data server techniques may still be feasible. \begin{figure} \begin{center} \begin{tabular}{ccc} \includegraphics[scale=1]{data_server} & \qquad & \includegraphics[scale=1]{image_server} \\ \footnotesize (a) Data server & \qquad & \footnotesize (b) Image server \\ \\ \\ \includegraphics[scale=1]{model_interaction} & \qquad & \includegraphics[scale=1]{shared} \\ \footnotesize (c) Model interaction environment & \qquad & \footnotesize (d) Shared environment \\ \end{tabular} \end{center} \caption{Distribution styles for web-based geographic visualization \protect\cite{Wood-J-1996-vis}. (F = filtering, M = mapping, R = rendering.)} \label{fig-distribution-styles} \end{figure} In contrast, the \emph{image server} style is where the display is created entirely at the server and is only viewed at the client. In other words, this is primarily a server-side processing model, as illustrated in Figure~\ref{fig-distribution-styles}(b). Consequently, techniques that use this style require no additional client-side software, and thus meet our requirements. The downside is that the resultant visualization can tend to be very static and non-interactive in nature, as it is just a simple bitmap image. The \emph{model interaction environment} style is where a model created at the server can be explored at the client, as illustrated in Figure~\ref{fig-distribution-styles}(c). \citeN{Wood-J-1996-vis} originally referred to this as the ``3D model interaction'' style, but this seems slightly out of place in the current context. They originally intended this distribution style to apply to VRML models for GIS applications, but it could be equally applied to any situation where an interactive model is generated at the server, then downloaded to and manipulated at the client. This is very similar to what happens with many Flash-based applications, for example. 
``Model interaction environment'' therefore seems a more appropriate name for this style. The key distinguishing feature of this style is that there is no further interaction between the client and server after the model has been downloaded. This means that while the downloaded model can be very dynamic and interactive, changing the underlying data requires a new model to be generated at the server and downloaded to the client. The same restrictions apply to techniques using this style as to the data server style, so Java- and Flash-based model interaction environment techniques can be eliminated from consideration. For similar reasons, we can also eliminate solutions that require browser plugins such as VRML or SVG (although native support for the latter is beginning to appear in some browsers). It may be possible to implement this distribution style using only client-side JavaScript, but it is presently unclear how effective this might be.
% future work: implement model interaction using JavaScript?
Finally, the \emph{shared environment} style is where data manipulation is done at the server, but control of that manipulation, rendering, and display all occur at the client, as illustrated in Figure~\ref{fig-distribution-styles}(d). This is similar to the model interaction environment style, but with the addition of a feedback loop from the client to the server, thus enabling a more flexible and dynamic interaction. This is essentially the distribution style provided by Ajax technologies [REF]. We can eliminate techniques using this style based on the same criteria as applied to the other three styles.

\subsection{Image generation techniques}
\label{sec-image-gen}

As noted earlier, image generation techniques work by directly plotting geolocated IP addresses onto a base map image, then displaying the composite image at the client. A typical example of the kind of output that might be produced is shown in Figure~\ref{fig-image}.
Such techniques require two specific components: software to programmatically create and manipulate bitmap images (for example, the GD image library\footnote{\url{http://www.boutell.com/gd/}}); and software to transform raw latitude/longitude coordinates into projected map coordinates on the base map (for example, the PROJ.4 cartographic projections library\footnote{\url{http://www.remotesensing.org/proj/}}). \begin{figure} \begin{center} \includegraphics[width=0.95\textwidth,keepaspectratio]{ImageGeneration-full} \end{center} \caption{Sample output from the server-side image generation technique.} \label{fig-image} \end{figure} Image generation techniques could use any of the distribution styles discussed in Section~\ref{sec-distribution}. However, all but the image server style would require the installation of additional client-side software for generating images and performing cartographic projection operations, so we will only consider image generation using an image server distribution style (or ``server-side image generation'') from this point on. The server-side image generation technique provides some distinct advantages. It is relatively simple to implement and is fast at producing the final image, mainly because it uses existing, well-established technologies. It is also bandwidth efficient: the size of the generated map image is determined by the total number of pixels and the compression method used, rather than by the number of points to be plotted. The amount of data to be sent to the client should therefore remain more or less constant, regardless of the number of points plotted. This technique also has some disadvantages, however. First, a suitable base map image must be acquired. This could be generated from a GIS, but if this is not an option an appropriate image must be obtained from a third party. Care must be taken in the latter case to avoid potential copyright issues. 
Second, the compression method used to produce the final composite map image can have a significant impact on visual quality. For example, lossy compression methods such as JPEG can make the points plotted on the map appear distinctly fuzzy, as shown in Figure~\ref{fig-image-quality}. A lossless compression method such as PNG will avoid this problem, but will tend to produce larger image files. Finally, it is harder to provide interactive map manipulation features with this technique, as the output is a simple static image. Anything that changes the content of the map (such as panning or changing the visibility of points) will require the entire image to be regenerated. Zooming could be achieved if a very high resolution base map image was available, but the number of possible zoom levels might be restricted. \begin{figure} \begin{center} \includegraphics[scale=1.25]{jpeg_detail}\medskip \includegraphics[scale=1.25]{overlay_detail} \end{center} \caption{Image quality of JPEG image generation (top) vs.\ PNG image overlay (bottom).} \label{fig-image-quality} \end{figure} \subsection{Overlay techniques} \label{sec-overlay} % Look for publications regarding the DataCrossing Ajax client. % See <http://datacrossing.crs4.it/en_Documentation_Overlay_Example.html>. % They use <IMG> rather than <DIV>, which has the advantage of the image % being loaded only once, but makes it harder to dynamically change the % appearance of markers. The amount of data generated will still be % proportional to the number of points (one <IMG> per point). Overlay techniques also involve plotting points onto a base map image, but they differ from image generation techniques in that the points are not composited directly onto the base map image. Rather, the points are displayed as an independent overlay on top of the base map image. 
This provides a significant advantage over image generation techniques, as it enables the possibility of multiple independent overlays that can be individually shown or hidden. This is very similar to the multi-layer functionality provided by a GIS, and is an effective way to provide interactive visualizations of geographic data \cite{Wood-J-1996-vis,MacE-AM-1998-GIS}. We still have the problem of finding a suitable base map image, however. Until relatively recently, implementing overlay techniques would likely have required additional software at the client, but most modern browsers now support absolute positioning of elements using CSS. This enables us to create a map overlay using nothing more than HTML, CSS and a few bitmap images. We have identified two main alternatives for producing such an overlay, which we have termed \emph{image overlay} and \emph{HTML overlay}. An image overlay comprises a transparent bitmap image into which the points are plotted, which is then overlaid on the base map image (in our implementation, the output looks essentially identical to that shown in Figure~\ref{fig-image}). This requires the overlay image to be in either PNG or GIF format, as JPEG does not support transparency. Fortunately, the overlay image is likely to contain a lot of ``white space'', which compresses very well, so use of a lossless compression method should not be an issue. This also eliminates the ``fuzziness'' issue noted earlier (see Figure~\ref{fig-image-quality}). The size of the image overlay will generally be proportional to the number of points to be plotted, but the image compression should have a moderating effect on this. As noted earlier, generating images at the client would require additional software to be installed, so we will only consider the image server distribution style for image overlays. That is, both the base map image and the overlay(s) are generated at the server.
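Concretely, the stacking of a transparent overlay on the base map can be sketched as the following HTML fragment (the file names are hypothetical placeholders); giving both images identical absolute CSS positioning causes the browser to render the transparent PNG directly on top of the base map:

```html
<!-- Illustrative sketch only; file names are hypothetical. -->
<div style="position: relative;">
  <img src="basemap.jpg" style="position: absolute; left: 0; top: 0;" />
  <img src="overlay.png" style="position: absolute; left: 0; top: 0;" />
</div>
```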
An HTML overlay comprises a collection of HTML elements corresponding to the points to be plotted, which are positioned over the base map image using CSS absolute positioning. There is considerable flexibility as to the types of elements that could be used to construct the overlay. One possibility is to use \verb|<IMG>| elements to place icons on the base map; this appears to be the approach adopted by Google Maps (see Figure~\ref{fig-google}). Another possibility is to use appropriately sized and colored \verb|<DIV>| elements, which then appear as colored blocks ``floating'' over the base map image (in our implementation, the output looks essentially identical to that shown in Figure~\ref{fig-image}). \begin{figure} \begin{center} \includegraphics[width=0.95\textwidth,keepaspectratio]{GoogleMap-full.png} \end{center} \caption{Sample output from the Google Maps technique.} \label{fig-google} \end{figure} HTML overlays may be generated at either the server or the client. Unlike the techniques discussed previously, however, HTML overlays can be generated at the client without the need for additional software, because only HTML (i.e., text) is being generated, not images. This can be easily achieved using client-side JavaScript, so HTML overlays can use any of the distribution styles discussed in Section~\ref{sec-distribution} without violating our requirements. We have therefore adopted two representative overlay techniques for our experiments: server-side HTML overlays (using the image server distribution style) and Google Maps (using the data server distribution style). Since Google Maps uses \verb|<IMG>| elements, we have used \verb|<DIV>| elements for the server-side HTML overlay. Server-side HTML overlays are actually slightly simpler to implement than either server-side image generation or image overlays, because we do not need to write any code to generate or manipulate images (the base map image is static and thus requires no additional processing). 
All that is required is code to transform latitude/longitude coordinates into projected map coordinates and produce corresponding \verb|<DIV>| elements. Google Maps \cite{Goog-M-2006-maps} is a more complex proposition. This technique uses the data server distribution style, where JavaScript code running within the browser enables the client to manipulate the base map and its overlays. Data and map images are requested asynchronously from the server as required, using Ajax technologies, which seems to imply that Google Maps in fact uses the shared environment distribution style. However, the server has no involvement beyond simply supplying data to the client. In the shared environment distribution style, the server is directly involved in manipulating the map, under the control of the client. This is clearly not the case with Google Maps. The primary advantage of Google Maps is the powerful functionality it provides for generating and interacting with the map. Users may pan the map in any direction and zoom in and out to many different levels. A satellite imagery view is also available. In addition, further information about each point plotted (such as the name of the city, for example) can be displayed in a callout attached to the point, as shown in Figure~\ref{fig-google}. However, there are also some significant disadvantages to the Google Maps technique\footnote{Interestingly, the Google Earth application addresses many of these issues, but since it is not a browser-based solution it falls outside the scope of our consideration. However, for interest's sake we did an informal comparison between Google Earth and the four techniques that we have tested, and this has been included in the results in Section~\ref{sec-results}.}. First, it is a distributed application, thus making it more complex to implement, test and debug [REF]. Second, the server must have a registered API key from Google, which is verified every time that a page attempts to use the API. 
Similarly, the client must connect to Google's servers in order to download the API's JavaScript source. This means that the technique must have an active Internet connection in order to work. Finally, the Google Maps API does not currently provide any way to toggle the visibility of markers on the map, so it is not possible to implement the interactive ``layers'' mentioned at the start of this section. (It is possible, of course, that Google will implement this feature in a later version of the API.) The most significant disadvantage of all HTML overlay techniques, however, is that the size of the HTML overlay is directly proportional to the number of points to be plotted. There will be one overlay element (\verb|<DIV>| or \verb|<IMG>|) per point, so a very large number of points will result in an even larger amount of HTML source being generated. We expect that this will lead to excessive browser memory usage, and consequently that these techniques will not scale well at the high end. However, they may still be useful for smaller data sets that require interactive manipulation.

\section{Experimental design}
\label{sec-experiment}

After some preliminary testing with live data from the Otago School of Business repository, we proceeded with a series of experiments to test the scalability of the four techniques. Each technique was tested using progressively larger synthetic data sets. The first data set comprised one point at the South Pole (latitude \(-90^{\circ}\), longitude \(-180^{\circ}\)). Each successive data set was twice the size of its predecessor, building up a regular grid of latitude/longitude points at one degree intervals\footnote{The entire grid has 64,800 points, so the five largest data sets have many duplicate points.}. A total of twenty-one data sets were created in this way, with the number of points ranging from one to 1,048,576 (\(=2^{20}\)). The result of plotting the 16,384-point data set is shown in Figure~\ref{fig-grid-points}.
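The data set construction just described can be sketched as follows (a hypothetical Python reimplementation for illustration; the original experiments generated the data sets with Perl scripts):

```python
# Sketch of the synthetic data set generation: a regular one-degree grid
# built up row by row from the South Pole, with each data set twice the
# size of its predecessor (2**0 .. 2**20 points).

def grid_point(i):
    """Return the i-th grid point, starting at (-90, -180). The full grid
    holds 180 * 360 = 64,800 points, so the largest data sets wrap around
    and contain duplicate points."""
    i %= 180 * 360
    lat = -90 + i // 360
    lon = -180 + i % 360
    return (lat, lon)

def data_set(k):
    """The k-th data set (k = 0..20), containing 2**k (lat, lon) points."""
    return [grid_point(i) for i in range(2 ** k)]
```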
\begin{figure} \begin{center} \includegraphics[width=0.95\textwidth,keepaspectratio]{ImageGeneration-full} \end{center} \caption{The 16,384-point data set plotted on the base map.} \label{fig-grid-points} \end{figure} The focus on scalability meant that we were primarily interested in measuring page load times, memory usage and the amount of data generated (which impacts on both storage and network bandwidth). Page load time can be further broken down into the time taken to generate the map data, the time taken to transfer the map data to the client across the network, and the time taken by the client to display the map. Unfortunately, the Google Maps technique requires an active Internet connection (as noted in Section~\ref{sec-overlay}), so we were unable to run the experiments on an isolated network. This meant that traffic on the local network was a potential confounding factor. We therefore decided to eliminate network performance from the equation by running both the server and the client on the same machine\footnote{A Power Macintosh G5 1.8\,GHz with 1\,GB RAM, running Mac OS X 10.4.7, Apache 2.0.55, PHP 4.4 and Perl 5.8.6.}. This in turn enabled us to measure the time taken for data generation and page display independently, thus simplifying the process of data collection and also ensuring that the client and server processes did not unduly interfere with each other, despite running on the same machine. It could be argued that network performance would still have a confounding effect on the Google Maps technique, but this would only be likely for the initial download of the API (comprising about 235\,kB of JavaScript source and images), which would be locally cached thereafter. The API key verification does occur every time the map is loaded, but the amount of data involved is very small, so it is less likely that this would be significantly affected by network performance. 
Any such effect would also be immediately obvious, as it would simply block the server from proceeding. For each data set generated, we recorded its size, the time taken to generate it, the time taken to display the resultant map in the browser, and the amount of memory used during the test by the browser. We also intended to measure the memory usage of the server, but this proved more difficult to isolate than we expected, and was thus dropped from the experiments. The data set generation time and browser memory usage were measured using the \texttt{time} and \texttt{top} utilities respectively (the latter was run after each test run to avoid interference). The map display time was measured using the ``page load test'' debugging feature of Apple's Safari web browser, which can repetitively load a set of pages while recording various statistics, in particular the time taken to load the page. Tests were run up to twenty times each where feasible, in order to reduce the impact of random variations. Some tests were run fewer times because they took a very long time (several minutes for a single test run). We typically broke off further testing when a single test run took longer than about five minutes, as by this stage performance had already deteriorated well beyond usable levels. Although it is beyond the scope of this work, some informal tests were also undertaken out of interest using the Google Earth application. A Perl script was used to generate a collection of KML files corresponding to the data sets described above. Each data set was then loaded into Google Earth, and a stopwatch was used to measure how long it took to load the data set, defined as the period during which the dialog box ``\textsf{Loading myplaces.kml, including enabled overlays}'' was displayed on screen.
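The KML generation step might be sketched as follows (a hypothetical Python reimplementation of the Perl script mentioned above; the minimal document structure and the KML namespace version shown here are assumptions):

```python
# Sketch of generating a minimal KML file for the Google Earth tests.
# Note that KML uses longitude,latitude ordering in <coordinates>.

def to_kml(points):
    """Wrap a list of (lat, lon) pairs in a minimal KML document."""
    placemarks = "\n".join(
        "  <Placemark><Point><coordinates>%f,%f</coordinates>"
        "</Point></Placemark>" % (lon, lat)
        for lat, lon in points)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://earth.google.com/kml/2.1">\n'
            '<Document>\n%s\n</Document>\n</kml>\n' % placemarks)
```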
\subsection{Technique implementation} As noted in Sections~\ref{sec-image-gen} and \ref{sec-overlay}, the server-side image generation, server-side image overlay and server-side HTML overlay techniques were all implemented using the image server distribution style. A separate dispatcher page was written in PHP for each technique, which enabled arguments---such as the number of points to be plotted---to be passed from the client to a corresponding Perl script for each technique. The final page was then constructed as follows: \begin{description} \item[server-side image generation] The Perl script generated a JPEG composite map image that was displayed using a standard \verb|<IMG>| element. \item[server-side image overlay] The base map (a JPEG image) was displayed using a CSS-positioned \verb|<IMG>| element. The Perl script then generated a transparent PNG image, which was displayed using an \verb|<IMG>| element with identical positioning attributes to the base map. \item[server-side HTML overlay] The base map (a JPEG image) was displayed using a CSS-positioned \verb|<IMG>| element. The Perl script then generated a collection of CSS-positioned \verb|<DIV>| elements, which were included inline in the source of the dispatcher page. \end{description} The Google Maps technique was implemented using the data server distribution style. Once again, a PHP dispatcher page was used. This time, however, the page included client-side JavaScript code to load and initialise the Google Maps API, create the base map, and build the map overlay. The first two steps were achieved using standard Google Maps API calls. For the last step, the client used an \texttt{XMLHttpRequest} object to call a server-side Perl script. This script generated and returned to the client an XML data set containing the points to be plotted. The client then looped through this data set and used Google Maps API calls to create a marker on the base map corresponding to each point. 
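To illustrate, the core of the server-side HTML overlay generation might look like the following sketch (a hypothetical Python reimplementation; the original used a Perl script with the PROJ.4 library, and the base map dimensions and simple equirectangular projection here are assumptions made for illustration):

```python
# Sketch of server-side HTML overlay generation: project lat/lon to pixel
# coordinates, then emit one CSS-positioned <DIV> element per point.

MAP_WIDTH, MAP_HEIGHT = 720, 360   # assumed base map size in pixels

def project(lat, lon):
    """Equirectangular projection of a lat/lon pair to pixel coordinates,
    with (0, 0) at the top-left corner of the base map."""
    x = (lon + 180) / 360 * MAP_WIDTH
    y = (90 - lat) / 180 * MAP_HEIGHT
    return int(x), int(y)

def overlay_div(lat, lon, size=3, color="red"):
    """One absolutely positioned <DIV> marker for a plotted point."""
    x, y = project(lat, lon)
    return ('<div style="position: absolute; left: %dpx; top: %dpx; '
            'width: %dpx; height: %dpx; background: %s;"></div>'
            % (x, y, size, size, color))

# The dispatcher page would emit one such element inline for every point:
# print("\n".join(overlay_div(lat, lon) for lat, lon in points))
```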
\section{Results}
\label{sec-results}

As noted in the introduction, the intent of these experiments was not to do a full analysis and statistical comparison of the performance of the different techniques, but rather to identify broad trends. We have not, therefore, carried out any statistical analysis on the results. We will now discuss the results for data size, page load time and memory usage. Because the data set size increases by powers of two, we have used log-log scales for all plots.

\subsection{Data size}

The size of the data generated for each technique is shown in Figure~\ref{fig-data-size}.

\begin{figure} \begin{center} \includegraphics[scale=0.66]{data_size} \end{center} \caption{Comparison of generated data size for each technique (log-log scale).} \label{fig-data-size} \end{figure}

\subsection{Page load time}

The time taken to generate the map data and the time taken to display the resultant map are shown in Figures~\ref{fig-data-generation-time} and \ref{fig-page-load-time} respectively; the combined page load time is shown in Figure~\ref{fig-combined-time}.

\begin{figure} \begin{center} \includegraphics[scale=0.66]{data_generation_time} \end{center} \caption{Comparison of data generation time for each technique (log-log scale).} \label{fig-data-generation-time} \end{figure} \begin{figure} \begin{center} \includegraphics[scale=0.66]{page_load_time} \end{center} \caption{Comparison of map display time for each technique (log-log scale).} \label{fig-page-load-time} \end{figure} \begin{figure} \begin{center} \includegraphics[scale=0.66]{combined_time} \end{center} \caption{Comparison of combined page load time for each technique (log-log scale).} \label{fig-combined-time} \end{figure}

\subsection{Memory usage}

We measured both the real and virtual memory usage of the browser by running the \texttt{top} utility after each test run and observing the memory usage in each category. This told us the size of both the current ``working set'' and the total memory footprint of the browser process after it had completed a test run. The real memory data proved somewhat unreliable, however.
Real memory usage was generally consistent across several test runs, but frequently it would fluctuate upwards by a factor of nearly two for no readily apparent reason. We can only assume that this was a result of other processes running on the test machine interacting with the browser process in unexpected ways. We therefore discarded the real memory data. The virtual memory data proved more consistent, as the virtual memory footprint of a process is less likely to be impacted by other running processes. The amount of virtual memory used by the browser for each technique is shown in Figure~\ref{fig-virtual-memory}. \begin{figure} \begin{center} \includegraphics[scale=0.66]{virtual_memory} \end{center} \caption{Comparison of virtual memory usage for each technique (log-log scale).} \label{fig-virtual-memory} \end{figure} \section{Conclusion} % The % software extracts IP addresses from the web server logs, geolocates them % using the free MaxMind GeoLite Country database\footnote{See % \url{http://www.maxmind.com/app/ip-location}.}, then stores the % resulting country information in a separate database. % The Tasmania software, however, uses countries as its base unit of % aggregation. We were interested in looking at the distribution on a finer % level, down to individual cities if possible \bibliography{Map_Visualisation} \begin{received} ... \end{received} \end{document}