diff --git a/Atom_updates.tex b/Atom_updates.tex
new file mode 100755
index 0000000..665faff
--- /dev/null
+++ b/Atom_updates.tex
@@ -0,0 +1,523 @@
+\documentclass{CRPITStyle}
+
+\usepackage{harvard}
+\usepackage{graphicx}
+
+\pagestyle{empty}
+\thispagestyle{empty}
+
+\title{Lightweight Update Propagation using Atom}
+\author{David W.\ Williamson \and Nigel J.\ Stanger}
+\affiliation{Department of Information Science, University of Otago, \\
+	PO Box 56, Dunedin, New Zealand \\
+	Email:~\texttt{\{dwilliamson,nstanger\}@infoscience.otago.ac.nz}}
+
+
+\begin{document}
+
+\maketitle
+
+\begin{abstract}
+There are many situations where some form of automated update
+propagation across disparate databases may be beneficial. For example, a
+retailer could automatically retrieve the latest pricing data from their
+suppliers' databases, and use these data to update their own internal
+database. Doing so at regular intervals ensures that the retailer always
+has current pricing information in their database. Electronic Data
+Interchange (EDI) tools that provide such features already exist but can
+be expensive to implement, particularly for small to medium enterprises
+(SME's). In this paper we propose a lightweight approach for propagating
+updates from one database to another using the Atom XML syndication
+format, thus providing a simpler, cost-effective technology for
+facilitating data integration. This approach enables a target database
+to regularly poll one or more source databases for updates, which are
+then applied to the target database (alternatively, updates could be
+``pushed'' to the target from the sources). This approach can be used in
+typical data integration scenarios where the data sources are updated at
+irregular intervals, such as the aforementioned retailer example, or
+when extracting data from multiple data sources for loading into a data
+warehouse. In the paper we discuss the underlying principles and
+motivation for the approach, describe the architecture that we have
+used, and describe an early prototype implementation.
+\end{abstract}
+\vspace{.1in}
+
+\noindent {\em Keywords:} update propagation, data integration, Atom,
+SME, lightweight architecture, Semantic Web, B2B
+
+\section{Introduction}
+
+The ability to integrate data from multiple heterogeneous sources is
+becoming a key issue for modern businesses, and yet the number of
+businesses implementing data integration solutions is smaller than we
+might expect [2,20]. This is particularly true for small to medium
+enterprises (SME's), for whom the cost of implementing an
+enterprise-scale data integration solution can often be prohibitive
+[2,8,18].
+
+In this paper, we propose a lightweight data integration architecture
+based on the Atom XML syndication format, which may provide a
+cost-effective alternative technology for SME's to facilitate data
+integration rather than having to purchase expensive enterprise-grade
+systems. We are currently implementing a basic proof of concept of this
+architecture, and plan to evaluate it using three case studies.
+
+The body of this paper comprises three main sections. In Section 2 we
+provide some general background information regarding data integration
+and the Atom syndication format. In Section 3 we discuss the motivation
+behind our proposed architecture. We then discuss the proposed
+architecture and the goals of our research in Section 4, and present
+some possible directions for future work in Section 5. The paper
+concludes in Section 6.
+
+\section{Background}
+
+In this section, we briefly discuss the concepts and technologies that
+underlie our proposed architecture. In Section 2.1 we provide a brief
+overview of data integration, especially in the context of SME's
+attempting to implement a data integration solution. This is followed by
+a brief discussion of the development of Atom and related technologies
+such as RSS and RDF.
+
+\subsection{Data Integration}
+
+Data integration is a term used to describe the combining of data
+residing in different sources to provide the user with a unified view of
+data [1,22]. This activity is becoming increasingly important to modern
+business operation as more and more organizations rely upon applications
+that support staff in undertaking informed decision making [6,22].
+
+Data integration has been a topic of research for some time [2,21], and
+it remains significant today: many organizations need to aggregate data
+from multiple, often heterogeneous, sources for a wide variety of
+applications [9]. Batini et al.\ [1] illustrated three common scenarios
+for integration environments:
+
+\begin{itemize}
+
+	\item homogeneous, where all the sources of data share the same
+	schema;
+
+	\item heterogeneous, where data must be integrated from sources that
+	may use different schemas or platforms (e.g., a combination of
+	relational and hierarchical databases); and
+
+	\item federated, where integration is facilitated by the use of a
+	common export schema over all data sources.
+
+\end{itemize}
+
+A typical example of data integration from heterogeneous sources can be
+found in the arena of business-to-business (B2B) commerce, where, for
+example, a manufacturer may have to interact with multiple suppliers or
+temporary contractors, each of whom may have completely different data
+structures and data exchange formats [19]. With the introduction of
+cheaper web-based technology, many additional organizations have been
+able to undertake projects to facilitate data integration; however, the
+costs associated with such technology are still prohibitive for the
+many smaller companies and organizations that comprise the majority of
+most countries' economies.
+
+Many initiatives have been put forward to try to alleviate this
+situation, one of the more recent being the OASIS Universal Business
+Language (UBL) standard [14], which is a project to standardize common
+business documentation---invoices, purchase orders, etc.---so that it is
+easier for companies to establish and maintain automated transactions
+with other parties. UBL has been designed to operate with ebXML.
+
+XML has been widely adopted as a standard platform for exchanging data
+between organizations, and many specialist standards---such as the
+aforementioned ebXML---have been developed to cater to the unique needs
+of certain business sectors. In addition to XML-based language
+specifications, other standards such as EDIFACT and EXPRESS have been
+defined to facilitate the transmission of information from various
+sources so that it may be integrated with other data.
+
+\subsection{The Atom Syndication Format}
+
+In this section we provide a brief overview of the Atom syndication
+format and the technologies that led to its development.
+
+\subsubsection{RDF, RSS and the Semantic Web}
+
+The World Wide Web (WWW) as it stands today consists mostly of documents
+intended for humans to read, i.e., ``\ldots{}a medium of documents for people
+rather than for data and information that can be processed
+automatically\ldots'' [5], which provides minimal opportunity for computers to
+perform additional interpretation or processing on them [3,5]. In
+essence, computers in use on the Web today are primarily concerned with
+parsing elementary layout information, such as headers, graphics or
+text, and with processing such as user input forms [4,5].
+
+There are few means by which computers can perform more powerful
+processing or manipulation on web resources [5,7], most often because
+the additional semantics required do not exist or are not in a form that
+can be interpreted by computers [11]. The motivation for the adoption of
+semantics in Web documents can be made evident simply by using a
+contemporary search engine to look for an ``address''. This search may
+well return a plethora of results ranging from street addresses and
+email addresses to public addresses made by important individuals
+through the ages.
+
+This kind of scenario is one of the reasons for the W3C's Semantic Web
+project [11]. In the words of its creator, Tim Berners-Lee, its goal is
+to:
+
+\begin{quotation}
+	``\ldots{}develop enabling standards and technologies designed to help
+	machines understand more information on the Web so that they can
+	support richer discovery, data integration, navigation, and
+	automation of tasks. With Semantic Web we not only receive more
+	exact results when searching for information, but also know when we
+	can integrate information from different sources, know what
+	information to compare, and can provide all kinds of automated
+	services in different domains from future home and digital libraries
+	to electronic business and health services.'' [11]
+\end{quotation}
+
+In other words, the Semantic Web will provide a space where more
+intelligent searching and processing of information will be made
+possible by further extending the existing capabilities of the World
+Wide Web (WWW).
+
+RDF is a technology that is an integral part of the W3C Semantic Web
+initiative, as the following excerpt from the W3C Semantic Web activity
+statement will attest:
+
+\begin{quotation}
+	``The Resource Description Framework (RDF) is a language designed to
+	support the Semantic Web, in much the same way that HTML is the
+	language that helped initiate the original Web. RDF is a framework
+	for supporting resource description, or metadata (data about data),
+	for the Web. RDF provides common structure that can be used for
+	interoperable XML data exchange.'' [17]
+\end{quotation}
+
+What RDF does in the context of the Semantic Web is to provide the
+capability of recording data in a way that can be interpreted easily by
+machines, which in turn provides an avenue to ``\ldots{}more efficient and
+sophisticated data interchange, searching, cataloguing, navigation,
+classification and so on\ldots{}'' [17].
+
+Since its inception in the late 1990s, the RDF specification has
+spawned several applications, RSS being but one example. RDF Site
+Summary (RSS) is an XML application, of which versions 0.9 and 1.0
+conform to the W3C's RDF specification. It is a format intended for
+metadata description and content syndication [12]. Originally developed
+by Netscape as a means to syndicate content from multiple sources onto
+one page [16], RSS has been embraced by other individuals and
+organizations resulting in the spawning of multiple versions.
+
+At its simplest, the information provided in an RSS document
+comprises the description of a ``channel'' (which could be on a specific
+topic such as current events, sport or the weather) consisting of
+URL-linked items. Each item consists of a title, a link to the actual
+content and a brief description or abstract.
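+
+As a rough illustration only, the following fragment sketches such a
+channel using RSS 2.0 element names (RSS 1.0 expresses the same ideas in
+RDF syntax). The titles, URLs and descriptions are hypothetical
+placeholders, not data from any real feed.
+
+\begin{verbatim}
+<!-- illustrative values only -->
+<rss version="2.0">
+  <channel>
+    <title>Weather</title>
+    <link>http://example.org/weather/</link>
+    <description>Daily forecasts</description>
+    <item>
+      <title>Monday forecast</title>
+      <link>http://example.org/weather/mon</link>
+      <description>Fine, light winds.</description>
+    </item>
+  </channel>
+</rss>
+\end{verbatim}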
+
+Because of the proliferation of differing RSS standards and associated
+problems with compatibility, a group of service providers, vendors and
+developers have initiated the development of a separate syndication
+standard named Atom, which will, according to the Atom Publishing Format
+and Protocol (Atompub) Working Group, be heavily influenced by the
+lessons learned in the evolution of RSS.
+
+\subsubsection{Atom}
+
+The Atom specification is an XML-based document format that has been
+designed to describe lists of related information [16]. These lists are
+known as ``feeds''. Feeds are made up of multiple items, known as
+``entries''; each entry can have an extensible set of attached metadata
+[16].
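+
+To make the feed and entry structure concrete, the fragment below
+sketches a minimal feed containing a single entry. The namespace URI
+shown is an assumption and may differ between revisions of the draft
+[16]; all identifiers, dates, names and titles are hypothetical.
+
+\begin{verbatim}
+<!-- hypothetical ids, dates and titles -->
+<feed xmlns="http://www.w3.org/2005/Atom">
+  <id>tag:example.org,2005:feed</id>
+  <title>Example feed</title>
+  <updated>2005-03-12T09:00:00Z</updated>
+  <author><name>Example Supplier</name></author>
+  <entry>
+    <id>tag:example.org,2005:entry-1</id>
+    <title>First entry</title>
+    <updated>2005-03-12T09:00:00Z</updated>
+    <content type="text">A short description.</content>
+  </entry>
+</feed>
+\end{verbatim}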
+
+Atom as a technology comprises four key related components: a conceptual
+model of a resource, a well-defined syntax for this model, the actual
+Atom feed format itself and the editing protocol. Both the feed format
+and editing protocol also make use of the aforementioned syntax.
+
+In addition to these features, the Atompub Working Group have outlined
+several design objectives for the feed format and the editing protocol.
+The feed format must be able to represent the following: a resource that
+is a weblog entry or article, a feed or channel of entries, a complete
+archive of all entries within a feed, existing well-formed XML
+(especially XHTML) content and additional information in a
+user-extensible manner.
+
+The editing protocol must support creating, deleting or editing feed
+entries, multiple authors for a single feed, user authentication, user
+management and the ability to create, obtain and configure complementary
+material such as comments or templates.
+
+The latest specification of Atom, which at the time of writing is still
+in draft form, states that the main purpose Atom is intended to
+address is ``\ldots{}the syndication of Web content such as Weblogs and news
+headlines to Web sites as well as directly to user agents'' [16]. The
+specification also suggests that Atom should not be limited to
+web-based content syndication but may be adapted for other uses or
+content types. The Atompub Working Group aim to submit the Atom feed
+format and editing protocol to the IETF for consideration as a proposed
+standard in early April 2005.
+
+\section{Motivation}
+
+One of the example domains of data integration is that of Electronic
+Data Interchange (EDI), a concept used by companies to exchange
+information such as goods procurement documentation. EDI is not new
+[2,15], and has been used for many years by various organizations to
+reduce costs by replacing more traditional paper-based systems. It is
+interesting to note, however, that in surveys regarding the extent of
+adoption of EDI, only a fraction of the companies that might be
+perceived as beneficiaries of such technology have actually implemented
+or attempted to implement it [2,20]. This naturally raises the question
+of why. We can refine this question further by asking why so few smaller
+companies (SME's) have adopted EDI or indeed other technologies that
+rely on accurate automated data integration, such as data warehousing.
+
+Perhaps the most important reason is that of cost: to a small company
+the perceived benefits of introducing the technology may not be
+sufficient to justify the expense [2,8,18]. When a decision has been
+made to implement new technology, it is often the case that the SME in
+question has been forced into an investment that is, to them, an
+expensive solution, perhaps due to demands imposed by larger clients and
+partners, or as a response to competitors in an attempt to maintain
+market position [2,20].
+
+Attempts have been made to make EDI more cost-effective by introducing
+EDI on a web-based platform [2], and through the development of
+standards such as the recently sanctioned OASIS Universal Business
+Language (UBL) standard [14]. While UBL is new and has probably not had
+sufficient time to make a substantial impact, the fact remains that the
+underlying reason these types of technologies are still not attractive
+enough to SME's is cost [2,8,18,20].
+
+To summarize, data integration related technologies are often not
+readily or willingly implemented by SME's because of the perceived high
+costs involved, and at best are implemented only if it is deemed vitally
+important to the continued survival of the organization in the
+marketplace.
+
+Such a situation leads us to the conclusion that there is an apparent
+need for an alternative data integration solution that is
+cost-effective, enabling SME's to embrace the benefits of applications that
+use data integration technologies, such as data warehousing, EDI
+networks or e-catalogues.
+
+This identified need provides the motivation for our proposed
+architecture, which we will discuss in the next section.
+
+\section{Proposed Architecture and Research Goals}
+
+To address the lack of SME adoption of data integration
+technologies, we propose a lightweight data integration architecture
+based on Atom, as illustrated in Figure 1. Atom was chosen as the
+underlying technology because of its XML heritage, and because the Atom
+community is trying to encourage different uses for the format beyond
+the traditional application of weblog syndication [16]. Although the
+standard has yet to be officially ratified, it already has a large user
+and development community.
+
+We are currently implementing a basic proof of concept of this
+architecture, and will evaluate its cost-effectiveness and performance
+compared to other data integration technologies. The prototype builds
+upon existing software available for processing Atom feeds, and adds a
+module (written in PHP) for integrating incoming data from different
+feeds.
+
+The integration module takes as input Atom feeds from multiple data
+sources, which simulate incoming data from client or supplier data sets.
+(For the initial prototype we have assumed that the data feeds are
+homogeneous; obviously this will need to be extended to heterogeneous
+feeds in later versions.) After the Atom feeds have been collected, the
+integration module will integrate the data supplied by the feeds into a
+schema that matches that of the target database, as shown in Figure 1. A
+transaction simulator will be employed to simulate workload and updates
+to the source databases, in order to recreate a day-to-day production
+environment.
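+
+How an individual update is encoded within an Atom entry is not
+specified in the description above. Purely as a hypothetical sketch, and
+not the encoding used by the prototype, a price change from a supplier
+might be carried as XML content inside an entry. The \texttt{product}
+vocabulary, its namespace, the identifiers and the price are all invented
+for illustration, and the Atom namespace URI is an assumption.
+
+\begin{verbatim}
+<!-- hypothetical encoding, not the prototype's -->
+<entry xmlns="http://www.w3.org/2005/Atom">
+  <id>tag:supplier.example,2005:p-1001</id>
+  <title>Price change: product 1001</title>
+  <updated>2005-03-12T09:00:00Z</updated>
+  <author><name>Example Supplier</name></author>
+  <content type="application/xml">
+    <product xmlns="http://example.org/ns/product">
+      <code>1001</code>
+      <price currency="NZD">19.95</price>
+    </product>
+  </content>
+</entry>
+\end{verbatim}
+
+Under this assumption, an integration module could use the entry's
+\texttt{id} and \texttt{updated} values to decide whether a row in the
+target database needs to be inserted or refreshed.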
+
+In order to evaluate the prototype, we will implement three different
+simulated scenarios derived from actual use cases of previous projects.
+All three case studies follow a similar structure whereby data will be
+exported as Atom feeds from the source database(s), which are then
+consumed by the integration module before being sent to the target
+database for insertion.
+
+The first scenario will simulate the integration of product data from
+multiple suppliers into a vendor's product information database. The
+product information database is used to populate the vendor's online
+product catalogue, which clients use to make decisions regarding goods
+procurement. The Atom feeds in this scenario represent flows of product
+data from the supplier to the vendor.
+
+The second scenario follows on from an earlier research project to
+develop a kiosk system for the sale and distribution of music in digital
+format. The database the kiosk(s) use will be populated with information
+from vendors who have agreed to supply content (e.g., a record label's
+collection of music files). What is needed is a mechanism to integrate
+all the music data from each supplier into the music kiosk system's own
+database. The Atom feeds in this scenario are used to maintain an
+up-to-date database that has the location and description of each available
+music track for sale in the system.
+
+The third scenario will simulate the implementation of a data
+warehousing solution for a computer components distributor.
+
+Preliminary results from the case study evaluations are expected to be
+available by June 2005. Our primary goal with the initial prototype is
+to prove the feasibility of our approach. We will compare our proposed
+architecture against existing data integration solutions by means of a
+cost/benefit analysis. We may also investigate measuring various
+software quality characteristics as defined by the ISO 9126 standard
+[10].
+
+%   Figure 1. Proposed architecture showing integration module
+
+\section{Future Work}
+
+As the initial prototype is intended as a basic proof of concept of our
+proposed architecture, it has been kept as simple as possible in order
+to facilitate the implementation and evaluation. There are several
+obvious extensions to the basic prototype that will be investigated in
+later iterations of the architecture.
+
+The initial prototype assumes that all data sources are largely
+homogeneous, that is, that they all share similar semantics and can
+therefore be relatively easily integrated. An obvious extension is to
+permit heterogeneous data sources that have differing semantics. Such an
+extension would require the addition of an ontology management module
+between the Atom feed processor and the integration module. This module
+will probably be based around the W3C's Web Ontology Language (OWL)
+[13].
+
+The initial prototype also assumes only a single ``author'' per Atom feed,
+that is, there is only a single database underlying each feed (as
+implied by Figure 1). We can envisage a situation where what appears to
+be a single data source is actually a view layered on top of a
+collection of underlying databases (e.g., a supplier might draw data for
+their Atom feed from multiple databases within their organization). It
+would therefore be useful to investigate the possibility of multiple
+``authors'' per Atom feed. This could imply an additional layer of data
+integration within the data source itself.
+
+The data flows shown in Figure 1 imply that the proposed architecture is
+one-way only (i.e., from the data sources to the target database), but
+this may not be true in general. It would therefore be interesting to
+investigate extending the architecture to allow for the possibility of
+two-way data transfers, i.e., allowing data to flow from the target back
+to the sources.
+
+\section{Conclusion}
+
+In this paper, we discussed a lightweight data integration architecture
+based on the Atom XML syndication format. Cost is a major factor in the
+slow adoption of data integration technologies by small to medium
+enterprises, so the proposed architecture could provide a cost-effective
+alternative for implementing data integration infrastructures in small
+business environments. We are currently developing a basic
+proof-of-concept prototype system that will be evaluated using a series
+of realistic case studies. We expect to have preliminary results from
+these evaluations by June 2005.
+
+\section{Acknowledgements}
+
+The authors would like to thank Dr.\ Colin Aldridge and Dr.\ Stephen
+Cranefield for their helpful comments on an early draft of this paper.
+
+
+\section{References}
+
+[1]	Batini, C., Lenzerini, M., and Navathe, S. B. A
+comparative analysis of methodologies for database schema integration.
+ACM Computing Surveys, 18, 4 (Dec. 1986), 323--364.
+
+[2]	Beck, R., Weitzel, T., and K\"{o}nig, W. Promises and
+pitfalls of SME integration. In Proceedings of the 15th Bled Electronic
+Commerce Conference (Bled, Slovenia, June 17--19, 2002).
+
+[3]	Berners-Lee, T., and Fischetti, M. Weaving the Web. Orion
+Business, London, 1999.
+
+[4]	Berners-Lee, T., Connolly, D., and Swick, R. R. Web
+Architecture: Describing and Exchanging Data. W3C Note, World Wide Web
+Consortium, 7 June 1999. http://www.w3c.org/1999/04/WebData
+
+[5]	Berners-Lee, T., Hendler, J., and Lassila, O. The Semantic Web.
+Scientific American, 284, 5 (May 2001), 34--43.
+
+[6]	Calvanese, D., De Giacomo, G., Lenzerini, M., Nardi, D., and
+Rosati, R. Information integration: Conceptual modeling and reasoning
+support. In Proceedings of the 3rd IFCIS International Conference on
+Cooperative Information Systems (CoopIS'98) (New York, NY, August 20--22,
+1998). IEEE Computer Society Press, Los Alamitos, CA, 1998, 280--291.
+
+[7]	Fensel, D., Hendler, J., Lieberman, H., and Wahlster, W. (Eds.)
+Spinning the Semantic Web. MIT Press, Cambridge, MA, 2003.
+
+[8]	Guo, J., and Sun, C. Context representation, transformation and
+comparison of ad hoc product data exchange. In Proceedings of the 2003
+ACM Symposium on Document Engineering (DocEng '03) (Grenoble, France,
+November 20--22, 2003). ACM Press, New York, NY, 2003, 121--130.
+
+[9]	Haas, L. M., Miller, R. J., Niswonger, B., Tork Roth, M.,
+Schwarz, P. M., and Wimmers, E. L. Transforming heterogeneous data with
+database middleware: Beyond integration. IEEE Data Engineering Bulletin,
+22, 1 (Mar. 1999), 31--36.
+
+[10]	ISO. Software Engineering---Product Quality---Part 1: Quality Model.
+Standard ISO/IEC 9126-1:2001, International Organization for
+Standardization, Geneva, Switzerland, 2001.
+
+[11]	Koivunen, M., and Miller, E. W3C Semantic Web activity. In
+Semantic Web Kick-Off in Finland: Vision, Technologies, Research, and
+Applications (Helsinki, Finland, November 2, 2001). HIIT Publications,
+Helsinki, Finland, 2002, 27--43.
+
+[12]	Manola, F., Miller, E., and McBride, B. RDF Primer. W3C
+Recommendation, World Wide Web Consortium, 10 February 2004.
+http://www.w3.org/TR/rdf-primer/
+
+[13]	McGuinness, D. L., and van Harmelen, F. OWL Web Ontology
+Language: Overview. W3C Recommendation, World Wide Web Consortium, 10
+February 2004. http://www.w3.org/TR/2004/REC-owl-features-20040210/
+
+[14]	Meadows, B., and Seaburg, L. Universal Business Language 1.0.
+OASIS Committee Draft cd-UBL-1.0, Organization for the Advancement of
+Structured Information Standards, Billerica, MA, 15 September 2004.
+http://docs.oasis-open.org/ubl/cd-UBL-1.0/
+
+[15]	Medjahed, B., Benatallah, B., Bouguettaya, A., Ngu, A. H. H.,
+and Elmagarmid, A. K. Business-to-business interactions: Issues and
+enabling technologies. The VLDB Journal, 12, 1 (May 2003), 59--85.
+
+[16]	Nottingham, M., and Sayre, R. The Atom Syndication Format. IETF
+Internet-Draft draft-ietf-atompub-format-06, Internet Engineering Task
+Force, 12 March 2005.
+http://www.ietf.org/internet-drafts/draft-ietf-atompub-format-06.txt
+
+[17]	Powers, S. Practical RDF. O'Reilly \& Associates, Sebastopol, CA,
+2003.
+
+[18]	Sommer, R. A., Gulledge, T. R., and Bailey, D. The n-tier hub
+technology. ACM SIGMOD Record, 31, 1 (Mar. 2002), 18--23.
+
+[19]	Stonebraker, M., and Hellerstein, J. M. Content integration for
+E-Business. In Proceedings of the 2001 ACM SIGMOD International
+Conference on Management of Data (SIGMOD '01) (Santa Barbara, CA, May
+21--24, 2001). ACM Press, New York, NY, 2001, 552--560.
+
+[20]	van Heck, E., and Ribbers, P. M. The adoption and impact of EDI
+in Dutch SME's. In Proceedings of the 32nd Hawaii International
+Conference on System Sciences (HICSS-32) (Maui, Hawaii, January 5--8,
+1999). IEEE Computer Society Press, Los Alamitos, CA, 1999, 7061.
+
+[21]	Wiederhold, G. Intelligent integration of information. In
+Proceedings of the 1993 ACM SIGMOD International Conference on
+Management of Data (SIGMOD '93) (Washington, D. C., May 26--28, 1993).
+ACM Press, New York, NY, 1993, 434--437.
+
+[22]	Yu, C., and Popa, L. Constraint-based XML query rewriting for
+data integration. In Proceedings of the 2004 ACM SIGMOD International
+Conference on Management of Data (SIGMOD '04) (Paris, France, June
+13--18, 2004). ACM Press, New York, NY, 2004, 371--382.
+
+
+\end{document}