% Publications/OCLC.tex: nstanger, 28 Sep 2006, "Updated charts with new data."
\documentclass[12pt,pdftex,a4paper,titlepage]{article}


\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{lmodern}
\usepackage{graphicx}
\usepackage[margin=1in]{geometry}
\usepackage{pifont}
\usepackage[dcucite]{harvard}
\usepackage{url}


\title{Building institutional repositories on a shoestring}
\author{Nigel Stanger\thanks{\protect\url{nstanger@infoscience.otago.ac.nz}} \and Graham McGregor\thanks{\protect\url{gmcgregor@infoscience.otago.ac.nz}}}
\date{University of Otago, PO Box 56, Dunedin 9054, New Zealand}


\renewcommand{\thetable}{\Roman{table}}


\begin{document}


\maketitle


\bibliographystyle{dcu}

\begin{abstract}

\noindent Case study \\

\noindent\textbf{Purpose} --- To share the authors' experiences in implementing three different digital repositories, and to demonstrate the relative ease with which a digital institutional repository can be implemented.

\noindent\textbf{Design/methodology/approach} --- Three separate repositories were implemented at the University of Otago over a six-month period, each with a different focus and different needs. The first repository was implemented as a pilot, and is now a fully-functional system. The other two repositories were implemented from the ground up as fully-functional systems.

\noindent\textbf{Findings} --- Implementing an effective digital repository requires surprisingly few resources. The first Otago repository was implemented by a team of five people in only ten days, running on cheap commodity hardware and free open source software.

\noindent\textbf{Practical implications} --- The repositories described in this paper were implemented extremely quickly and were very successful. This implies that digital repository software tools have matured to a sufficient state that almost anyone can quickly and easily implement a full-featured repository.

\noindent\textbf{Originality/value} --- The three repositories discussed in this paper provide excellent exemplars in three quite different contexts: ``traditional'' academia, indigenous studies and community history.
\end{abstract}


\section{Introduction}

Digital institutional repositories have become a hot topic over the last few years, and many institutions around the world are now actively working towards implementing them. In this article we discuss how low-cost yet fully functional digital institutional repositories (IRs) can be set up in a very short time frame. We reflect on the lessons learned while implementing three different repositories at the University of Otago, and discuss some new and exciting applications of digital repositories arising from these. We also suggest some best practices for implementing an IR and discuss issues that must be considered when moving from a small-scale pilot implementation to a full roll-out.

Interest in institutional repositories at the University of Otago was sparked by the release of the New Zealand Digital Strategy by the New Zealand government in May 2005. The strategy aims to ensure that ``New Zealand is a world leader in using information and technology to realize our economic, environmental, social and cultural goals'' \cite{NZG-2005-digital_strategy}. In parallel with this, the National Library of New Zealand set up an expert working party with representatives from across the research sector to investigate the feasibility of establishing a national institutional repository for New Zealand's research outputs \cite{Rank-J-2005-feasibility}. The National Library is fostering a work program to improve access to New Zealand's research outputs, by collaborating with institutions to stimulate the set-up of research repositories.

In May 2005, two senior University of Otago staff undertook a study tour of Digital Challenges facing universities in the United States. Their report provided the impetus for the first IR pilot in Otago's School of Business. Project work began on 7 November 2005, with the following goals \cite{Stan-N-2006-running}:
\begin{itemize}

	\item To establish a proof-of-concept demonstrator for storing and providing open access to digital research publications in the School of Business.

	\item To evaluate the potential of the demonstrator for adoption by the wider Otago University research community.

	\item To connect the School of Business with the global research community, in line with the feasibility study and recommended actions for a national repositories framework for New Zealand's research outputs \cite{Rank-J-2005-feasibility}.

\end{itemize}

The remainder of the paper discusses how three different repositories were rapidly implemented at the University of Otago, and the subsequent impact of these repositories. Issues that arose during implementation and possible solutions are discussed, then the process of rolling out to the wider institution is examined. The paper concludes with a look ahead to future developments arising from the Otago repository project.


\section{The pilot repository}

The GNU EPrints repository management software was chosen for the pilot repository because it was widely used, well supported, inexpensive and would not lock the School into specific technologies or vendors \cite{Sale-A-2005-NZIRW}. The development team also had prior experience with the software. A rapid prototyping methodology was adopted, emphasizing quick releases of visible results with multiple iterations, in order to create interest in the project at an early stage and enable a positive feedback cycle. A sandbox was used to test entries and entry formats before the material went live.

The pilot repository was fully implemented within ten days of assembling the project team, with most of this time spent tweaking the look and feel of the web site and collecting content \cite{Stan-N-2006-running}. This outcome was possible because of the establishment of a very clear brief to ``prove the concept'', rather than taking on a large-scale project that would involve many different disciplines, researchers and research outputs from the outset. Early decisions were made to restrict the content and content domain used for the pilot, in order to speed the collection process and minimize requirements ``creep''. Meetings were kept to a minimum, and policy and procedural issues that required institutional decisions were noted as work progressed, rather than tackled head-on. The project was widely publicized within the School and Heads of Departments were consulted to ensure top-level buy-in. This approach produced immediate results and the repository was quickly populated with a range of working/discussion papers, conference items, journal articles and theses.

There was no cost associated with the GNU EPrints software, and from a technical point of view the project was wonderfully straightforward. The School of Business repository\footnote{\url{http://eprints.otago.ac.nz/}} was deployed on a spare mid-range server running FreeBSD, which meant that hardware and software costs were essentially nil. In other words, if you happen to have some spare hardware lying around, you can set up an initial repository very cheaply, and expand it later.

A minimalist approach was taken with regard to gathering content, partly because of the prototypical nature of the project, and partly because material in the hand is worth more than promises by authors to supply content at some indeterminate future date. New publications are always being created, and content acquisition is a moving target that has to be effectively managed. Once some basic content acquisition and data entry protocols were put in place, an incremental methodology was adopted. Content was limited to voluntary contributions in PDF format from colleagues in the School of Business, but with no constraint on the type of output. As of the end of September 2006, the repository contains 327 documents covering a wide range of topics and document types, with new content being continually acquired.

It is remarkable what can be achieved by a small, dedicated, knowledgeable and enthusiastic implementation team. As with any project, the right mix of technical and project management skills is crucial in making things happen. The project team comprised the School's Research Development Coordinator (project management and evangelism), an Information Science lecturer (software implementation), the School's IT manager (hardware and deployment) and two senior students (research, content acquisition and data entry). Oversight was provided by a standing committee comprising representatives from Information Technology Services, the University Library and the School of Business.


\section{Impact of the pilot}

The initial response to the pilot repository seemed spectacular, with over 18,000 downloads recorded within the first three months, from a very wide range of countries. This was considerably more than several similar repositories elsewhere in the world and excited considerable interest from both inside and outside the University. It was therefore somewhat disheartening to discover in April 2006 that the hit rates were in fact inflated by a factor of about five. This was due to an undocumented assumption in the Tasmania statistics-generating software \cite{Sale-A-2006-stats} that resulted in hits being counted more than once if the statistics were gathered more often than once per day. (The lesson here is to always be wary of computers bearing wonderful news!)
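
As a rough indication of the scale of this correction, and assuming (purely for illustration) that the factor of about five applied uniformly across the first three months, the reported figure of 18,000 downloads corresponds to approximately
\[
	H_{\mathrm{actual}} \approx \frac{H_{\mathrm{reported}}}{5} = \frac{18{,}000}{5} = 3{,}600
\]
downloads. This is an estimate under the stated assumption only, not a measured value.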

Despite this, hit rates for the Otago repository are still healthy, as shown in Figure~\ref{fig-otago-growth}. Interestingly, the repository attracts many more abstract views than downloads. An informal analysis of hit rates across several other repositories around the world for which statistics were available shows that some have more abstract views than downloads, while others have more downloads than abstract views. Further investigation is needed to determine why this variation occurs.


\begin{figure}
	\centering
	\includegraphics[scale=0.79]{otago_growth}
	\caption{Total monthly hit rates (bar chart, left axis) and number of items (line chart, right axis) for the Otago School of Business repository, up to September 2006.}
	\label{fig-otago-growth}
\end{figure}


Of particular interest is that hit rates on the Otago repository have grown at a much more rapid rate than those of the other repositories that were examined, especially when time since launch is considered. As can be seen in Figure~\ref{fig-growth-comparison}, total traffic to the Otago repository has grown much more rapidly during its early months than that of any of the other eight repositories shown, including some that are much older and larger (see Table~\ref{tab-reopsitories}). Clearly this is a very interesting development, and a research project is currently underway to investigate possible reasons for this.

% hit rates ... appear to have grown more rapidly than other the repositories ...
% This may simply be a consequence of growing public awareness of digital repositories, or it may be a consequence of other factors. A research project is currently underway to investigate possible reasons for the Otago repository's rapid traffic growth.


\begin{figure}
	\centering
	\includegraphics[scale=0.8]{growth_comparison}
	\caption{Comparison of traffic growth across nine EPrints repositories, as of September 2006. (The different line styles are used only to distinguish the lines; they have no other significance.)}
	\label{fig-growth-comparison}
\end{figure}


\begin{table}
	\caption{Details of repositories compared in Figure~\ref{fig-growth-comparison}.}
	\label{tab-reopsitories}
	\begin{center}
		\begin{tabular}{lrr}
													&	\textbf{Age in}	&	\textbf{Num.}	\\
			\textbf{Repository}						&	\textbf{months}	&	\textbf{items}	\\
			\hline
			dLIST (U.\ Arizona)						&	17				&	735	\\
			E-LIS (CILEA, Italy)					&	45.5			&	4275	\\
			U.\ Melbourne							&	50.5			&	1279	\\
			U.\ Nottingham							&	38.5			&	235	\\
			U.\ Otago/Cardrona						&	4.5				&	13	\\
			\textbf{U.\ Otago/School of Business}	&	\textbf{10.5}	&	\textbf{327}	\\
			U.\ Otago/Te Tumu						&	5				&	30	\\
			Rhodes U.\ (S.\ Africa)					&	18.5			&	248	\\
			U.\ Tasmania							&	24				&	301	\\
		\end{tabular}
	\end{center}
\end{table}


One exciting outcome of the implementation has been the ability to make available material that might otherwise be difficult or impossible to access, and thus increase the likelihood of it being cited \cite{Harn-S-2005-research,Hajj-C-2005-citation}. For example, Figure~\ref{fig-item-types} shows that about two-thirds of the items in the Otago repository are items that might not otherwise be readily accessible, such as theses, dissertations, and various working and discussion papers. Indeed, the top ten downloaded items as of September 2006 comprise five departmental working papers, two conference papers, one journal paper, one thesis and one research report, in essentially that order.


\begin{figure}
	\centering
	\includegraphics[scale=0.8]{otago_items}
	\caption{Types of item in the Otago School of Business repository, September 2006.}
	\label{fig-item-types}
\end{figure}


The pilot implementation was clearly a success. Indeed, it was so successful that the ``pilot'' status was dropped in mid-May 2006 (only six months after going live), and it is now the official repository for Otago's School of Business. The lessons learned from this experience have since been applied to two other repositories at Otago.


\section{The second repository: EPrints Te Tumu}

The success of the pilot excited considerable interest throughout the University community. In early 2006, Te Tumu, Otago's School of M\={a}ori, Pacific \& Indigenous Studies, expressed an interest in implementing a repository for their specific needs. They were particularly interested in the use of a digital repository as a means of disseminating their research and other work, as there are relatively few ``official'' outlets for their discipline. In addition to the usual articles and papers that are found in most typical IRs, Te Tumu also wished to store multimedia items such as images of traditional crafts and artwork, and video clips of songs and dances. This was simply a matter of adding  appropriate item types to the EPrints metadata configuration and creating corresponding templates.

Drawing on experience from the pilot, the Te Tumu repository\footnote{\url{http://eprintstetumu.otago.ac.nz/}} was implemented by a single person in about a month, and was officially launched on 3 May 2006, making it the first repository for indigenous studies in New Zealand (and possibly the world). Response to the repository has been very positive, with over 2,600 downloads from 49 different countries during its first five months. The repository currently contains 30 items, including articles, theses, images and video clips.


\section{Issues to consider}


\subsection{Copyright}

Copyright is a potentially thorny issue for any IR, although many of the concerns raised often turn out to be perceived rather than actual problems \cite{EPri-O-2005-SelfFAQ}. Much of the material loaded into the Otago repositories comprised departmental working or discussion papers, for which permission to publish online had already been granted. Items with uncertain copyright status had full text access restricted until their status was confirmed. The SHERPA website\footnote{\url{http://www.sherpa.ac.uk/}} was a valuable resource for ascertaining journal copyright agreements.


\subsection{Data standards}

New Zealand's Digital Strategy proposes the long-term goal of linking all New Zealand repositories to share information and avoid isolated ``silos of knowledge'', where each institution has little idea of what is happening elsewhere \cite{NZG-2005-digital_strategy}. It is therefore imperative that open standards such as the Dublin Core Metadata Initiative\footnote{\url{http://www.dublincore.org/}} be applied for both data and metadata. Dublin Core is natively supported by EPrints, and also by many library cataloging systems.
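
To give a sense of what such metadata looks like, the following is a minimal sketch of an unqualified Dublin Core record in the XML form commonly used for exchange between repositories. The values are illustrative only: the title and creators are those of the present paper, while the identifier URL is hypothetical and the choice of elements is an assumption rather than the actual Otago configuration.

\begin{verbatim}
<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Illustrative values; the identifier is hypothetical -->
  <dc:title>Building institutional repositories
    on a shoestring</dc:title>
  <dc:creator>Stanger, Nigel</dc:creator>
  <dc:creator>McGregor, Graham</dc:creator>
  <dc:date>2006</dc:date>
  <dc:type>Text</dc:type>
  <dc:format>application/pdf</dc:format>
  <dc:language>en</dc:language>
  <dc:identifier>http://ir.example.org/id/123</dc:identifier>
</oai_dc:dc>
\end{verbatim}

Because every element is optional and repeatable, records from quite different kinds of repository can be described in the same way, which is what makes the format suitable for sharing across institutions.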


\subsection{Data entry}

Data entry may often be carried out by people who are not specifically trained for the task (for example, document authors), so it is essential to have well-defined and widely publicized processes and standards for data entry. EPrints allows the data entry process to be heavily customized to the needs of an individual repository. In addition, a final editorial verification is essential to check the quality of the data entered and to ensure that the item is suitable for inclusion in the repository.


\subsection{Content acquisition}
\label{sec-content}

The key issue regarding acquisition of material is whether self-archiving should be compulsory (top-down) or voluntary (bottom-up). \citeasnoun{Sale-A-2005-NZIRW} argues that a compulsory policy is much more effective at increasing the size of a repository, as illustrated by the growth rates of repositories at the Queensland University of Technology (compulsory, high growth) and the University of Queensland (voluntary, low growth). Compulsory archiving policies are often driven by the need to capture information for research evaluation and funding purposes, but run the risk that authors may react negatively to such a requirement. \citeasnoun{Swan-A-2004-OA} surveyed 157 authors who did not self-archive and found that 69\% of them would willingly deposit their articles in an open repository if required to do so.

Another issue is when authors should submit new content to a repository. In particular, should pre-prints of submitted papers be immediately placed in the repository, or should the authors wait until the paper has been accepted for publication? There are valid arguments for both positions, but in the case of popular repositories, waiting for acceptance may be a ``safer'' option. In March 2006, the authors of the current paper submitted an article to a journal, and also placed a pre-print of the paper into the pilot repository. The pre-print rapidly became the most popular download from the repository, with 625 downloads in only three weeks. The journal subsequently rejected the article on the basis that the material had already been widely disseminated and was therefore no longer topical.


\subsection{Types of content}

Decisions about the types of material that should be archived (e.g., working papers, theses, lecture material, multimedia files) are also key, as is the question of what historical material to include. There is a cost associated with scanning or converting non-digitized work into digital format. The value of the repository depends on the number of authors contributing \cite{Rank-J-2005-feasibility}. Ready targets for inclusion are outputs that would otherwise have only limited availability, such as departmental working and discussion papers, and theses and dissertations. The latter in particular are often very difficult to obtain from outside the institution that published them, yet paradoxically, they are often the easiest to obtain for the purposes of populating an IR, because there is a lower likelihood of copyright issues, and departments often have copies of the documents in question.


\subsection{Rollout to the wider institution}

There are several additional factors to consider when rolling a pilot repository implementation out to the wider institution, as is currently being considered at the University of Otago. First, who manages it? In the case of Otago, the obvious candidates are Information Technology Services and the University Library. The Library seems the more natural candidate, however, as an IR is really more about knowledge management than technology, plus the Library already has the skills for acquiring and managing content.

Second, should the wider IR be implemented as a single monolithic entity for the entire institution, or as a federated collection of smaller repositories within institutional sub-units? Otago is currently heading in the latter direction. The two approaches are not mutually exclusive, however, as a centralized IR can easily harvest metadata from any smaller IRs in the institution, and thus provide a central entry point to all research in the institution. Another question is how to integrate IR(s) into existing information systems, in particular those used to track research outputs.
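
Metadata harvesting of the kind mentioned above is commonly done using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which EPrints supports. As a minimal sketch, and assuming that a sub-unit repository exposes its OAI-PMH interface at the hypothetical base URL shown, a central harvester would issue plain HTTP requests such as the following:

\begin{verbatim}
# Base URL http://ir.example.org/oai is hypothetical
GET http://ir.example.org/oai?verb=Identify
GET http://ir.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
\end{verbatim}

The first request returns a description of the repository, and the second returns its records as unqualified Dublin Core, which the central IR can then index. Later harvests can be made incremental by adding a \texttt{from} date argument to the \texttt{ListRecords} request.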

Finally, who should enter new content into the repository? Some institutions allow authors to self-archive their own work, while others centralize the process. The former approach is more flexible, but can suffer from problems with consistency and quality of metadata. In contrast, centralization is easier to manage, but may lead to a feeling of ``lack of control'' on the part of authors. This could perhaps be mitigated by introducing a compulsory deposit policy, as discussed in Section~\ref{sec-content}.


\section{Looking ahead}

An exciting consequence of the School of Business repository has been an approach from various communities throughout New Zealand to help set up repositories of historical material relating to their community. The first of these was Cardrona\footnote{\url{http://cardrona.eprints.otago.ac.nz/}}, a small Central Otago community with a long and varied history. The Cardrona Community Repository was launched on 17 May 2006, and is the first community repository in New Zealand (and possibly the world). Digital repositories offer communities a wonderful opportunity to preserve their historical and cultural information, and to disseminate it to a much wider audience than would normally be possible. A repository also provides a sense of focus for the community, especially in cases like Cardrona, where the community is quite small and somewhat geographically dispersed.

The Otago team is also playing a major role in the Open Access Repositories in New Zealand (OARiNZ) project\footnote{\url{http://www.oarinz.ac.nz/}}. This is a new government-funded project to develop a national infrastructure that will connect all of New Zealand's digital research repositories. Work is currently underway at Otago on an easy-to-use installer and configurator for setting up EPrints repositories, in order to encourage wider adoption of these technologies.


\section{Conclusion}

The experience at Otago has shown that it is relatively easy to implement a digital institutional repository. The technology has now matured to the point where a basic repository can be set up within a couple of days by a person with a moderate level of technical knowledge. Even setting up a heavily customized repository can be achieved in a matter of days rather than weeks, if a dedicated and knowledgeable team is created and given focused, achievable and bounded goals. Software costs are essentially nil and hardware costs are minimal.

On the non-technical side, there are now sufficient repository implementations around the world that many of the issues that rear their heads (such as copyright concerns) have already been considered and dealt with. Institutions do not need to grapple alone with the thorny details, as others have already laid the groundwork and trialed many different solutions. Institutions need therefore only choose the solutions that best fit their needs and customize them to their unique situation.

% [need a snappy finish...]


\bibliography{OCLC}


\end{document}