% nstanger on 25 Sep 2006 - Added to abstract.
\documentclass[12pt,pdftex,a4paper,titlepage]{article}


\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{lmodern}
\usepackage{graphicx}
\usepackage[margin=1in]{geometry}
\usepackage{pifont}
\usepackage[dcucite]{harvard}
\usepackage{url}


\title{Building institutional repositories on a shoestring}
\author{Nigel Stanger \and Graham McGregor}
\date{University of Otago, Dunedin, New Zealand}


\begin{document}


\maketitle


\bibliographystyle{dcu}

\begin{abstract}
\textbf{Purpose} --- To share the authors' experiences in implementing three different digital repositories, and to demonstrate the relative ease with which a digital institutional repository can be implemented.

\noindent\textbf{Design/methodology/approach} --- Three separate repositories were implemented at the University of Otago over a six-month period, each with a different focus and different needs. The first repository was implemented as a pilot, and is now a fully functional system. The other two repositories were implemented from the ground up as fully functional systems.

\noindent\textbf{Findings} --- Implementing an effective digital repository requires surprisingly few resources. The first Otago repository was implemented by a team of four people in only ten days, running on cheap commodity hardware and free open source software.

\noindent\textbf{Practical implications} --- The repositories described in this paper were implemented extremely quickly and were very successful. This implies that digital repository software tools have matured to a sufficient state that almost anyone can quickly and easily implement a full-featured repository.

\noindent\textbf{Originality/value} --- The three repositories discussed in this paper provide excellent exemplars in three quite different contexts: ``traditional'' academia, indigenous studies and community history.
\end{abstract}


\section{Introduction}

Digital institutional repositories have become a hot topic over the last few years, and many institutions around the world are now actively working towards implementing them. In this article we discuss how low-cost yet fully functional digital institutional repositories (IRs) can be set up in a very short time frame. We reflect on the lessons learned while implementing three different repositories at the University of Otago, and discuss some new and exciting applications of digital repositories arising from these. We also suggest some best practices for implementing an IR and discuss issues that must be considered when moving from a small-scale pilot implementation to a full roll-out.

Interest in institutional repositories at the University of Otago was sparked by the release of the New Zealand Digital Strategy by the New Zealand government in May 2005. The strategy aims to ensure that ``New Zealand is a world leader in using information and technology to realise our economic, environmental, social and cultural goals'' \cite{NZG-2005-digital_strategy}. In parallel with this, the National Library of New Zealand set up an expert working party with representatives from across the research sector to investigate the feasibility of establishing a national institutional repository for New Zealand's research outputs \cite{Rank-J-2005-feasibility}. The National Library is fostering a work programme to improve access to New Zealand's research outputs, by collaborating with institutions to stimulate the set-up of research repositories.

In May 2005, two senior University of Otago staff undertook a study tour of Digital Challenges facing universities in the United States. Their report provided the impetus for the first IR pilot in Otago's School of Business. Project work began on 7 November 2005, with the following goals \cite{Stan-N-2006-running}:
 
\begin{itemize}

	\item To establish a proof-of-concept demonstrator for storing and providing open access to digital research publications in the School of Business.

	\item To evaluate the potential of the demonstrator for adoption by the wider Otago University research community.

	\item To connect the School of Business with the global research community, in line with the feasibility study and recommended actions for a national repositories framework for New Zealand's research outputs \cite{Rank-J-2005-feasibility}.

\end{itemize}


\section{The pilot repository}

The GNU EPrints repository management software was chosen for the pilot repository because it was widely used, well-supported, inexpensive and would not lock the School into specific technologies or vendors \cite{Sale-A-2005-NZIRW}. The development team also had prior experience with the software. A rapid prototyping methodology was adopted, emphasising quick releases of visible results with multiple iterations, in order to create interest in the project at an early stage, and enable a positive feedback cycle. A sandbox was used to test entries and entry formats before the material went live.

The pilot repository was fully implemented within ten days of assembling the project team, with most of this time spent tweaking the look and feel of the web site and collecting content \cite{Stan-N-2006-running}. This outcome was possible because the project had a very clear brief to ``prove the concept'', rather than taking on a large-scale project that would involve many different disciplines, researchers and research outputs from the outset. Early decisions were made to restrict the content and content domain used for the pilot, in order to speed the collection process and minimise requirements ``creep''. Meetings were kept to a minimum, and policy and procedural issues that required institutional decisions were noted as work progressed, rather than tackled head-on. The project was widely publicised within the School and Heads of Departments were consulted to ensure top-level buy-in. This approach produced immediate results and the repository was quickly populated with a range of working/discussion papers, conference items, journal articles and theses.

There was no cost associated with the GNU EPrints software, and from a technical point of view the project was wonderfully straightforward. The School of Business repository\footnote{\url{http://eprints.otago.ac.nz/}} was deployed on a spare mid-range server running FreeBSD, which meant that hardware and software costs were essentially nil. In other words, if you happen to have some spare hardware lying around, you can set up an initial repository very cheaply, and expand it later.

A minimalist approach was taken with regard to gathering content, partly because of the prototypical nature of the project, and partly because material in the hand is worth more than promises by authors to supply content at some indeterminate future date. New publications are always being created, and content acquisition is a moving target that has to be effectively managed. Once some basic content acquisition and data entry protocols were put in place, an incremental methodology was adopted. Content was limited to voluntary contributions in PDF format from colleagues in the School of Business, but with no constraint on the type of output. At the end of September 2006, the repository contains 327 documents covering a wide range of topics and document types, with new content being continually acquired.

It is remarkable what can be achieved by a small, dedicated, knowledgeable and enthusiastic implementation team. As with any project, the right mix of technical and project management skills is crucial in making things happen. The project team comprised the School's Research Development Coordinator (project management and evangelism), an Information Science lecturer (software implementation), the School's IT manager (hardware and deployment) and two senior students (research, content acquisition and data entry). Oversight was provided by a standing committee comprising representatives from Information Technology Services, the University Library and the School of Business.


\section{Impact of the pilot}

The initial response to the pilot repository seemed spectacular, with over 18,000 downloads recorded within the first three months, from a very wide range of different countries. This was considerably more than several similar repositories elsewhere in the world and excited considerable interest from both inside and outside the University. It was therefore somewhat disheartening to discover in April 2006 that the hit rates were in fact over-inflated by a factor of about five. This was due to an undocumented assumption in the Tasmania statistics generating software \cite{Sale-A-2006-stats} that resulted in hits being counted more than once if the statistics were gathered more often than once per day. (The lesson here is to always be wary of computers bearing wonderful news!)
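
As a rough check on the scale of this correction, and taking the figures quoted above at face value (about 18,000 recorded downloads and an inflation factor of about five), the corrected total for the first three months would be in the region of
\[
	\frac{18\,000}{5} \approx 3\,600 \mbox{ downloads.}
\]
This is a back-of-the-envelope estimate derived from the rounded figures above, not a measured value.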

Despite this, hit rates for the Otago repository are still healthy, as shown in Figure~\ref{fig-otago-growth}. Interestingly, the repository attracts many more abstract views than downloads. An informal analysis of hit rates across several other repositories around the world for which statistics were available shows that some have more abstract views than downloads, while others have more downloads than abstract views. Further investigation is needed to determine why this variation occurs.


\begin{figure}
	\centering
	\includegraphics[scale=0.79]{otago_growth}
	\caption{Total monthly hit rates (bar chart, left axis) and number of items (line chart, right axis) for the Otago repository.}
	\label{fig-otago-growth}
\end{figure}


Of particular interest is that hit rates on the Otago repository have grown much more rapidly than those of the other repositories that were examined, especially when time since launch is considered. As can be seen in Figure~\ref{fig-growth-comparison}, total traffic to the Otago repository grew much more rapidly during its early months than that of any of the other seven repositories shown, including some that are much older and larger (see Table~\ref{tab-reopsitories}). This is an interesting development, and a research project is currently underway to investigate possible reasons for it.


\begin{figure}
	\centering
	\includegraphics[scale=0.8]{growth_comparison}
	\caption{Comparison of traffic growth across eight EPrints repositories. (The different line styles are used only to distinguish the lines; they have no other significance.)}
	\label{fig-growth-comparison}
\end{figure}


\begin{table}
	\caption{Details of repositories compared in Figure~\ref{fig-growth-comparison}.}
	\label{tab-reopsitories}
	\begin{center}
		\begin{tabular}{lrr}
													&	\textbf{Age in}	&	\textbf{Num.}	\\
			\textbf{Repository}						&	\textbf{months}	&	\textbf{items}	\\
			\hline
			dLIST (U.\ Arizona)						&	17				&	735	\\
			U.\ Melbourne							&	50.5			&	1279	\\
			U.\ Nottingham							&	38.5			&	235	\\
			U.\ Otago/Cardrona						&	4.5				&	13	\\
			\textbf{U.\ Otago/School of Business}	&	\textbf{10.5}	&	\textbf{327}	\\
			U.\ Otago/Te Tumu						&	5				&	30	\\
			Rhodes U.\ (S.\ Africa)					&	18.5			&	248	\\
			U.\ Tasmania							&	24				&	301	\\
		\end{tabular}
	\end{center}
\end{table}
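
For context, the ages and item counts in Table~\ref{tab-reopsitories} give a crude average deposit rate (number of items divided by age in months) for each repository. The following is only an illustrative calculation for three of the repositories; it assumes uniform growth since launch, which is unlikely to hold in practice:
\[
	\begin{array}{lrcl}
		\mbox{U.\ Otago/School of Business:}	&	327 / 10.5	&	\approx	&	31 \mbox{ items per month}	\\
		\mbox{U.\ Melbourne:}					&	1279 / 50.5	&	\approx	&	25 \mbox{ items per month}	\\
		\mbox{U.\ Tasmania:}					&	301 / 24	&	\approx	&	13 \mbox{ items per month}	\\
	\end{array}
\]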


One exciting outcome of the implementation has been the ability to make publicly available material that might otherwise be inaccessible. For example, Figure~\ref{fig-item-types} shows that about two thirds of the items in the Otago repository are items that might not otherwise be readily accessible, such as theses, dissertations, and various working and discussion papers. Indeed, the top ten downloaded items as of September 2006 comprise five departmental working papers, two conference papers, one journal paper, one thesis and one research report, in essentially that order.


\begin{figure}
	\centering
	\includegraphics[scale=0.8]{otago_items}
	\caption{Distribution of item types in the Otago repository.}
	\label{fig-item-types}
\end{figure}


The pilot implementation was clearly a success. Indeed, it was so successful that the ``pilot'' status was dropped in mid-May 2006 (only six months after going live), and it is now the official repository for Otago's School of Business. The lessons learned from this experience have since been applied to two other repositories at Otago.


\section{The second repository: EPrints Te Tumu}

The success of the pilot excited considerable interest throughout the University community. In early 2006, Te Tumu, Otago's School of Maori, Pacific \& Indigenous Studies, expressed an interest in implementing a repository for their specific needs. They were particularly interested in the use of a digital repository as a means of disseminating their research and other work, as there are relatively few ``official'' outlets for their discipline. In addition to the usual articles and papers that are found in most typical IRs, Te Tumu also wished to store multimedia items such as images of traditional crafts and artwork, and video clips of songs and dances. This was simply a matter of adding  appropriate item types to the EPrints metadata configuration and creating corresponding templates.

Drawing on experience from the pilot, the Te Tumu repository\footnote{\url{http://eprintstetumu.otago.ac.nz/}} was implemented by a single person in about a month, and was officially launched on 3 May 2006, making it the first repository for indigenous studies in New Zealand (and possibly the world). Response to the repository has been very positive, with over 2,600 downloads from 49 different countries during its first five months. The repository currently contains 30 items, including articles, theses, images and video clips.
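
To put these figures in perspective, a rough calculation from the numbers quoted above (treating ``over 2,600'' as 2,600, so that both results are lower bounds) gives
\[
	\frac{2\,600}{5} = 520 \mbox{ downloads per month}
	\qquad \mbox{and} \qquad
	\frac{2\,600}{30} \approx 87 \mbox{ downloads per item.}
\]
The per-item figure is simply an average over the current holdings; it says nothing about how downloads are distributed across individual items, and should be read only as an order-of-magnitude indication.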


\section{Issues that arose}

Many issues were encountered during the implementation of both the pilot and Te Tumu repositories. The following is a summary of the key issues that had to be dealt with:


\begin{description}

	\item[Copyright:] This is a potentially thorny issue for any IR, although many of the concerns raised often turn out to be perceived rather than actual problems (EPrints, 2005). Much of the material loaded into the Otago repositories comprised departmental working or discussion papers, for which permission to publish online had already been granted. Items with uncertain copyright status had full text access restricted until their status was confirmed. The SHERPA website\footnote{\url{http://www.sherpa.ac.uk/}} was a valuable resource for ascertaining journal copyright agreements.

	\item[Data standards:] New Zealand's Digital Strategy proposes the long term goal of linking all New Zealand repositories to share information and avoid isolated ``silos of knowledge'', where each institution has little idea of what is happening elsewhere \cite{NZG-2005-digital_strategy}. It is therefore imperative that open standards such as the Dublin Core Metadata Initiative (2006) be applied for both data and metadata. Dublin Core is natively supported by EPrints, and also by many library cataloguing systems.

	\item[Data entry:] Data entry may often be carried out by people who are not specifically trained for the task (for example, document authors), so it is essential to have well-defined and widely publicised processes and standards for data entry. EPrints allows the data entry process to be heavily customised to the needs of an individual repository. In addition, a final editorial verification is essential to check the quality of the data entered and to ensure that the item is suitable for inclusion in the repository.

	\item[Content acquisition:] The key issue regarding acquisition of material is whether self-archiving should be compulsory (top-down) or voluntary (bottom-up). Sale (2005b) argues that a compulsory policy is much more effective at increasing the size of a repository, as illustrated by the growth rates of repositories at the Queensland University of Technology (compulsory, high growth) and the University of Queensland (voluntary, low growth). Compulsory archiving policies are often driven by the need to capture information for research evaluation and funding purposes, but run the risk that authors may react negatively to such a requirement. \citeasnoun{Swan-A-2004-OA} surveyed 157 authors who did not self-archive and found that 69\% of them would willingly deposit their articles in an open repository if required to do so.

	\item[Types of content:] Decisions about the types of material that should be archived (e.g.\ working papers, theses, lecture material, sound and picture files) are also key, as is the question of what historical material to include. There is a cost associated with scanning or converting non-digitised work into digital format. The value of the repository depends on the number of authors contributing \cite{Rank-J-2005-feasibility}. Ready targets for inclusion are outputs that would otherwise have only limited availability, such as departmental working and discussion papers, and theses and dissertations. The latter in particular are often very difficult to obtain from outside the institution that published them, yet paradoxically, they are often the easiest to obtain for the purposes of populating an IR, because there is a lower likelihood of copyright issues, and departments often have copies of the documents in question.

\end{description}


%[all of these to be expanded]
%
%\begin{itemize}
%
%	\item Copyright: who owns it, perception vs.\ reality.
%
%	\item Data standards: metadata, interoperability with Library systems, etc.
%
%	\item Data entry: trained vs.\ non-trained users, editorial control.
%
%	\item Content acquisition: voluntary vs.\ mandatory.
%
%	\item Types of content: quality vs.\ quantity, currency vs.\ archival, media types, non-digital material, etc.
%
%\end{itemize}


\section{Rollout to the wider institution}

The University of Otago is currently considering whether to proceed with a full implementation of an IR for the University as a whole. If such a rollout occurs, the repository will be the responsibility of the University Library, which is a logical place for such a resource to be managed. The Library has expressed strong support for going ahead with a wider rollout. Regardless of whether this occurs, the School of Business has committed to continuing with its existing IR.

Further issues that need to be considered in this context:


\begin{itemize}

	\item Management: library vs.\ IT, oversight, position within the University.

	\item Integration: single monolithic repository vs.\ many small distributed repositories (we are already heading in the direction of the second model); integration with existing information systems.

	\item Data entry: authors (self-archiving) vs.\ library staff.

	\item others...

\end{itemize}


\section{Looking ahead: Community repositories}

An exciting consequence of our work on the School of Business repository has been an approach from various communities throughout New Zealand to help set up repositories of historical material relating to their community. The first of these was Cardrona\footnote{\url{http://cardrona.eprints.otago.ac.nz/}}, a small Central Otago community with a long and varied history. The Cardrona Community Repository was launched on 17 May 2006, and is the first community repository in New Zealand (possibly the world). Digital repositories offer communities a wonderful opportunity to preserve their historical and cultural information, and to disseminate it to a much wider audience than would normally be possible. They also provide a sense of focus for the community, especially in cases like Cardrona, where the community is quite small and somewhat geographically dispersed.



\section{Conclusions etc.}

[to come later]


\bibliography{OCLC}


\end{document}