% Publications / OCLC.tex
% nstanger on 22 Sep 2006, 17 KB: Added in discussion of issues.
\documentclass[12pt,pdftex,a4paper,titlepage]{article}


\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{lmodern}
\usepackage{graphicx}
\usepackage[margin=1in]{geometry}
\usepackage{pifont}
\usepackage[dcucite]{harvard}
\usepackage{url}


\title{Building institutional repositories on a shoestring}
\author{Nigel Stanger}
\date{University of Otago, Dunedin, New Zealand}


\begin{document}


\maketitle


\bibliographystyle{dcu}

\begin{abstract}
\textbf{Purpose} --- To demonstrate the ease of implementing digital institutional repositories.

\noindent\textbf{Design/methodology/approach} --- Three separate repositories were implemented at the University of Otago over a six-month period, each with a different focus and different needs. The first repository was implemented as a pilot, and is now a fully functional system. The other two repositories were implemented from the ground up as fully functional systems.

\noindent\textbf{Findings} --- Implementing an effective digital repository requires surprisingly few resources. The first Otago repository was implemented by a team of four people in only ten days, running on cheap commodity hardware and free open source software.

\noindent\textbf{Practical implications} --- xxx

\noindent\textbf{Originality/value} --- xxx
\end{abstract}


\section{Introduction}

Digital institutional repositories have become a hot topic over the last few years, and many institutions around the world are now actively working towards implementing them. In this article we discuss how low-cost yet fully functional digital institutional repositories (IRs) can be set up in a very short time frame. We reflect on the lessons learned while implementing three different repositories at the University of Otago and suggest some best practices for implementing an IR. We also discuss the issues that must be considered when moving from a small-scale pilot implementation to a full roll-out.

Interest in institutional repositories at the University of Otago was sparked by the release of the New Zealand Digital Strategy by the New Zealand government in May 2005. The strategy aims to ensure that ``New Zealand is a world leader in using information and technology to realise our economic, environmental, social and cultural goals'' \cite{NZG-2005-digital_strategy}. In parallel with this, the National Library of New Zealand set up an expert working party with representatives from across the research sector to investigate the feasibility of establishing a national institutional repository for New Zealand's research outputs \cite{Rank-J-2005-feasibility}. The National Library is fostering the creation and launch of a work programme to improve access to New Zealand's research outputs, by collaborating with institutions to stimulate the set-up of research repositories.

In May 2005, two senior University of Otago staff undertook a study tour of Digital Challenges facing universities in the United States. Their report provided the immediate impetus for the first IR pilot in Otago's School of Business. Project work began on 7 November 2005, with the following goals:
 
\begin{itemize}

	\item To establish a proof-of-concept demonstrator for storing digital research publications in the School of Business and making them available as ``open access''.

	\item To evaluate the potential of the demonstrator for adoption by the wider Otago University research community.

	\item To connect the School of Business with the global research community, in line with the feasibility study and recommended actions for a national repositories framework for New Zealand's research outputs \cite{Rank-J-2005-feasibility}.

\end{itemize}


\section{Implementation of the first pilot}

The GNU EPrints repository management software was chosen for the pilot repository because it was widely used, well-supported, inexpensive and would not lock the School into specific technologies or vendors \cite{Sale-A-2005-NZIRW}. The development team also had prior experience with the software. A rapid prototyping methodology was adopted, emphasising quick releases of visible results with multiple iterations, in order to create interest in the project at an early stage, and enable a positive feedback cycle. A sandbox was used to test entries and entry formats before the material went live.

The pilot IR was fully implemented within ten days of assembling the project team, with most of this time spent tweaking the look and feel of the web site and collecting content. This outcome was possible because the team worked to a very clear brief to ``prove the concept'', rather than taking on a large-scale project that would involve many different disciplines, researchers and research outputs from the outset. Early decisions were made to restrict the content and content domain used for the pilot, in order to speed the collection process and minimise the possibility of requirement ``creep''. Meetings were kept to a minimum, and policy and procedural issues that required institutional decisions were noted as work progressed, rather than tackled head on. The project was widely publicised within the School and Heads of Departments were consulted to ensure top-level buy-in. This approach produced immediate results and the repository was quickly populated with a range of working/discussion papers, conference items, journal articles and theses.

There was no cost associated with the GNU EPrints software or its open source community, and from a technical point of view the project was very straightforward. The School of Business IR\footnote{\url{http://eprints.otago.ac.nz/}} was deployed on a spare mid-range server running FreeBSD, which meant that hardware and software costs were essentially nil. In other words, if you happen to have some spare hardware lying around, you can set up an initial repository very cheaply, and then expand it later.

A minimalist approach was taken with regard to gathering potential content, partly because of the prototypical nature of the project, and partly because material in the hand is worth more than a million promises of what authors suggest they ``can'' provide if given sufficient time. New publications are always being created, and content acquisition is a moving target that has to be effectively managed. Once some basic content acquisition and data entry protocols were put in place, an incremental methodology was adopted. Content was strictly limited to voluntary contributions in PDF format from colleagues in the School of Business, but with no constraint on the type of output. As of mid-September 2006, the IR contains 327 documents covering a wide range of topics and document types, and more are added as new content is acquired. A more systematic approach to content collection is currently being considered.

It is remarkable what can be achieved by a small, dedicated, knowledgeable and enthusiastic implementation team. As with any project, the right mix of technical and project management skills is crucial in making things happen. The project team comprised the School's Research Development Coordinator (project management and evangelism), an Information Science lecturer (software implementation), the School's IT manager (hardware and deployment) and two senior students (research, content acquisition and data entry). Oversight was provided by a standing committee comprising representatives from Information Technology Services, the University Library and the School of Business.


\section{Impact of the pilot}

The initial response to the pilot repository seemed spectacular, with over 18,000 downloads recorded within the first three months, from a wide range of countries. This was considerably more than several similar repositories elsewhere in the world had recorded, and it excited considerable interest from both inside and outside the University. It was therefore somewhat disheartening to discover in April 2006 that the hit rates were in fact inflated by a factor of about five. This was due to an undocumented assumption in the Tasmania statistics-generating software \cite{Sale-A-2006-stats} that resulted in hits being counted more than once if the statistics were gathered more often than once per day. (The lesson here is to always be wary of computers bearing wonderful news!)
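
As a rough indication of the scale of the correction, and assuming for illustration that the reported figure was exactly 18,000 and the inflation factor exactly five (both are only approximate), the actual number of downloads over the first three months would have been on the order of
\[
	\frac{18{,}000}{5} = 3{,}600 .
\]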

Despite this, hit rates for the Otago IR are still healthy, as shown in Figure~\ref{fig-otago-growth}. Interestingly, the repository attracts many more abstract views than downloads. An informal analysis of hit rates for five other repositories around the world shows that the same is true for some of them, while for others the opposite occurs (many more downloads than abstract views). Further investigation is needed to determine why this variation occurs.


\begin{figure}
	\centering
	\includegraphics[scale=0.79]{otago_growth}
	\caption{Total monthly hit rates (bar chart, left axis) and number of items (line chart, right axis) for the Otago repository.}
	\label{fig-otago-growth}
\end{figure}


Of particular interest is that hit rates on the Otago IR have grown much more rapidly than those of the other repositories that were examined, especially when time since launch is taken into account. As can be seen in Figure~\ref{fig-growth-comparison}, total traffic to the Otago IR grew faster during its early months than traffic to any of the other repositories, including some that have existed for much longer. A research project is currently underway to investigate the possible reasons for this.


\begin{figure}
	\centering
	\includegraphics[scale=0.8]{growth_comparison}
	\caption{Comparison of traffic growth across eight EPrints repositories. (The different line styles are used only to distinguish the lines; they have no other significance.)}
	\label{fig-growth-comparison}
\end{figure}


One exciting outcome of the IR deployment has been the ability to make publicly available material that might otherwise be inaccessible. For example, Figure~\ref{fig-item-types} shows that about two thirds of the items in the Otago IR are items that might not otherwise be readily accessible, such as theses, dissertations, and various working and discussion papers. Indeed, the top ten downloaded items as of mid-September comprise five departmental working papers, two conference papers, one journal paper, one thesis and one research report, in essentially that order.


\begin{figure}
	\centering
	\includegraphics[scale=0.8]{otago_items}
	\caption{Distribution of item types in the Otago repository.}
	\label{fig-item-types}
\end{figure}


The pilot implementation was clearly a success. Indeed, it was so successful that the ``pilot'' status was dropped in mid-May 2006 (only six months after going live), and it is now the official IR for Otago's School of Business. The lessons learned from this experience have since been applied to two other IRs at Otago, with potentially more to come.


\section{The second IR: EPrints Te Tumu}

The success of the pilot resulted in considerable interest throughout the University community. In early 2006, Te Tumu, the University's School of Maori, Pacific \& Indigenous Studies, expressed an interest in implementing an IR for their specific needs. They were particularly interested in the use of an IR as a means of disseminating research and other work, as there are relatively few ``official'' outlets for their discipline.

Drawing on experiences from the pilot, the Te Tumu IR\footnote{\url{http://eprintstetumu.otago.ac.nz/}} was implemented by a single person in about a month, and was officially launched on 3 May 2006, making it the first IR for indigenous studies in New Zealand (and possibly the world). Response to the repository has been positive, with over 2,500 downloads from 46 different countries during its first four and a half months (the repository currently contains about 30 items).


\section{Issues that arose}

Several issues were encountered during the implementation of the two repositories already discussed.


\begin{description}

	\item[Copyright:] This is a potentially thorny issue for any IR, although many of the concerns raised turn out to be perceived rather than actual problems (EPrints, 2005). In our case much of the material loaded into the repository comprised departmental working or discussion papers, for which permission to publish online had already been granted. Items with uncertain copyright status had full text access restricted until their status was confirmed. A valuable resource for ascertaining journal copyright agreements is the SHERPA website (2006).

	\item[Data standards:] New Zealand's Digital Strategy proposes the long term goal of linking all New Zealand repositories to share information and avoid isolated ``silos of knowledge'', where each institution has little idea of what is happening elsewhere (New Zealand Government, 2005). It is therefore imperative that open standards such as the Dublin Core Metadata Initiative (2006) be applied for both data and metadata. The EPrints software makes this relatively trivial by natively supporting Dublin Core metadata export as specified by the Open Archives Initiative (2006). The University of Otago Library is upgrading to a new catalogue system that also supports Dublin Core, which means that it is possible to directly integrate the repository metadata into the library catalogue.

	\item[Data entry:] Data entry is likely to be carried out by people who are not specifically trained for the task (for example, document authors), so it is essential to have well-defined and widely publicised processes and standards for data entry. The EPrints software is very helpful in this area, allowing the data entry process to be heavily customised to the needs of an individual repository. In addition, a final verification or editorial step is essential to check the quality of the data entered and to ensure that the item is suitable for inclusion in the repository.

	\item[Content acquisition:] The key issue regarding acquisition of material is whether self-archiving should be compulsory (top-down) or voluntary (bottom-up). Sale (2005b) argues that a compulsory policy is much more effective at increasing the size of a repository, and illustrates this by comparing the growth rates of repositories at the Queensland University of Technology (compulsory, high growth) and the University of Queensland (voluntary, low growth). Compulsory archiving policies are often driven by the need to capture information for research evaluation and funding purposes, but run the risk that authors may react negatively to such a requirement. Swan and Brown (2004) surveyed 157 authors who did not self-archive and found that 69\% of them would willingly deposit their articles in an open repository if required to do so.

	\item[Types of content:] Decisions about the types of material that should be archived (e.g. working papers, theses, lecture material, sound and picture files) are also key, as is the question of what historical material should be included. There is a cost issue relating to non-digitised work, since scanning or conversion to PDF format is necessary. The value of the repository depends on the number of authors contributing (Rankin, 2005). Ready targets for inclusion are outputs that would otherwise have only limited availability, such as departmental working and discussion papers, and theses and dissertations. The latter in particular are often very difficult to obtain from outside the institution that published them. Paradoxically, however, they are often the easiest to obtain for the purposes of populating an IR, because there is a lower likelihood of copyright issues, and departments often have copies of the documents in question.

\end{description}


[all of these to be expanded]

\begin{itemize}

	\item Copyright: who owns it, perception vs.\ reality.

	\item Data standards: metadata, interoperability with Library systems, etc.

	\item Data entry: trained vs.\ non-trained users, editorial control.

	\item Content acquisition: voluntary vs.\ mandatory.

	\item Types of content: quality vs.\ quantity, currency vs.\ archival, media types, non-digital material, etc.

\end{itemize}


\section{Rollout to the wider University}

The University is currently in the process of determining whether to proceed with a full implementation of an IR for the University. [decision due any time now] If such a rollout occurs, the repository will become the responsibility of the University Library, which is a logical place for such a resource to be managed. The Library has expressed strong support for going ahead with a wider rollout. Regardless of whether this occurs, the School of Business has committed to continuing with the existing IR.

Further issues that need to be considered in this context:


\begin{itemize}

	\item Management: library vs.\ IT, oversight, position within the University.

	\item Integration: single monolithic repository vs.\ many small distributed repositories (we are already heading in the direction of the second model); integration with existing information systems.

	\item Data entry: authors (self-archiving) vs.\ library staff.

	\item others...

\end{itemize}


\section{Looking ahead: Community repositories}

An exciting consequence of our work on the School of Business pilot has been an approach from various communities throughout New Zealand to set up repositories of historical material relating to their community. The first of these was Cardrona\footnote{\url{http://cardrona.eprints.otago.ac.nz/}}, a small Central Otago community with a long and varied history. The Cardrona Community Repository was launched on 17 May 2006, making it the first community repository in New Zealand [possibly the world?]. Digital repositories offer communities a wonderful opportunity to preserve their historical and cultural information, and to disseminate it to a much wider audience than would normally be possible.


\section{Conclusions etc.}

[to come later]


\bibliography{OCLC}


\end{document}