% nstanger on 13 Sep 2006: Added makefile.
\documentclass[12pt,pdftex,a4paper,titlepage]{article}


\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{lmodern}
\usepackage{mathpazo}
\usepackage{graphicx}
\usepackage[margin=1in]{geometry}


\graphicspath{{images/}}


\renewcommand{\ttdefault}{blg}


\title{\textsf{\textbf{OARiNZ DIY Repository Solution}}}
\author{\textsf{\textbf{Nigel Stanger}}}
\date{\textsf{\textbf{August/September 2006}}%
	\linebreak\linebreak\linebreak\linebreak\linebreak%
	\includegraphics[scale=0.4]{OU-Logo-Colour}}


\begin{document}


\maketitle


\tableofcontents


\newpage


\section{Introduction}

Implementing a digital repository, using a typical open source solution
such as GNU EPrints or DSpace, is currently a complex proposition that
requires a reasonable level of technical expertise in order to find,
download and install all the required software, then separately
configure these components appropriately for the target operating
system. This process can be simplified, in particular removing the need
to manually find, download, install and configure multiple separate
components. Instead, a single installer could manage the entire process
from start to finish.

Objective 7 of the OARiNZ project addresses this need. It aims to
produce a freely distributable, easy-to-install CD-ROM
containing either pre-configured or self-configuring open source
software for use by institutions looking for entry-level assistance with
developing their own shareable digital repository. This document
outlines a specification for such a solution.

The nature of currently available repository software means that it is
unlikely that we can eliminate the need for some technical expertise.
Several installation and configuration tasks will require
administrator-level access, for example, so the solution cannot be fully
automated.
Regardless, the solution will enable repository implementors to quickly
install and configure a complete digital repository, either from ``bare
metal'' on a new server or on an existing system. In addition, the level
of required technical expertise and the complexity of the installation
and configuration process will be reduced, thus lowering the bar for
implementing a digital repository.

In the spirit of ``lowering the bar'', a key aim is to automate or
abstract as much of the repository installation and configuration
process as possible. In other words, we will not force repository
implementors to type in arcane commands unless it is absolutely
unavoidable, nor will we force them to read many pages of dense and
obscure documentation before they start. A laudable (but perhaps overly
optimistic) goal would be to make the installation process as simple as
installing software under Mac OS X or Windows.

We therefore propose the following two key deliverables:
\begin{enumerate}

	\item A ``bare metal'' installer for creating completely new
	repositories on new hardware, that includes both an operating system
	and all the required repository software.
	
	\item A standalone tool for installing and configuring an EPrints
	repository on an existing server.

\end{enumerate}
Both of these deliverables would be distributed in the form of a CD-ROM
(or equivalent medium) containing all the required software and a
``shell'' for managing the installation and configuration process.
Downloadable disk images would also be made available.

The remainder of this document discusses various design and
implementation options, typical usage scenarios, and the implementation
plan.


\section{Design and implementation options}


\subsection{Repository software}

Ultimately we would like to provide a solution for both GNU EPrints and
DSpace, which are the two major open source solutions for smaller-scale
repository implementation. However, we currently have little expertise
with DSpace, so we will initially focus on delivering a solution for
EPrints. We also plan to include the Tasmania EPrints statistics
software as a standard component, so that any repositories installed
using this solution generate download statistics out of the box.


\subsection{Operating systems}

EPrints repositories are typically run on Unix-based systems (e.g.,
Linux, BSD, Mac OS X), and we have experience at Otago with installing
EPrints on Debian Linux, FreeBSD, Mac OS X and Ubuntu Linux. Unix-based
systems will therefore be our primary target for implementation. Note
that the EPrints web site currently states that there are ``no plans for
a version to run under Microsoft Windows''.

For bare metal installations, a complete operating system distribution
will also be required. We clearly cannot provide an installation disk
for every possible Unix platform, nor for proprietary operating systems
such as Mac OS X. The bare metal installer can therefore realistically
only support one operating system platform. The easiest way to achieve
this is to pick a Unix-based operating system that provides a bootable
``live CD''.

We have experience with installing EPrints repositories under Ubuntu
Linux, which provides a live CD feature, so this is an obvious choice.
The Ubuntu live CD is also easily customisable, so we could create our
own custom live CD that included not only the base operating system but
also the required packages for installing EPrints and our configurator
software. (Note that installation of the repository software would be
incorporated into the operating system base installation process, so the
standalone repository installer/configurator would not be required for
bare metal installs.)


\subsection{Package installation}

Unfortunately, Unix-based environments do not provide as much uniformity
of operating environment as we would like. There is wide variation, even
across different Linux distributions, with regard to package
installation and management, system environment and standard toolsets.
The process for installing a required package is completely different
under Mac OS X, Debian Linux, Red Hat Linux and FreeBSD, for example,
and there are even sometimes multiple package management mechanisms
available within the same operating system distribution.

We therefore need to consider whether the standalone repository
installer for existing systems should use the native package management
software (e.g., Red Hat's \texttt{rpm} or Debian's \texttt{dpkg}), or
independent installer software. If we take the native route, the
installer will need to detect the operating system version and then look
for appropriate package management tools, which of course makes
implementation more complex. If we do not go native, the implementation
will be simpler, but we would lose the significant advantage of having
packages managed by the operating system, which is particularly useful
for dependency management and upgrades. We would therefore prefer the
native option.
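
As a minimal sketch of the detection step under the native option, the
installer could probe the search path for known package tools. The
sketch is written in Python purely for illustration; the candidate list,
its ordering and the function name \texttt{detect\_package\_manager} are
assumptions made here, not part of any existing tool.

\begin{verbatim}
import shutil

# Candidate native package tools, in order of preference. This list is an
# assumption for illustration, based on the platforms named in this document.
CANDIDATES = [
    ("apt-get", "Debian/Ubuntu"),
    ("rpm", "Red Hat"),
    ("fink", "Mac OS X (Fink)"),
    ("pkg_add", "FreeBSD"),
]


def detect_package_manager(which=shutil.which):
    """Return (tool, platform) for the first tool found on PATH, else None."""
    for tool, platform in CANDIDATES:
        if which(tool) is not None:
            return tool, platform
    return None


if __name__ == "__main__":
    found = detect_package_manager()
    if found is None:
        print("No native package tool found; fall back to manual install.")
    else:
        print("Using %s (%s)" % found)
\end{verbatim}

Passing the lookup function as a parameter keeps the detection logic
testable on a machine that has none of these tools installed.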


\subsection{Repository installation and configuration interface}

We need to consider what kind of interface to present to the person
performing the repository installation and configuration process.
Possible options include:
\begin{description}

	\item[Use operating system-provided installer] We could use the
	native installer program supplied by the operating system (if such
	exists), such as the Mac OS X installer application. While this
	would provide an installation experience that is consistent with the
	user's interface expectations, this would almost certainly require
	the development of separate installers for each operating system
	platform, with consequent increase in development and maintenance
	complexity. It is also unclear whether such tools would be able
	to effectively implement the configuration step, and they may or may
	not be able to integrate with any native OS package management tools
	(this is certainly not the case for the Mac OS X installer, for
	example).
	
	\item[Cross-platform GUI-based installer and configurator] There are
	many cross-platform installer tools available that could be used to
	build an installation tool. Many of these tools are written in Java,
	which could enable the installation user interface to look
	reasonably ``native'' for each platform. Non-Java tools may
	impose a particular look and feel which could be visually jarring on
	different platforms. As with the native installer option, it is also
	currently unclear whether any of these tools could provide a GUI for
	the configuration step, and they may or may not be able to integrate
	with the native operating system package management tools.
	
	\item[Web-based installer and configurator] A web interface could be
	used to manage the installation and configuration process. This
	would require an active web server with some sort of back-end
	scripting support, so the web option may not be completely feasible
	for the initial installation step. There is also the issue of
	gaining administrator-level access in order to install and configure
	many of the components. This is not insurmountable, however, as
	web-based system administration tools like Webmin can do this. The
	big advantage of using a web browser is that it should work on
	almost any platform, as long as we adhere stringently to web
	standards, and it will provide a reasonably ``native'' user
	interface experience in all cases.
	
	\item[Shell-based installer with text interface] This is the lowest
	common denominator for all Unix-based systems. We can be confident
	that almost any Unix-based system will have some variant of the
	Bourne shell (\texttt{sh}) available, or at least something
	compatible. The interface will not
	be very ``pretty'', but will be relatively simple to implement, and
	can handle both the installation and configuration steps without any
	difficulty, including prompting for administrator-level access. If
	implemented in a modular fashion, the installer/configurator should
	be readily portable to other Unix-based operating systems.
	Furthermore, a shell-based configurator could even act as a back-end
	application layer behind a web-based front end, solving two problems
	at once.

\end{description}

We feel that the web-based option provides the best compromise between a
truly ``native'' user interface and the flexibility required to provide
a cross-platform solution that can interface with native package
management tools, especially when combined with the shell-based option.


\subsection{Distribution media}

While we have talked so far about distribution on CD-ROMs, we see no
particular reason to limit the solution to only this medium. For
example, the solution could also be made available in DVD form and as
downloadable disk images. This will provide repository implementors with
a broad choice of installation media to suit the peculiarities of their
particular installation environment.

Furthermore, it is likely that the CD-ROM version would actually
comprise more than just a single CD-ROM. In the case of a bare metal
install, you would not only need the operating system files, but also
pre-compiled versions of all the prerequisite software in a package
format appropriate for that operating system. Similarly, for an existing
system install, we would need to include a separate copy of each
prerequisite package for every supported package management tool, in
that tool's format. This could easily run to at least two CD-ROMs, but
would definitely all fit onto a single DVD.

We also suggest that there should be separate disks for the bare metal
install and the existing system install options, for the following
reasons:
\begin{itemize}

	\item People with existing systems would not want to download an
	unnecessary operating system distribution just to get the
	repository software.

	\item The bare metal installer would only need the base operating
	system installer and the repository configurator, as the repository
	software installation will be incorporated into the base operating
	system installation process.

	\item Keeping the two separate simplifies the installation
	instructions. If the disks were combined the instructions might read
	something like this: ``If you want to install a complete operating
	system and repository from scratch, boot from this CD and follow the
	instructions. If you want to install the repository on an existing
	system, insert the CD and run XXX.'' This is long-winded and
	potentially confusing.
	
	With separate disks, the instructions would read more like this:
	``To install the operating system and repository software, boot from
	this CD and follow the instructions'' (bare metal install disk), and
	``To install the repository software, insert the CD and run XXX''
	(existing system install disk).

	\item A combined installer would probably not fit on one CD-ROM,
	whereas a separate CD-ROM for each installer might be feasible.
	
\end{itemize}


\subsection{Items to be configured}

The basic repository configuration includes things like its internal
identifier, domain name, HTTP port number and so on. All of these items
are required as part of the base configuration and will need to be
included in the configurator. Configuration of the Tasmania EPrints
statistics software would also be included here.

In addition to these compulsory items, there are also numerous optional
aspects of EPrints itself that can be configured, such as enabling the
editorial buffer, required document formats, etc. These will be included
as optional items within the configuration process, accessed via an
``advanced configuration'' page. These advanced configuration items
should be easily extensible, probably via some form of XML specification
or schema, so as to cater for future developments. (This mechanism could
also be used to specify compulsory configuration items.)
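
To illustrate what such an extensible specification might look like, the
following sketch declares a few of the items mentioned above in XML and
splits them into compulsory and advanced groups. The element and
attribute names (\texttt{items}, \texttt{item}, \texttt{required},
\texttt{default}) and the function \texttt{load\_items} are hypothetical
and chosen only for this example; Python is used purely for
illustration.

\begin{verbatim}
import xml.etree.ElementTree as ET

# Hypothetical item specification; the schema is an assumption.
SPEC = """
<items>
  <item name="repository_id" required="yes" label="Repository identifier"/>
  <item name="hostname" required="yes" label="Domain name"/>
  <item name="http_port" required="yes" label="HTTP port" default="80"/>
  <item name="editorial_buffer" required="no"
        label="Enable editorial buffer" default="yes"/>
</items>
"""


def load_items(xml_text):
    """Split the declared items into (compulsory, advanced) lists of dicts."""
    compulsory, advanced = [], []
    for elem in ET.fromstring(xml_text.strip()).findall("item"):
        item = dict(elem.attrib)
        if item.get("required") == "yes":
            compulsory.append(item)
        else:
            advanced.append(item)
    return compulsory, advanced
\end{verbatim}

With a format along these lines, a new configuration item could be added
by extending the XML file alone, and the advanced configuration page
would simply list whatever \texttt{load\_items} returns in its second
group.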

One optional configuration item of particular relevance to the OARiNZ
project is configuration of the EPrints OAI-PMH interface. While we
recommend that this remain an optional configuration, an unconfigured
OAI-PMH subsystem should be prominently highlighted within the
configurator interface, preferably on the main page. This gives
repository implementors the option to forgo initial configuration of
OAI-PMH, while gently encouraging them to eventually do so.

On this note, we see no reason why the configurator should be limited to
once-only use when the repository software is first installed. Rather,
it should be installed alongside the repository software and used as a
general management tool for creating and configuring repositories on
that server. The configurator should keep an internal record of the
configuration settings for each repository that it creates, which will
make it easier to re-configure repositories at any time.
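
One simple way to keep such a record, sketched here under the assumption
that a single JSON file per server is sufficient, is to store the
settings of each repository under its identifier. The file layout and
the function names \texttt{load\_all} and \texttt{save\_repository} are
illustrative assumptions rather than a committed design.

\begin{verbatim}
import json
import os


def load_all(path):
    """Return the saved settings for every repository, or {} if none yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_repository(path, repo_id, settings):
    """Record (or overwrite) the settings for one repository."""
    records = load_all(path)
    records[repo_id] = settings
    with open(path, "w") as f:
        json.dump(records, f, indent=2, sort_keys=True)
\end{verbatim}

Reconfiguring a repository would then amount to loading its saved
settings, presenting them for editing, and saving the result back under
the same identifier.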


\subsection{Summary of design recommendations}

\subsubsection*{Repository software}

\begin{itemize}

	\item GNU EPrints

\end{itemize}

\subsubsection*{Target operating system platform}

\begin{itemize}

	\item Unix-based operating systems in general

	\item Ubuntu Linux (server distribution) for the bare metal install
	option

\end{itemize}
	
\subsubsection*{Package installation}
	
\begin{itemize}

	\item Use native package management tools provided by the operating
	system wherever possible

\end{itemize}

\subsubsection*{Repository installation \& configuration interface}

\begin{itemize}

	\item Shell-based option (ideally usable as a back-end CGI script),
	as the ultimate fallback

	\item Web-based installation interface (if feasible)

	\item Web-based configuration interface

\end{itemize}

\subsubsection*{Distribution media}

\begin{itemize}

	\item CD-ROM

	\item DVD

	\item Downloadable disk images in standard formats

	\item One disk (or set of disks) for bare metal installs: base
	operating system + configurator

	\item One disk (or set of disks) for existing system installs:
	repository installer + configurator

\end{itemize}

\subsubsection*{Items to be configured}

\begin{itemize}

	\item All required configuration items (EPrints, statistics
	software, etc.)
	
	\item OAI-PMH configuration optional but encouraged
	
	\item Other optional configuration items
	
	\item Extensible configuration item specification

\end{itemize}


\section{Typical usage scenarios}


\subsection{Bare metal installation}

\begin{center}
	\includegraphics{bare_metal}
\end{center}

\noindent In this scenario, a repository implementor wishes to bootstrap
a complete repository installation on new hardware. They boot from the
repository live CD (1), which installs the Ubuntu operating system along
with all the required packages for EPrints (2). The latter will probably
also include the repository configurator and configuration items list,
as implied by the dashed arrows at bottom right. After the base
installation completes (a reboot may be required), the operating system
(3) and repository configurators (4) are executed in sequence.


\subsection{Installation on existing system}

\noindent In this scenario, a repository implementor wishes to install a
repository on an existing server, which already has an operating system
and associated software. They insert the installation CD and launch the
installer (1).


\subsection{Reconfiguring an existing system}

\noindent In this scenario, a repository administrator wishes to
reconfigure their existing installation, for example, to create a new
repository or to change the settings of an existing repository. They
launch the repository configurator that was installed on the server
during the initial installation (1).



\section{Implementation plan}

We envisage a phased implementation approach, with each phase building
on the outputs from the previous phase. However, not all of the tasks
are strictly sequential, so some may be carried out in parallel.
Estimated start and finish dates are provided, but may be subject to
change as work progresses.


\subsection{Phase 1: Implement shell-based repository installer/configurator}

\noindent \textbf{Start:} Mid-September 2006	\\
\textbf{Finish:} 31 October 2006

\begin{itemize}

	\item Initially for Ubuntu Linux only.

	\item Modular implementation so that it is readily generalisable to
	other platforms.

	\item Infrastructure for specifying configuration items and saving
	repository configuration information.

	\item Must be able to obtain administrator-level access.

	\item Can be run either standalone or as a CGI script.
	
	\item Test.

\end{itemize}


\subsection{Phase 2: Implement web-based installer/configurator interface}

\noindent \textbf{Start:} 1 October 2006	\\
\textbf{Finish:} Mid-November 2006

\begin{itemize}

	\item Use shell-based installer/configurator as a back end.

	\item Investigate feasibility of a web-based UI for the installation
	step (e.g., by providing an Apache executable on the CD).

	\item Test.

\end{itemize}


\subsection{Phase 3: Build bare metal installer (live CD)}

\noindent \textbf{Start:} Mid-October 2006	\\
\textbf{Finish:} 30 November 2006

\begin{itemize}

	\item Create \texttt{.deb} packages for EPrints and other associated
	software that are not available in this format.

	\item Customise Ubuntu live CD with required packages for repository
	installation.

	\item Integrate repository configurator into Ubuntu installation
	process.

	\item Test.
	
\end{itemize}


\subsection{Phase 4: Generalise standalone installer/configurator to other platforms}

\noindent \textbf{Start:} 1 November 2006	\\
\textbf{Finish:} 31 December 2006

\begin{itemize}

	\item Debian Linux

	\item Mac OS X (investigate installation and use of Fink package
	manager).

	\item FreeBSD

	\item Others?
	
\end{itemize}
	

\vfill {\scriptsize \hfill \verb+$Id$+}


\end{document}