diff --git a/OARiNZ/DIY/DIY_spec.tex b/OARiNZ/DIY/DIY_spec.tex new file mode 100755 index 0000000..0f2aaed --- /dev/null +++ b/OARiNZ/DIY/DIY_spec.tex @@ -0,0 +1,482 @@ +\documentclass[12pt,pdftex,a4paper,titlepage]{article} + + +\usepackage[T1]{fontenc} +\usepackage{textcomp} +\usepackage{lmodern} +\usepackage{mathpazo} +\usepackage{graphicx} +\usepackage[margin=1in]{geometry} + + +\renewcommand{\ttdefault}{blg} + + +\title{\textsf{\textbf{OARiNZ DIY Repository Solution}}} +\author{\textsf{\textbf{Nigel Stanger}}} +\date{\textsf{\textbf{August/September 2006}}% + \linebreak\linebreak\linebreak\linebreak\linebreak% + \includegraphics[scale=0.4]{OU-Logo-Colour}} + + +\begin{document} + + +\maketitle + + +\tableofcontents + + +\newpage + + +\section{Introduction} + +Implementing a digital repository, using a typical open source solution +such as GNU EPrints or DSpace, is currently a complex proposition that +requires a reasonable level of technical expertise in order to find, +download and install all the required software, then separately +configure these components appropriately for the target operating +system. This process could be simplified, in particular by removing the +need to manually find, download, install and configure multiple separate +components. Instead, a single installer could manage the entire process +from start to finish. + +Objective 7 of the OARiNZ project addresses this need. Its aim is to +produce a freely distributable, easy-to-install CD-ROM +containing either pre-configured or self-configuring open source +software for use by institutions seeking entry-level assistance with +developing their own shareable digital repository. This document +outlines a specification for such a solution. + +The nature of currently available repository software means that we are +unlikely to eliminate the need for technical expertise entirely.
+Several installation and configuration tasks will require +administrator-level access, for example, so the solution cannot be fully automated. +Regardless, the solution will enable repository implementors to quickly +install and configure a complete digital repository, either from ``bare +metal'' on a new server or on an existing system. In addition, the level +of required technical expertise and the complexity of the installation +and configuration process will be reduced, thus lowering the bar for +implementing a digital repository. + +In the spirit of ``lowering the bar'', a key aim is to automate or +abstract as much of the repository installation and configuration +process as possible. In other words, we will not force repository +implementors to type in arcane commands unless it is absolutely +unavoidable, nor will we force them to read many pages of dense and +obscure documentation before they start. A laudable (but perhaps overly +optimistic) goal would be to make the installation process as simple as +installing software under Mac OS X or Windows. + +We therefore propose the following two key deliverables: +\begin{enumerate} + + \item A ``bare metal'' installer for creating completely new + repositories on new hardware. This would include both an operating system + and all the required repository software. + + \item A standalone tool for installing and configuring an EPrints + repository on an existing server. + +\end{enumerate} +Both of these deliverables would be distributed in the form of a CD-ROM +(or equivalent medium) containing all the required software and a +``shell'' for managing the installation and configuration process. +Downloadable disk images would also be made available. + +The remainder of this document discusses various design and +implementation options, the proposed architecture for the solution, and +the implementation plan.
+ + +\section{Design and implementation options} + + +\subsection{Repository software} + +Ultimately we would like to provide a solution for both GNU EPrints and +DSpace, which are the two major open source solutions for smaller-scale +repository implementation. However, we currently have little expertise +with DSpace, so we will initially focus on delivering a solution for +EPrints. We also plan to include the Tasmania EPrints statistics +software as a standard component, so that any repositories installed +using this solution generate download statistics out of the box. + + +\subsection{Operating systems} + +EPrints repositories are typically run on Unix-based systems (e.g., +Linux, BSD, Mac OS X), and we have experience at Otago with installing +EPrints on Debian Linux, FreeBSD, Mac OS X and Ubuntu Linux. Unix-based +systems will therefore be our primary target for implementation. Note +that the EPrints web site currently states that there are ``no plans for +a version to run under Microsoft Windows''. + +For bare metal installations, a complete operating system distribution +will also be required. We clearly cannot provide an installation disk +for every possible Unix platform, nor for proprietary operating systems +such as Mac OS X. The bare metal installer can therefore realistically +only support one operating system platform. The easiest way to achieve +this is to pick a Unix-based operating system that provides a bootable +``live CD''. + +We have experience with installing EPrints repositories under Ubuntu +Linux, which provides a live CD feature, so this is an obvious choice. +The Ubuntu live CD is also easily customisable, so we could create our +own custom live CD that included not only the base operating system but +also the required packages for installing EPrints and our configurator +software. 
(Note that installation of the repository software would be +incorporated into the base operating system installation process, so the +standalone repository installer/configurator would not be required for +bare metal installs.) + + +\subsection{Package installation} + +Unfortunately, Unix-based systems do not provide as much uniformity +of operating environment as we would like. There is wide variation, +even across different Linux distributions, in package +installation and management, system environment and standard toolsets. +The process for installing a required package is completely different +under Mac OS X, Debian Linux, Red Hat Linux and FreeBSD, for example, +and there are sometimes even multiple package management mechanisms +available within the same operating system distribution. + +We therefore need to consider whether the standalone repository +installer for existing systems should use the native package management +software (e.g., Red Hat's \texttt{rpm} or Debian's \texttt{dpkg}), or +independent installer software. If we take the native route, the +installer will need to detect the operating system version and then look +for appropriate package management tools, which of course makes +implementation more complex. If we do not go native, the implementation +would be simpler, but we would lose the significant advantage of having +packages managed by the operating system, which is particularly useful +for dependency management and upgrades. We would therefore prefer the +native option. + + +\subsection{Repository installation and configuration interface} + +We need to consider what kind of interface to present to the person +performing the repository installation and configuration process. +Possible options include: +\begin{description} + + \item[Use operating system-provided installer] We could use the + native installer program supplied by the operating system (if one + exists), such as the Mac OS X installer application.
While this + would provide an installation experience that is consistent with the + user's interface expectations, it would almost certainly require + the development of separate installers for each operating system + platform, with a consequent increase in development and maintenance + complexity. It is also unclear whether such tools would be able to + implement the configuration step effectively, and they may or may + not be able to integrate with any native OS package management + tools (the Mac OS X installer, for example, certainly + cannot). + + \item[Cross-platform GUI-based installer and configurator] There are + many cross-platform installer toolkits available that could be used to + build our installer. Many of these tools are written in Java, + which could enable the installation user interface to look + reasonably ``native'' for each platform. Non-Java tools may + impose a particular look and feel, which could be visually jarring on + some platforms. As with the native installer option, it is also + currently unclear whether any of these tools could provide a GUI for + the configuration step, and they may or may not be able to integrate + with the native operating system package management tools. + + \item[Web-based installer and configurator] A web interface could be + used to manage the installation and configuration process. This + would require an active web server with some sort of back-end + scripting support, so the web option may not be completely feasible + for the initial installation step. There is also the issue of + gaining administrator-level access in order to install and configure + many of the components. This is not insurmountable, however, as + web-based system administration tools like Webmin can do this.
The + big advantage of using a web browser is that it should work on + almost any platform, as long as we adhere stringently to web + standards, and it will provide a reasonably ``native'' user + interface experience in all cases. + + \item[Shell-based installer with text interface] This is the lowest + common denominator for all Unix-based systems. We can be confident that + almost any Unix-based system will have a Bourne-compatible shell + (\texttt{/bin/sh}) available. The interface will not + be very ``pretty'', but will be relatively simple to implement, and + can handle both the installation and configuration steps without any + difficulty, including prompting for administrator-level access. If + implemented correctly, a shell-based configurator could even act as + a back-end application layer behind a web-based front end, enabling + us to solve two problems at once. + +\end{description} + +We feel that the web-based option provides the best compromise between a +truly ``native'' user interface and the flexibility required to provide +a cross-platform solution that can interface with native package +management tools, especially when combined with the shell-based option. + + +\subsection{Distribution media} + +While we have talked so far about distribution on CD-ROMs, we see no +particular reason to limit the solution to only this medium. For +example, the solution could also be made available in DVD form and as +downloadable disk images. This would provide repository implementors with +a broad choice of installation media to suit the peculiarities of their +particular installation environment. + +Furthermore, it is likely that the CD-ROM version would actually +comprise more than just a single CD-ROM. In the case of a bare metal +install, the disks would need not only the operating system files, but also +pre-compiled versions of all the prerequisite software in a package +format appropriate for that operating system.
Similarly, for an existing +system install, we would need to include copies of all of the +prerequisite software in each of the formats required by the various package +management tools. This could easily run to two or more CD-ROMs, but +should all fit onto a single DVD. + +We also suggest that there should be separate disks for the bare metal +install and the existing system install options, for the following +reasons: +\begin{itemize} + + \item People with existing systems would not want to download an + unnecessary operating system distribution in order to get just the + repository software. + + \item The bare metal installer would only need the base operating + system installer and the repository configurator, as the repository + software installation will be incorporated into the base operating + system installation process. + + \item Keeping the two separate simplifies the installation + instructions. If the disks were combined, the instructions might read + something like this: ``If you want to install a complete operating + system and repository from scratch, boot from this CD and follow the + instructions. If you want to install the repository on an existing + system, insert the CD and run XXX.'' This is long-winded and + potentially confusing. + + With separate disks, the instructions would read more like this: + ``To install the operating system and repository software, boot from + this CD and follow the instructions'' (bare metal install disk), and + ``To install the repository software, insert the CD and run XXX'' + (existing system install disk). + + \item A combined installer would probably not fit on one CD-ROM, + whereas a separate CD-ROM for each installer might be feasible. + +\end{itemize} + + +\subsection{Items to be configured} + +The basic repository configuration includes items such as the repository's internal +identifier, domain name and HTTP port number.
All of these items +are required as part of the base configuration and will need to be +included in the configurator. Configuration of the Tasmania EPrints +statistics software would also be included here. + +In addition to these compulsory items, there are numerous optional +aspects of EPrints itself that can be configured, such as enabling the +editorial buffer or specifying the required document formats. These will be included +as optional items within the configuration process, accessed via an +``advanced configuration'' page. These advanced configuration items +should be easily extensible, probably via some form of XML specification +or schema, so as to cater for future developments. (This mechanism could +also be used to specify compulsory configuration items.) + +One optional configuration item of particular relevance to the OARiNZ +project is configuration of the EPrints OAI-PMH interface. While we +recommend that this remain optional, an unconfigured +OAI-PMH subsystem should be prominently highlighted within the +configurator interface, preferably on the main page. This gives +repository implementors the option to forgo initial configuration of +OAI-PMH, while gently encouraging them to configure it eventually. + +On this note, we see no reason why the configurator should be limited to +once-only use when the repository software is first installed. Rather, +it should be installed alongside the repository software and used as a +general management tool for creating and configuring repositories on +that server. The configurator should keep an internal record of the +configuration settings for each repository that it creates, which will +make it easier to re-configure repositories at any time.
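The extensible item specification and per-repository record described above could work roughly as in the following minimal sketch. It is illustrative only and rests on assumptions that this specification does not make: the use of Python, the XML element and attribute names (\texttt{config-items}, \texttt{item}, \texttt{required}, \texttt{default}, \texttt{advanced}), the item names, the helper functions and the use of JSON to store each repository's settings are all hypothetical.

\begin{verbatim}
# Illustrative sketch only: every element, attribute, item and function
# name below is a hypothetical placeholder, not part of EPrints.
import json
import xml.etree.ElementTree as ET

# Hypothetical item specification. Adding an <item> here adds a new
# configuration item without changing the configurator code.
SPEC_XML = """
<config-items>
  <item name="archive_id" required="true"/>
  <item name="domain_name" required="true"/>
  <item name="http_port" required="true" default="80"/>
  <item name="editorial_buffer" required="false" default="no"
        advanced="true"/>
  <item name="oai_pmh_enabled" required="false"/>
</config-items>
"""


def load_spec(xml_text):
    """Parse the item specification into a dict keyed by item name."""
    spec = {}
    for item in ET.fromstring(xml_text).findall("item"):
        spec[item.get("name")] = {
            "required": item.get("required") == "true",
            "default": item.get("default"),  # None if no default given
            "advanced": item.get("advanced") == "true",
        }
    return spec


def build_record(spec, answers):
    """Merge the user's answers with the defaults from the spec.

    Raises ValueError if a compulsory item has neither an answer nor a
    default; optional items with no value are left out of the record.
    """
    record, missing = {}, []
    for name, props in spec.items():
        value = answers.get(name, props["default"])
        if value is None:
            if props["required"]:
                missing.append(name)
        else:
            record[name] = value
    if missing:
        raise ValueError("missing compulsory items: " + ", ".join(missing))
    return record


def unconfigured_optional(spec, record):
    """List optional items that are still unset (e.g., OAI-PMH)."""
    return [name for name, props in spec.items()
            if not props["required"] and name not in record]


def save_record(record, path):
    """Store one repository's settings so it can be re-configured later."""
    with open(path, "w") as f:
        json.dump(record, f, indent=2, sort_keys=True)
\end{verbatim}

With answers supplied only for \texttt{archive\_id} and \texttt{domain\_name}, \texttt{build\_record} would fill in the default port and editorial buffer setting, while \texttt{unconfigured\_optional} would report that \texttt{oai\_pmh\_enabled} is still unset, which a web front end could then highlight on its main page. Keeping the item list in data rather than in code is what would allow new items to be added without changing the configurator itself.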
+ + +\subsection{Summary of design recommendations} + +\subsubsection*{Repository software} + +\begin{itemize} + + \item GNU EPrints + +\end{itemize} + +\subsubsection*{Target operating system platform} + +\begin{itemize} + + \item Unix-based operating systems in general + + \item Ubuntu Linux (server distribution) for the bare metal install + option + +\end{itemize} + +\subsubsection*{Package installation} + +\begin{itemize} + + \item Use native package management tools provided by the operating + system wherever possible + +\end{itemize} + +\subsubsection*{Repository installation \& configuration interface} + +\begin{itemize} + + \item Shell-based option (ideally usable as a back-end CGI script), + as the ultimate fallback + + \item Web-based installation interface (if feasible) + + \item Web-based configuration interface + +\end{itemize} + +\subsubsection*{Distribution media} + +\begin{itemize} + + \item CD-ROM + + \item DVD + + \item Downloadable disk images in standard formats + + \item One disk (or set of disks) for bare metal installs: base + operating system + configurator + + \item One disk (or set of disks) for existing system installs: + repository installer + configurator + +\end{itemize} + +\subsubsection*{Items to be configured} + +\begin{itemize} + + \item All required EPrints and Tasmania statistics configuration items + + \item OAI-PMH configuration optional but encouraged + + \item Other optional configuration items + + \item Extensible configuration item specification + +\end{itemize} + + +\section{Proposed architecture} + + +\section{Implementation plan} + +We envisage a phased implementation approach, with each phase building +on the outputs from the previous phase. However, not all of the tasks +are sequential, so some may be carried out in parallel. +Estimated start and finish dates are provided, but are subject to +change as work progresses.
+ + +\subsection{Phase 1: Implement shell-based repository installer/configurator} + +\noindent \textbf{Start:} Mid-September 2006 \\ +\textbf{Finish:} 31 October 2006 + +\begin{itemize} + + \item Initially for Ubuntu Linux only. + + \item Modular implementation so that it is readily generalisable to + other platforms. + + \item Infrastructure for specifying configuration items and saving + repository configuration information. + + \item Must be able to obtain administrator-level access. + + \item Can be run either standalone or as a CGI script. + + \item Test. + +\end{itemize} + + +\subsection{Phase 2: Implement web-based installer/configurator interface} + +\noindent \textbf{Start:} 1 October 2006 \\ +\textbf{Finish:} Mid-November 2006 + +\begin{itemize} + + \item Use shell-based installer/configurator as a back end. + + \item Investigate feasibility of a web-based UI for the installation + step (e.g., by providing an Apache executable on the CD). + + \item Test. + +\end{itemize} + + +\subsection{Phase 3: Build bare metal installer (live CD)} + +\noindent \textbf{Start:} Mid-October 2006 \\ +\textbf{Finish:} 30 November 2006 + +\begin{itemize} + + \item Create \texttt{.deb} packages for EPrints and other associated + software that are not available in this format. + + \item Customise Ubuntu live CD with required packages for repository + installation. + + \item Integrate repository configurator into Ubuntu installation + process. + + \item Test. + +\end{itemize} + + +\subsection{Phase 4: Generalise standalone installer/configurator to other platforms} + +\noindent \textbf{Start:} 1 November 2006 \\ +\textbf{Finish:} 31 December 2006 + +\begin{itemize} + + \item Debian Linux + + \item Mac OS X (investigate installation and use of the Fink package + manager). + + \item FreeBSD + + \item Others? + +\end{itemize} + + +\vfill {\scriptsize \hfill \verb+$Id$+} + + +\end{document}