nigel.stanger
/
Digital_Repository
- Copied from EduForge wiki.
master
1 parent
554717a
commit
da2d65ab026045e9208c8c4a1f5d670cc96fc43f
nstanger
authored
on 12 Sep 2006
Showing
1 changed file
OARiNZ/DIY/DIY_spec.tex
OARiNZ/DIY/DIY_spec.tex
0 → 100755
\documentclass[12pt,pdftex,a4paper,titlepage]{article}

\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{lmodern}
\usepackage{mathpazo}
\usepackage{graphicx}
\usepackage[margin=1in]{geometry}

\renewcommand{\ttdefault}{blg}

\title{\textsf{\textbf{OARiNZ DIY Repository Solution}}}
\author{\textsf{\textbf{Nigel Stanger}}}
\date{\textsf{\textbf{August/September 2006}}%
  \linebreak\linebreak\linebreak\linebreak\linebreak%
  \includegraphics[scale=0.4]{OU-Logo-Colour}}

\begin{document}

\maketitle

\tableofcontents

\newpage

\section{Introduction}

Implementing a digital repository using a typical open source solution such as GNU EPrints or DSpace is currently a complex proposition. It requires a reasonable level of technical expertise to find, download and install all the required software, and then to configure each of these components separately for the target operating system.

This process can be simplified, in particular by removing the need to manually find, download, install and configure multiple separate components. Instead, a single installer could manage the entire process from start to finish.

Objective 7 of the OARiNZ project aims to address this need by producing a freely distributable, easy to install CD-ROM containing either pre-configured or self-configuring open source software for use by institutions looking for entry-level assistance with developing their own shareable digital repository. This document outlines a specification for such a solution.

The nature of currently available repository software means that it is unlikely that we can eliminate the need for some technical expertise. Several installation and configuration tasks will require administrator level access, for example, so the solution cannot be fully automated. Regardless, the solution will enable repository implementors to quickly install and configure a complete digital repository, either from ``bare metal'' on a new server or on an existing system.
In addition, the level of required technical expertise and the complexity of the installation and configuration process will be reduced, thus lowering the bar for implementing a digital repository.

In the spirit of ``lowering the bar'', a key aim is to automate or abstract as much of the repository installation and configuration process as possible. In other words, we will not force repository implementors to type in arcane commands unless it is absolutely unavoidable, nor will we force them to read many pages of dense and obscure documentation before they start. A laudable (but perhaps overly optimistic) goal would be to make the installation process as simple as installing software under Mac OS X or Windows.

We therefore propose the following two key deliverables:

\begin{enumerate}
\item A ``bare metal'' installer for creating completely new repositories on new hardware, which includes both an operating system and all the required repository software.
\item A standalone tool for installing and configuring an EPrints repository on an existing server.
\end{enumerate}

Both of these deliverables would be distributed in the form of a CD-ROM (or equivalent medium) containing all the required software and a ``shell'' for managing the installation and configuration process. Downloadable disk images would also be made available.

The remainder of this document discusses various design and implementation options, the proposed architecture for the solution, and the implementation plan.

\section{Design and implementation options}

\subsection{Repository software}

Ultimately we would like to provide a solution for both GNU EPrints and DSpace, which are the two major open source solutions for smaller-scale repository implementation. However, we currently have little expertise with DSpace, so we will initially focus on delivering a solution for EPrints.
We also plan to include the Tasmania EPrints statistics software as a standard component, so that any repositories installed using this solution generate download statistics out of the box.

\subsection{Operating systems}

EPrints repositories are typically run on Unix-based systems (e.g., Linux, BSD, Mac OS X), and we have experience at Otago with installing EPrints on Debian Linux, FreeBSD, Mac OS X and Ubuntu Linux. Unix-based systems will therefore be our primary target for implementation. Note that the EPrints web site currently states that there are ``no plans for a version to run under Microsoft Windows''.

For bare metal installations, a complete operating system distribution will also be required. We clearly cannot provide an installation disk for every possible Unix platform, nor for proprietary operating systems such as Mac OS X. The bare metal installer can therefore realistically support only one operating system platform.

The easiest way to achieve this is to pick a Unix-based operating system that provides a bootable ``live CD''. We have experience with installing EPrints repositories under Ubuntu Linux, which provides a live CD feature, so this is an obvious choice. The Ubuntu live CD is also easily customisable, so we could create our own custom live CD that includes not only the base operating system but also the required packages for installing EPrints and our configurator software. (Note that installation of the repository software would be incorporated into the operating system base installation process, so the standalone repository installer/configurator would not be required for bare metal installs.)

\subsection{Package installation}

Unfortunately, Unix-based environments do not provide as much uniformity of operating environment as we would like. There is wide variation, even across different Linux distributions, with regard to package installation and management, system environment and standard toolsets.
The process for installing a required package is completely different under Mac OS X, Debian Linux, Red Hat Linux and FreeBSD, for example, and there are sometimes even multiple package management mechanisms available within the same operating system distribution.

We therefore need to consider whether the standalone repository installer for existing systems should use the native package management software (e.g., Red Hat's \texttt{rpm} or Debian's \texttt{dpkg}), or independent installer software. If we take the native route, the installer will need to detect the operating system version and then look for appropriate package management tools, which of course makes implementation more complex. If we do not go native, the implementation will be simpler, but we would lose the significant advantage of having packages managed by the operating system, which is particularly useful for dependency management and upgrades. We would therefore prefer the native option.

\subsection{Repository installation and configuration interface}

We need to consider what kind of interface to present to the person performing the repository installation and configuration process. Possible options include:

\begin{description}
\item[Use operating system-provided installer] We could use the native installer program supplied by the operating system (if such exists), such as the Mac OS X installer application. While this would provide an installation experience that is consistent with the user's interface expectations, it would almost certainly require the development of separate installers for each operating system platform, with a consequent increase in development and maintenance complexity. It is also unclear whether such tools would be able to effectively implement the configuration step, and they may or may not be able to integrate with any native OS package management tools (the Mac OS X installer certainly cannot, for example).
\item[Cross-platform GUI-based installer and configurator] There are many cross-platform installer tools available that could be used to build an installation tool. Many of these tools are written in Java, which could enable the installation user interface to look reasonably ``native'' for each platform. Non-Java-based tools may impose a particular look and feel which could be visually jarring on different platforms. As with the native installer option, it is also currently unclear whether any of these tools could provide a GUI for the configuration step, and they may or may not be able to integrate with the native operating system package management tools.
\item[Web-based installer and configurator] A web interface could be used to manage the installation and configuration process. This would require an active web server with some sort of back-end scripting support, so the web option may not be completely feasible for the initial installation step. There is also the issue of gaining administrator level access in order to install and configure many of the components. This is not insurmountable, however, as web-based system administration tools like Webmin can do this. The big advantage of using a web browser is that it should work on almost any platform, as long as we adhere stringently to web standards, and it will provide a reasonably ``native'' user interface experience in all cases.
\item[Shell-based installer with text interface] This is the lowest common denominator for all Unix-based systems. We can be confident that almost any Unix-based system will have a Bourne-compatible shell (\texttt{sh}) available. The interface will not be very ``pretty'', but will be relatively simple to implement, and can handle both the installation and configuration steps without any difficulty, including prompting for administrator-level access.
If implemented correctly, a shell-based configurator could even act as a back-end application layer behind a web-based front end, enabling us to solve two problems at once.
\end{description}

We feel that the web-based option provides the best compromise between a truly ``native'' user interface and the flexibility required to provide a cross-platform solution that can interface with native package management tools, especially when combined with the shell-based option.

\subsection{Distribution media}

While we have talked so far about distribution on CD-ROMs, we see no particular reason to limit the solution to only this medium. For example, the solution could also be made available in DVD form and as downloadable disk images. This will provide repository implementors with a broad choice of installation media to suit the peculiarities of their particular installation environment.

Furthermore, it is likely that the CD-ROM version would actually comprise more than just a single CD-ROM. In the case of a bare metal install, we would need not only the operating system files, but also pre-compiled versions of all the prerequisite software in a package format appropriate for that operating system. Similarly, for an existing system install, we would need to include duplicate copies of all of the prerequisite software in appropriate formats for the various package management tools. This could easily run to at least two CD-ROMs, but would definitely all fit onto a single DVD.

We also suggest that there should be separate disks for the bare metal install and the existing system install options, for the following reasons:

\begin{itemize}
\item People with existing systems would not want to download an unnecessary operating system distribution just to get the repository software.
\item The bare metal installer would only need the base operating system installer and the repository configurator, as the repository software installation will be incorporated into the base operating system installation process.
\item Keeping the two separate simplifies the installation instructions. If the disks were combined the instructions might read something like this: ``If you want to install a complete operating system and repository from scratch, boot from this CD and follow the instructions. If you want to install the repository on an existing system, insert the CD and run XXX.'' This is long-winded and potentially confusing. With separate disks, the instructions would read more like this: ``To install the operating system and repository software, boot from this CD and follow the instructions'' (bare metal install disk), and ``To install the repository software, insert the CD and run XXX'' (existing system install disk).
\item A combined installer would probably not fit on one CD-ROM, whereas a separate CD-ROM for each installer might be feasible.
\end{itemize}

\subsection{Items to be configured}

The basic repository configuration includes things like its internal identifier, domain name, HTTP port number and so on. All of these items are required as part of the base configuration and will need to be included in the configurator. Configuration of the Tasmania EPrints statistics software would also be included here.

In addition to these compulsory items, there are also numerous optional aspects of EPrints itself that can be configured, such as enabling the editorial buffer, required document formats, etc. These will be included as optional items within the configuration process, accessed via an ``advanced configuration'' page. These advanced configuration items should be easily extensible, probably via some form of XML specification or schema, so as to cater for future developments.
(This mechanism could also be used to specify compulsory configuration items.)

One optional configuration item of particular relevance to the OARiNZ project is configuration of the EPrints OAI-PMH interface. While we recommend that this remain an optional configuration, an unconfigured OAI-PMH subsystem should be prominently highlighted within the configurator interface, preferably on the main page. This gives repository implementors the option to forgo initial configuration of OAI-PMH, while gently encouraging them to eventually do so.

On this note, we see no reason why the configurator should be limited to once-only use when the repository software is first installed. Rather, it should be installed alongside the repository software and used as a general management tool for creating and configuring repositories on that server. The configurator should keep an internal record of the configuration settings for each repository that it creates, which will make it easier to re-configure repositories at any time.
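
The extensible item specification and the OAI-PMH reminder described in this subsection can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the XML element and attribute names (\texttt{configuration}, \texttt{item}, \texttt{required}, \texttt{highlight}, \texttt{default}), the item names, and the functions \texttt{load\_items} and \texttt{review} are hypothetical and are not part of EPrints or of any agreed OARiNZ design. Python is used here purely for readability. The sketch reads a specification of compulsory and optional items, compares it with the saved settings for one repository, and reports which compulsory items are still missing and which highlighted optional items should be flagged on the main page.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification format: the element and attribute names are
# illustrative assumptions, not taken from EPrints or the OARiNZ design.
SPEC_XML = """
<configuration>
  <item name="repository_id" required="true" label="Internal identifier"/>
  <item name="hostname" required="true" label="Domain name"/>
  <item name="http_port" required="true" label="HTTP port number" default="80"/>
  <item name="oai_pmh" required="false" highlight="true" label="OAI-PMH interface"/>
  <item name="editorial_buffer" required="false" label="Editorial buffer"/>
</configuration>
"""


def load_items(xml_text):
    """Parse the specification into a list of dictionaries, one per item."""
    root = ET.fromstring(xml_text)
    items = []
    for element in root.findall("item"):
        items.append({
            "name": element.get("name"),
            "label": element.get("label", element.get("name")),
            "required": element.get("required") == "true",
            "highlight": element.get("highlight") == "true",
            "default": element.get("default"),
        })
    return items


def review(items, settings):
    """Compare one repository's saved settings against the specification.

    Returns a pair (missing, reminders):
      missing   - compulsory items with neither a saved value nor a default
      reminders - optional, highlighted items that are not yet configured
    """
    missing, reminders = [], []
    for item in items:
        value = settings.get(item["name"], item["default"])
        if value is not None:
            continue  # configured, or covered by a default
        if item["required"]:
            missing.append(item["name"])
        elif item["highlight"]:
            reminders.append(item["name"])
    return missing, reminders


items = load_items(SPEC_XML)

# Stand-in for the configurator's internal record of one repository.
saved = {"repository_id": "demo", "hostname": "eprints.example.org"}

missing, reminders = review(items, saved)
print("Missing required items:", missing)
print("Reminders for main page:", reminders)
```

With the sample record above, no compulsory items are missing (the port falls back to its default) and \texttt{oai\_pmh} is the only reminder. New optional items could then be added by editing the specification alone, without changing the configurator code, which is the kind of extensibility this subsection asks for.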

\subsection{Summary of design recommendations}

\subsubsection*{Repository software}

\begin{itemize}
\item GNU EPrints
\end{itemize}

\subsubsection*{Target operating system platform}

\begin{itemize}
\item Unix-based operating systems in general
\item Ubuntu Linux (server distribution) for the bare metal install option
\end{itemize}

\subsubsection*{Package installation}

\begin{itemize}
\item Use native package management tools provided by the operating system wherever possible
\end{itemize}

\subsubsection*{Repository installation \& configuration interface}

\begin{itemize}
\item Shell-based option (ideally usable as a back-end CGI script), as the ultimate fallback
\item Web-based installation interface (if feasible)
\item Web-based configuration interface
\end{itemize}

\subsubsection*{Distribution media}

\begin{itemize}
\item CD-ROM
\item DVD
\item Downloadable disk images in standard formats
\item One disk (or set of disks) for bare metal installs: base operating system + configurator
\item One disk (or set of disks) for existing system installs: repository installer + configurator
\end{itemize}

\subsubsection*{Items to be configured}

\begin{itemize}
\item All required EPrints, etc., configuration items
\item OAI-PMH configuration optional but encouraged
\item Other optional configuration items
\item Extensible configuration item specification
\end{itemize}

\section{Proposed architecture}

\section{Implementation plan}

We envisage a phased implementation approach, with each phase building on the outputs from the previous phase. However, not all of the tasks are sequential in nature, and some may be able to be carried out in parallel. Estimated start and finish dates are provided, but may be subject to change as work progresses.

\subsection{Phase 1: Implement shell-based repository installer/configurator}

\noindent
\textbf{Start:} Mid-September 2006 \\
\textbf{Finish:} 31 October 2006

\begin{itemize}
\item Initially for Ubuntu Linux only.
\item Modular implementation so that it is readily generalisable to other platforms.
\item Infrastructure for specifying configuration items and saving repository configuration information.
\item Must be able to obtain administrator level access.
\item Can be run either standalone or as a CGI script.
\item Test.
\end{itemize}

\subsection{Phase 2: Implement web-based installer/configurator interface}

\noindent
\textbf{Start:} 1 October 2006 \\
\textbf{Finish:} Mid-November 2006

\begin{itemize}
\item Use shell-based installer/configurator as a back end.
\item Investigate feasibility of a web-based UI for the installation step (e.g., by providing an Apache executable on the CD).
\item Test.
\end{itemize}

\subsection{Phase 3: Build bare metal installer (live CD)}

\noindent
\textbf{Start:} Mid-October 2006 \\
\textbf{Finish:} 30 November 2006

\begin{itemize}
\item Create \texttt{.deb} packages for EPrints and other associated software that are not available in this format.
\item Customise Ubuntu live CD with required packages for repository installation.
\item Integrate repository configurator into Ubuntu installation process.
\item Test.
\end{itemize}

\subsection{Phase 4: Generalise standalone installer/configurator to other platforms}

\noindent
\textbf{Start:} 1 November 2006 \\
\textbf{Finish:} 31 December 2006

\begin{itemize}
\item Debian Linux
\item Mac OS X (investigate installation and use of Fink package manager).
\item FreeBSD
\item Others?
\end{itemize}

\vfill

{\scriptsize \hfill \verb+$Id$+}

\end{document}