diff --git a/OARiNZ/DIY/DIY_spec.tex b/OARiNZ/DIY/DIY_spec.tex index 8312009..5a1a6aa 100755 --- a/OARiNZ/DIY/DIY_spec.tex +++ b/OARiNZ/DIY/DIY_spec.tex @@ -8,6 +8,7 @@ \usepackage{graphicx} \usepackage[margin=1in]{geometry} \usepackage{pifont} +\usepackage{url} \graphicspath{{images/}} @@ -41,11 +42,11 @@ Objective 7 of the OARiNZ project aims to address this need. This objective aims to produce a freely distributable, easy to install CD-ROM containing either pre-configured or self-configuring open source software for use by institutions looking for entry-level assistance with developing their own shareable digital repository. This document outlines a specification for such a solution. -The nature of currently available repository software means that it is unlikely that we can eliminate the need for some technical expertise. Several installation and configuration tasks require administrator level access, for example, so the solution cannot be fully automated. Regardless, the solution will enable repository implementers to quickly install and configure a complete digital repository, either from ``bare metal'' on a new server or on an existing system. In addition, the level of required technical expertise and the complexity of the installation and configuration process will be reduced, thus lowering the bar for implementing a digital repository. +The nature of currently available repository software means that it is unlikely that we can completely eliminate the need for some technical expertise. Several installation and configuration tasks require administrator level access, for example, so the solution cannot be fully automated. Regardless, the solution will enable repository implementers to quickly install and configure a complete digital repository, either from ``bare metal'' on a new server or on an existing system. 
In addition, the level of required technical expertise and the complexity of the installation and configuration process will be reduced, thus lowering the bar for implementing a digital repository. -In the spirit of ``lowering the bar'', a key aim is to automate or abstract as much of the repository installation and configuration process as possible, focusing attention instead on only those elements that \emph{require} human intervention. In other words, we will not force repository implementers to type in arcane commands unless it is absolutely unavoidable, nor will we force them to read many pages of dense and obscure documentation before they start or burden them with byzantine installation procedures. A laudable (but perhaps overly optimistic) goal would be to make the installation process as easy as installing software under Mac OS X or Windows. +In the spirit of ``lowering the bar'', a key aim is to automate or abstract as much of the repository installation and configuration process as possible, focusing attention instead on only those elements that \emph{require} human intervention. In other words, repository implementers will not be forced to type in arcane commands unless it is absolutely unavoidable, nor will they be forced to read many pages of dense and obscure documentation before they start or be burdened with byzantine installation procedures. A laudable (but perhaps overly optimistic) goal would be to make the installation process as easy as installing software under Mac OS X or Windows. -We therefore propose the following two key deliverables: +The following two key deliverables are therefore proposed: \begin{enumerate} \item A ``bare metal'' installer for creating completely new repositories on new hardware, that includes both an operating system and all the required repository software. 
@@ -63,50 +64,50 @@ \subsection{Repository software} -Ultimately we would like to provide a solution for both GNU EPrints and DSpace, which are the two major open source solutions for smaller-scale repository implementation. However, we currently have little expertise with DSpace, so we will initially focus on delivering a solution for EPrints. We also plan to include the Tasmania EPrints statistics software as a standard component, so that any repositories installed using this solution generate download statistics out of the box. +Ultimately it would be nice to provide a solution for both GNU EPrints and DSpace, which are the two major open source solutions for smaller-scale repository implementation. However, we currently have little expertise at Otago with DSpace, so the initial focus will be on delivering a solution for EPrints. The Tasmania EPrints statistics software will also be included as a standard component, so that any repositories installed using this solution generate download statistics out of the box. \subsection{Operating systems} EPrints repositories are typically run on Unix-based systems (e.g., Linux, BSD, Mac OS X), and we have experience at Otago with installing EPrints on Debian Linux, FreeBSD, Mac OS X and Ubuntu Linux. Unix-based systems will therefore be our primary target for implementation. Note that the EPrints web site currently states that there are ``no plans for a version to run under Microsoft Windows''. -For bare metal installations, a complete operating system distribution will also be required. We clearly cannot provide an installation disk for every possible Unix platform, nor for proprietary operating systems such as Mac OS X. The bare metal installer can therefore realistically only support one operating system platform. The easiest way to achieve this is to pick a Unix-based operating system that provides a bootable ``live CD''. +For bare metal installations, a complete operating system distribution will also be required. 
It is clearly not possible to provide an installation disk for every possible Unix platform, nor for proprietary operating systems such as Mac OS X. The bare metal installer can therefore realistically only support one operating system platform. The easiest way to achieve this is to pick a Unix-based operating system that provides a bootable ``live CD''. -We have experience with installing EPrints repositories under Ubuntu Linux, which provides a live CD feature, so this is an obvious choice. The Ubuntu live CD is also easily customisable, so we could create our own custom live CD that included not only the base operating system but also the required packages for installing EPrints and our configurator software. (Note that installation of the repository software would be incorporated into the operating system installation process, so the standalone repository installer would not be required for bare metal installs.) +We have experience at Otago with installing EPrints repositories under Ubuntu Linux\footnote{\url{http://www.ubuntu.com/}}, which provides a live CD feature, so this is an obvious choice. The Ubuntu live CD is also easily customisable, so a custom live CD could be created that included not only the base operating system but also the required packages for installing EPrints and our configurator software. (Note that installation of the repository software would be incorporated into the operating system installation process, so the standalone repository installer would not be required for bare metal installs.) \subsection{Package installation} Unfortunately Unix-based environments do not provide as much uniformity of operating environment as we would like. There is wide variation even across different Linux distributions, with regard to package installation and management, system environment and standard toolsets. 
The process for installing a required package is completely different under Mac OS X, Debian Linux, Red Hat Linux and FreeBSD, for example, and there are even sometimes multiple package management mechanisms available within the same operating system distribution. -We therefore need to consider whether the standalone repository installer for existing systems should use the native package management software (e.g., Red Hat's \texttt{rpm} or Debian's \texttt{dpkg}), or independent installer software. If we take the native route, the installer will need to detect the operating system version and then look for appropriate package management tools, which of course makes implementation more complex. If we do not go native, the implementation will be simpler, but we would lose the significant advantage of having packages managed by the operating system, which is particularly useful for dependency management and upgrades. We would therefore prefer the native option. +It therefore needs to be considered whether the standalone repository installer for existing systems should use the native package management software (e.g., Red Hat's \texttt{rpm} or Debian's \texttt{dpkg}), or independent installer software. If the native route is taken, the installer will need to detect the operating system version and then look for appropriate package management tools, which of course makes implementation more complex. The non-native route will lead to a simpler implementation, but would lose the significant advantage of having packages managed by the operating system, which is particularly useful for dependency management and upgrades. The native option is therefore preferred. \subsection{Repository installation and configuration interface} -We need to consider what kind of interface to present to the person performing the repository installation and configuration process. 
Possible options include: +The kind of interface to present to the person performing the repository installation and configuration process also needs to be considered. Possible options include: \begin{description} - \item[Use operating system-provided installer] We could use the native installer program supplied by the operating system (if such exists), such as the Mac OS X installer application. While this would provide an installation experience that is consistent with the user's interface expectations, this would almost certainly require the development of separate installers for each operating system platform, with consequent increase in development and maintenance complexity. It is also unclear whether such tools would also be able to effectively implement the configuration step, and they may or may not be able to integrate with any native OS package management tools (this is certainly not the case for the Mac OS X installer, for example). + \item[Use operating system-provided installer] The native installer program supplied by the operating system could be used (if such exists), such as the Mac OS X installer application. While this would provide an installation experience that is consistent with the user's interface expectations, this would almost certainly require the development of separate installers for each operating system platform, with consequent increase in development and maintenance complexity. It is also unclear whether such tools would also be able to effectively implement the configuration step, and they may or may not be able to integrate with any native OS package management tools (this is certainly not the case for the Mac OS X installer, for example). \item[Cross-platform GUI-based installer and configurator] There are many cross-platform installer tools available that could be used to build an installation tool. 
Many of these tools are written in Java, which could enable the installation user interface to look reasonably ``native'' for each platform. Non Java-based tools may impose a particular look and feel which could be visually jarring on different platforms. As with the native installer option, it is also currently unclear whether any of these tools could provide a GUI for the configuration step, and they may or may not be able to integrate with the native operating system package management tools. - \item[Web-based installer and configurator] A web interface could be used to manage the installation and configuration process. This would require an active web server with some sort of back-end scripting support, so an embedded web server may be necessary for the initial installation step. There is also the issue of gaining administrator level access in order to install and configure many of the components. This is not insurmountable, however, as web-based system administration tools like Webmin can do this. The big advantage of using a web browser is that it should work on almost any platform if we adhere to web standards, and it will provide a reasonably ``native'' user interface experience in all cases. + \item[Web-based installer and configurator] A web interface could be used to manage the installation and configuration process. This would require an active web server with some sort of back-end scripting support, so an embedded web server may be necessary for the initial installation step. There is also the issue of gaining administrator level access in order to install and configure many of the components. This is not insurmountable, however, as web-based system administration tools like Webmin can do this. The big advantage of using a web browser is that it should work on almost any platform if web standards are adhered to, and it will provide a reasonably ``native'' user interface experience in all cases. 
- \item[Shell-based installer with text interface] This is the lowest common denominator for all Unix-based systems. We can guarantee that almost any Unix-based system will have some variant of C-shell available, or at least something compatible. The interface will not be very ``pretty'', but will be relatively simple to implement, and can handle both the installation and configuration steps without any difficulty, including prompting for administrator-level access. If implemented in a modular fashion, the installer/configurator should be readily portable to other Unix-based operating systems. Furthermore, a shell-based configurator could even act as a back-end application layer behind a web-based front end, solving two problems at once. + \item[Shell-based installer with text interface] This is the lowest common denominator for all Unix-based systems. Almost any Unix-based system will have some variant of C-shell available, or at least something compatible. The interface will not be very ``pretty'', but will be relatively simple to implement, and can handle both the installation and configuration steps without any difficulty, including prompting for administrator-level access. If implemented in a modular fashion, the installer/configurator should be readily portable to other Unix-based operating systems. Furthermore, a shell-based configurator could even act as a back-end application layer behind a web-based front end, solving two problems at once. \end{description} -We feel that the web-based option provides the best compromise between a truly ``native'' user interface and the flexibility required to provide a cross-platform solution that can interface with native package management tools, especially when combined with the shell-based option. 
+The web-based option provides the best compromise between a truly ``native'' user interface and the flexibility required to provide a cross-platform solution that can interface with native package management tools, especially when combined with the shell-based option. \subsection{Distribution media} -While we have talked so far about distribution on CD-ROMs, we see no particular reason to limit the solution to only this medium. For example, the solution could also be made available in DVD form and as downloadable disk images. This will provide repository implementers with a choice of installation media to suit the vagaries of their particular installation environment. +While the discussion so far has been about distribution on CD-ROMs, there is no particular reason to limit the solution to only this medium. For example, the solution could also be made available in DVD form and as downloadable disk images. This will provide repository implementers with a choice of installation media to suit the vagaries of their particular installation environment. -Furthermore, it is likely that the CD-ROM version would actually comprise more than just a single CD-ROM. In the case of a bare metal install, you would not only need the operating system files, but also pre-compiled versions of all the prerequisite software in a package format appropriate for that operating system. Similarly, for an existing system install, we would need to include duplicate copies of all of the prerequisite software in appropriate formats for the various supported package management tools. This could easily run to at least two CD-ROMs, but would definitely fit onto a single DVD. +Furthermore, it is likely that the CD-ROM version would actually comprise more than just a single CD-ROM. A bare metal install would not only need the operating system files, but also pre-compiled versions of all the prerequisite software in a package format appropriate for that operating system. 
Similarly, an existing system install would need to include duplicate copies of all of the prerequisite software in appropriate formats for the various supported package management tools. This could easily run to at least two CD-ROMs, but would definitely fit onto a single DVD. -We also suggest that there should be separate disks for the bare metal install and the existing system install options, for the following reasons: +It is also recommended that there should be separate disks for the bare metal install and the existing system install options, for the following reasons: \begin{itemize} \item People with existing systems would not want to download an unnecessary operating system distribution in order to get just the repository software. @@ -128,9 +129,9 @@ In addition to these compulsory items, there are also numerous optional aspects of EPrints itself that can be configured, such as enabling the editorial buffer, required document formats, etc. These will be included as optional items within the configuration process, accessed via an ``advanced configuration'' page. The list of advanced configuration items should be easily extensible, probably via some form of XML specification, so as to cater for future developments. (This mechanism could also be used to specify compulsory configuration items.) -One optional configuration item of particular relevance to the OARiNZ project is configuration of the EPrints OAI-PMH interface. While we recommend that this remain an optional configuration, an unconfigured OAI-PMH subsystem should be prominently highlighted within the configurator interface, preferably on the main page. This gives repository implementers the option to forgo initial configuration of OAI-PMH, while gently encouraging them to eventually do so. +One optional configuration item of particular relevance to the OARiNZ project is configuration of the EPrints OAI-PMH interface. 
While it is recommended that this remain an optional configuration, an unconfigured OAI-PMH subsystem should be prominently highlighted within the configurator interface, preferably on the main page. This gives repository implementers the option to forgo initial configuration of OAI-PMH, while gently encouraging them to eventually do so. -On this note, we see no reason why the configurator should be limited to once-only use when the repository software is first installed. Rather, it should be installed alongside the repository software and used as a general management tool for creating and configuring repositories on that server. The configurator should keep an internal record of the configuration settings for each repository that it creates, which will make it easier to re-configure repositories at any time. The configurator should probably also check the saved configuration against the actual configuration files when opened, in case someone manually edits them. +On this note, there is no reason why the configurator should be limited to once-only use when the repository software is first installed. Rather, it should be installed alongside the repository software and used as a general management tool for creating and configuring repositories on that server. The configurator should keep an internal record of the configuration settings for each repository that it creates, which will make it easier to re-configure repositories at any time. The configurator should probably also check the saved configuration against the actual configuration files when opened, in case someone manually edits them. The configurator will not assist with the process of customising the look and feel of the repository web pages, simply because there are too many possible permutations of how to modify the look and feel. The configurator could, however, provide information on which files need to be changed in order to achieve this. 
@@ -247,17 +248,13 @@ \end{center} \end{figure} -In this scenario, shown in Figure~\ref{fig-reconfigure}, a repository administrator wishes to reconfigure their existing installation, for example, to create new repository or to change the settings of an existing repository. They launch the repository configurator that was installed on the server during the original installation process (\ding{'300}). This reads the existing repository configuration (\ding{'301}) and the configuration items list (\ding{'302}) and uses these to populate the configurator settings. When complete, the new configuration is saved for future reference. +In this scenario, shown in Figure~\ref{fig-reconfigure}, a repository administrator wishes to reconfigure their existing installation, for example, to create a new repository or to change the settings of an existing repository. They launch the repository configurator that was installed on the server during the original installation process (\ding{'300}). This reads the existing repository configuration (\ding{'301}) and the configuration items list (\ding{'302}) and uses these to initialise the configurator. When complete, the new configuration is saved for future reference. \section{Implementation plan} -We envisage a phased implementation approach, with each phase building -on the outputs from the previous phase. However, not all of the tasks -are sequential in nature and may be able to be carried out in parallel. -Estimated start and finish dates are provided, but may be subject to -change as work progresses. +A phased implementation approach will be adopted, with each phase building on the outputs from the previous phase. However, not all of the tasks are sequential in nature, and some may be carried out in parallel. Estimated start and finish dates are provided, but may be subject to change as work progresses. 
\subsection{Phase 1: Build shell-based repository installer/configurator} @@ -342,6 +339,8 @@ \section{Conclusion} + +This document has discussed the implementation of a DIY repository solution consistent with Objective 7 of the OARiNZ project. The proposed solution covers three main usage scenarios: installing a repository from scratch on new hardware, installing a repository on an existing system, and reconfiguring an existing repository. In all cases, the solution will reduce the complexity of the process and thus make it considerably easier for repository implementers to get up and running. \vfill {\scriptsize \hfill \verb+$Id$+}