\documentclass[a4paper]{article}

\usepackage{mathpple}
\usepackage[margin={1in,0.5in}]{geometry}
\usepackage{graphicx}

\title{School of Business Publications Repository \\
		(DRAFT: not for circulation)}
\author{Nigel Stanger\thanks{Department of Information Science, email
		\texttt{nstanger@infoscience.otago.ac.nz}.}}

\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em
    T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}

\begin{document}

\maketitle

\section{Executive Summary}

A database-managed repository is currently in the early stages of
development (under the auspices of the School's Information Technology
Policy Committee), for the purpose of storing (primarily research)
publications authored by staff within the School of Business. Such a
repository provides several important benefits, including:
\begin{itemize}

	\item A single, well-managed, flexible repository for storing
	details on publications within the School.

	\item Easy web publishing of publication details, including
	downloadable copies of papers where appropriate.

	\item Elimination (or at least reduction) of duplicated publication
	data in multiple locations, thus enhancing consistency.

	\item A searchable database of publications spanning the entire
	School, accessible via the web. The repository will also be
	available to major web search engines, such as Google and Yahoo.
	
	\item The ability for individual departments, research groups or
	staff members to generate web pages of their publications using
	whatever ``look and feel'' they desire.

	\item Improved workflow when forwarding publication details to
	Research, Enterprise and International (RE\&I) for inclusion in
	the annual list of University publications, and for the
	Performance-Based Research Fund (PBRF).

\end{itemize}

The basic engine of such a system is currently being implemented.


\section{Why would such a system be useful?}

There are several reasons why such a system would be useful. First, it
will provide a single, consistent, flexible way of disseminating
publication details via the web. Second, it will reduce the duplication of
publication details that currently exists. Third, it will improve the
workflow associated with forwarding publication details to Research,
Enterprise and International.


\subsection{Web access}

Consider a person from outside the University wanting to find all
publications by a particular staff member in a particular department
within the School. For most departments they will typically find a list
of publications in chronological order, perhaps subdivided by
publication type. To find all publications by a particular staff member,
they will have to manually scan through all of the publications web
pages to find what they want. If they are lucky, publication lists may be
available on individual staff members' web pages, but this is by no
means certain, and these lists are not usually comprehensive.

It would obviously be more effective to simply enter the name of the
author of interest into a search field, and quickly retrieve only the
publications by that author. Doing so effectively requires an underlying
database and associated software. However, of all the departments in the
School, only the Department of Marketing has such a system in use. The
remaining departments use static, manually created
web pages that cannot easily be searched and are difficult to keep up to
date. (The author of this document is the coordinator of the Department
of Information Science Discussion Paper Series, and has first-hand
experience of the issues associated with this approach.)

The typical state of affairs for most departments is illustrated in
Figure~\ref{fig:current}. Considering only the left-hand side of the
diagram for the moment, we see that authors produce publications, which
are submitted to some publication venue. Details of publications are
typically forwarded to a ``Publications Person'' within the department,
who organises placing those details on the department's web site. This
is usually a manual process, and may only occur once or twice per year.

\begin{figure}[htb]
	\includegraphics[width=\columnwidth,keepaspectratio]{PublicationsCurrent}
	\caption{Typical state of affairs for publications in most departments.}
	\label{fig:current}
\end{figure}

Contrast this with the situation shown in Figure~\ref{fig:repository}.
Authors load details of their publications directly into the new
publications repository. Once these details are verified by the
``Publications Person'', the publication immediately becomes visible on
the web. The whole process is streamlined considerably, and the
``Publications Person'' is spared the work of manually updating web
pages. The web pages generated by the repository will be template-based,
making it easy to customise web pages for specific purposes, and to
quickly change the ``look and feel'' of the entire system.

\begin{figure}[htb]
	\includegraphics[width=\columnwidth,keepaspectratio]{PublicationsRepository}
	\caption{The proposed publications repository.}
	\label{fig:repository}
\end{figure}

Many departments currently provide downloadable versions of papers
(where copyright allows), and this will obviously also be a feature of
the proposed repository. With a static web site it can be difficult to
determine whether a particular document has been downloaded, how many
times it has been downloaded, and by whom. With a dynamic web site
driven from the publications repository, it will be easy to track the
number of downloads for each publication. The system can even ask the
reader if they would like to enter their details, which will then be
automatically emailed to the author, enabling them to contact readers of
their publications and enhancing the possibilities for future
collaborations.
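
As a rough illustration of the download tracking described above, the
following Python sketch counts one author's downloads per publication
for a given month. The log structure, the field names and the function
\texttt{download\_counts} are assumptions made purely for illustration;
the repository's actual design may differ.

\begin{verbatim}
```python
from collections import Counter

# Hypothetical download log: one record per download event.
download_log = [
    {"title": "Paper A", "authors": ["Author One"],
     "date": "2003-05-02"},
    {"title": "Paper A", "authors": ["Author One"],
     "date": "2003-05-17"},
    {"title": "Paper B", "authors": ["Author One", "Author Two"],
     "date": "2003-05-20"},
    {"title": "Paper B", "authors": ["Author One", "Author Two"],
     "date": "2003-06-01"},
    {"title": "Paper C", "authors": ["Author Two"],
     "date": "2003-05-09"},
]


def download_counts(log, author, month):
    """Count downloads per title for one author in one month.

    The month is given as a "YYYY-MM" string (an assumed convention).
    """
    counts = Counter(
        event["title"]
        for event in log
        if author in event["authors"]
        and event["date"].startswith(month)
    )
    return dict(counts)


report = download_counts(download_log, "Author One", "2003-05")
```
\end{verbatim}

A summary of this kind could be sent to authors periodically rather
than once per download.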

The repository will also be made visible to the major Internet search
engines such as Google and Yahoo, which will enhance the visibility of
the School's research output. It should also be possible to
automatically ``plug in'' to specialised publication search engines in
various disciplines (for example, CiteSeer).


\subsection{Single point of storage}

Publication details often appear in multiple locations under the current
regime (for example, in the department's full publication list and on
the author's personal web page). This can obviously lead to problems if
some detail of a publication needs to be changed---you might change one
entry, but miss another, resulting in inconsistencies. The repository
addresses this by creating a single point of storage for all
publications within the School. Changing a publication's details in the
database will change it everywhere that it appears.

It is envisioned that the repository will be a central resource for the
School, rather than being run on a department-by-department basis. It
will be run on a central server and be accessible by all. Authors will
be able to log in to the repository in order to enter their
publications, and each department will have a designated ``Publications
Person'' who verifies the details of new publications and makes them
visible to the outside world (more on this person's responsibilities
shortly).


\subsection{Publications workflow}

Referring again to Figure~\ref{fig:current}, we see that the major flow
of data relating to publications is from authors to RE\&I. This flow is
usually mediated by a ``Publications Person'' within a department. This
person has access to the ResearchMaster database, and ensures that staff
publications are entered into this database in the correct format, and
with all required details. This is typically a manual process that might
take place once or twice a year. The annual University publications list
is produced directly from the ResearchMaster database.

PBRF has introduced a second parallel database: the Performer database,
which stores details of staff members' research performance, including
publication details. These details can be extracted from the existing
ResearchMaster database, so no further consideration of the Performer
database is required here.

Now consider Figure~\ref{fig:repository}. Once a new publication has
been verified by the ``Publications Person'', the details of this
publication will be immediately available for entry into the
ResearchMaster database. There are at least four ways that this could
occur, in roughly descending order of preference:
\begin{enumerate}

	\item The publication details are automatically loaded directly into
	ResearchMaster.
	
	\item RE\&I periodically query the publications repository for
	new publications.

	\item At the end of each year, the ``Publications Person'' generates
	a list of new publications in some suitable format, and forwards
	this list to RE\&I for entry into ResearchMaster.

	\item At the end of each year, the ``Publications Person'' generates
	a text file of new publications, and copies and pastes the details
	into the ResearchMaster web interface.

\end{enumerate}
The last option is probably only a slight variation on what happens at
present (staff email publication details to the ``Publications Person'',
and these are copied and pasted into the web interface). It is likely
that more than one of these options will be implemented in the
publications repository, but technical considerations to do with
interfacing the two systems could potentially rule out the first option.
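
The third and fourth options above both amount to producing a year-end
list of verified publications. A minimal Python sketch of that step
follows. The record fields, the output layout and the function
\texttt{year\_end\_list} are illustrative assumptions only; in
particular, the layout is not the format that ResearchMaster actually
expects.

\begin{verbatim}
```python
# Hypothetical records; the repository's real schema may differ.
publications = [
    {"authors": ["Author Two"], "year": 2003, "title": "Paper C",
     "venue": "Journal Y", "verified": True},
    {"authors": ["Author One", "Author Two"], "year": 2003,
     "title": "Paper A", "venue": "Conference X", "verified": True},
    {"authors": ["Author One"], "year": 2003, "title": "Paper B",
     "venue": "Conference X", "verified": False},
    {"authors": ["Author One"], "year": 2002, "title": "Paper D",
     "venue": "Journal Y", "verified": True},
]


def year_end_list(pubs, year):
    """Return one text line per verified publication from `year`."""
    selected = [
        p for p in pubs if p["year"] == year and p["verified"]
    ]
    # Sort by first author, then title, for a stable listing.
    selected.sort(key=lambda p: (p["authors"][0], p["title"]))
    return [
        "{} ({}). {}. {}.".format(
            ", ".join(p["authors"]), p["year"], p["title"], p["venue"]
        )
        for p in selected
    ]


lines = year_end_list(publications, 2003)
```
\end{verbatim}

The resulting lines could be written to a text file and forwarded to
RE\&I, or pasted into the ResearchMaster web interface.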


\subsection{Responsibilities of the ``Publications Person''}

The last thing anyone wants to do is to burden the ``Publications
Person'' with any more work than they are undertaking at present. The
publications repository is in fact intended to reduce the amount of work
these people have to do, by streamlining and semi-automating many of the
processes that currently exist.

At present, the ``Publications Person'' primarily acts as a combination
of a publication information collator (ensuring that all required
details have been collected, and querying authors for any information
that is missing) and a data entry operator (manually entering these
details into ResearchMaster, and also any departmental database that
might exist). Some also manage the dissemination of publication details
on the web, usually by manually editing web pages. Most also have other
responsibilities, whether related or unrelated.

With the publications repository in place, this person's
responsibilities would normally comprise the following:
\begin{itemize}

	\item Verifying new entries into the repository to ensure that the
	publication is valid and all important details have been included.
	
	\item Making verified publications visible to the outside world
	(this should just be a matter of checking a box on a web form).
	
	\item Possibly transferring data from the repository to
	ResearchMaster (depending on how this link is implemented, as noted
	earlier).

\end{itemize}
There are two important points to note here. First, the ``Publications
Person'' does not enter new publications into the repository. Rather,
this is done by authors directly. Entry of required details (which will
vary according to the type of publication) will be enforced by the
repository's web interface. Verification will therefore become more of a
quality control process than an exercise in data gathering. Second, the
only thing that the ``Publications Person'' needs to do to make a
publication visible on the web is to check a box to indicate that the
publication has been verified. No manual editing of web pages is
necessary.
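
The two points above (required details that vary by publication type,
and a single check box that controls web visibility) might be sketched
in Python as follows. The lists of required fields and all function
names are hypothetical, chosen only to illustrate the idea.

\begin{verbatim}
```python
# Hypothetical required fields for two publication types; the
# real lists would be decided by the School.
REQUIRED_FIELDS = {
    "conference paper": ["title", "authors", "year", "conference"],
    "technical report": ["title", "authors", "year",
                         "report_number"],
}


def missing_fields(pub):
    """Return the required fields that are absent or empty."""
    required = REQUIRED_FIELDS[pub["type"]]
    return [f for f in required if not pub.get(f)]


def set_verified(pub, checked):
    """Record the check box state; incomplete entries stay hidden."""
    pub["verified"] = bool(checked) and not missing_fields(pub)
    return pub["verified"]


def visible(pubs):
    """Only verified publications are shown on the web."""
    return [p["title"] for p in pubs if p.get("verified")]


complete = {"type": "conference paper", "title": "Paper A",
            "authors": ["Author One"], "year": 2003,
            "conference": "Conference X"}
incomplete = {"type": "technical report", "title": "Paper B",
              "authors": ["Author One"], "year": 2003}
```
\end{verbatim}

In this sketch an entry with missing details cannot be marked as
verified, so it never appears in the list returned by
\texttt{visible}.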

The combination of getting authors to directly enter their own
publications and automated web publishing should reduce the amount of
work undertaken by the ``Publications Person''. The only aspect of the
process that might not change (as noted earlier) is the submission of
publication details to RE\&I.


\section{System requirements}

The following are the original requirements as set forth by the School's
IT Policy Committee in late 2002. They have been lightly edited for
clarity and consistency, and additional comments have been included in
[brackets].
\begin{enumerate}

	\item The repository will store electronically various research
	publications produced by staff (and students?) within the School of
	Business.
	
	[Obviously the repository does not have to be restricted to only
	research publications. Also, it will not be possible to store some
	publications in the database because of copyright constraints.]
	
	\item The repository content will be sortable by type (technical
	report or conference paper), author, department (Information
	Science, Marketing) and subject keyword (interesting to see
	inter-disciplinary research).
	
	[Date is another important criterion. Much of this requirement will
	be taken care of by the search feature of the repository. It should
	be possible to search on combinations of criteria (e.g.,
	publications on ``data mining'' by Nigel Stanger published within
	the last three years).]
	
	\item Abstracts should be selectable.
	
	\item The repository should also be able to format a listing as
	required by the University's ``Publications'' document.
	
	[This could be as simple as including an output format selector on
	the search form. Multiple output formats could be supported:
	ResearchMaster, Otago CV, \BibTeX, Refer format (for import into
	EndNote), XML, plain text, etc.]
	
	\item The site should be accessible from every department's home
	page.
	
	[This should just be a matter of including a link on the home page
	that performs a search on ``department = `XXX'\,''. A similar
	principle can be applied to individuals and research groups.]
	
	\item Each time a paper is downloaded, the author(s) will be
	automatically and electronically (email?) notified of the event and
	of the paper downloaded and who downloaded it. This is to allow for
	the author to make contact with the person downloading the paper and
	to possibly develop collaborations with that person.
	
	[An obvious concern here is that authors of popular papers will be
	bombarded with an endless stream of download messages (download
	spam?). Given that there is no automatic way of determining who
	downloaded a paper, these messages would be essentially useless. We
	can solve the spam problem by limiting emails about ``anonymous''
	downloads to a monthly report detailing which of an author's papers
	were downloaded and how many times. We can solve the anonymity
	problem by asking downloaders if they would like to send their
	contact details (at least their name and email address) to the
	author, and presenting them with a form to do so. These details
	could perhaps also be stored in the database for future reference.
	
	The inverse of this feature could also be useful. That is, the
	ability for visitors to place a ``watch'' on particular documents or
	authors, so that they can be automatically notified of updates. This
	would require some sort of registration subsystem, and is not
	currently considered a core requirement.]
	
	\item The system will have the capability for individuals to simply
	upload their papers directly from their desktop, using a process
	similar to that used by Blackboard for uploading documents. [Note
	that this is a standard feature provided by web browsers, and is not
	peculiar to Blackboard.] The system serves as a vehicle for
	distributing the School's research. It is not intended for
	verification that the paper is a published paper. If verification is
	required for, say, end-of-year reporting to RE\&I by the department,
	a secure field could be included in the database that allows an
	appointed member of staff [the ``Publications Person''] to verify
	that the papers have been published, etc.
	
	\item All Tech Reports and Discussion Papers should still go through
	a Department's own reviewing process before being uploaded to the
	site.
	
	[This is really a procedural rather than a technical issue.]

\end{enumerate}
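
The comments on the second requirement mention searching on
combinations of criteria. One possible way to do this is sketched below
in Python, using the standard \texttt{sqlite3} module with an in-memory
database. The table layout, column names and the function
\texttt{search} are assumptions for illustration only, and do not
describe the repository's actual schema or database software.

\begin{verbatim}
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publication "
    "(title TEXT, author TEXT, keywords TEXT, year INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO publication VALUES (?, ?, ?, ?)",
    [
        ("Paper A", "Author One", "data mining", 2002),
        ("Paper B", "Author One", "databases", 2001),
        ("Paper C", "Author Two", "data mining", 1998),
    ],
)


def search(conn, author=None, keyword=None, since_year=None):
    """Return titles matching every criterion that was supplied."""
    clauses, params = [], []
    if author is not None:
        clauses.append("author LIKE ?")
        params.append("%" + author + "%")
    if keyword is not None:
        clauses.append("keywords LIKE ?")
        params.append("%" + keyword + "%")
    if since_year is not None:
        clauses.append("year >= ?")
        params.append(since_year)
    # With no criteria, match every row.
    where = " AND ".join(clauses) if clauses else "1"
    sql = ("SELECT title FROM publication WHERE " + where
           + " ORDER BY year DESC, title")
    return [row[0] for row in conn.execute(sql, params)]


recent = search(conn, author="Author One",
                keyword="data mining", since_year=2000)
```
\end{verbatim}

Only the criteria actually supplied are added to the query, and user
input is passed as parameters rather than pasted into the SQL text. A
departmental home page link could call the same search with a single
fixed criterion.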

An important point that also needs to be considered is that Marketing
already have a publications database. Any new system should therefore be
compatible with the database used by Marketing in order to ease the
transfer of data between the two systems. Note that this is not meant to
imply that the repository will necessarily replace Marketing's existing
database; merely that the two should be compatible so that data can be
moved in either direction as necessary.

The repository will be run on the School's existing servers and is being
developed using freely available (open source) software, so no
additional hardware or software will need to be purchased. The only
costs that will be incurred are associated with system infrastructure
development.


\section{Summary}

The School of Business IT Policy Committee has set forth the
requirements for a publications repository for the School, and
development work is currently under way. The proposed repository will
streamline several processes associated with management of publication
details. In particular, it will provide a single point of storage for
details of all publications within the School. This will enhance the
consistency of publication details on departmental web sites, and will
automate the generation of publication web pages for departments,
research groups and individuals. The repository will be able to produce
output in multiple formats, and should also improve the workflow for
submitting publication details to RE\&I.

The basic infrastructure for the repository has been completed, and a
simple prototype system has been demonstrated to the committee. Work to
further enhance the prototype is currently progressing.


\vspace*{1cm}
\noindent Nigel Stanger \\
Project Manager \\
Department of Information Science


\end{document}