\documentclass[a4paper]{article}

\usepackage{mathpple}
\usepackage[margin={1in,0.5in}]{geometry}
\usepackage{graphicx}

\title{School of Business Publications Repository \\ (DRAFT: not for circulation)}

\author{Nigel Stanger\thanks{Department of Information Science, email \texttt{nstanger@infoscience.otago.ac.nz}.}}

\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}

\begin{document}

\maketitle

\section{Executive Summary}

A database-managed repository for storing publications (primarily research publications) authored by staff within the School of Business is currently in the early stages of development, under the auspices of the School's Information Technology Policy Committee. Such a repository provides several important benefits, including:
\begin{itemize}
  \item A single, well-managed, flexible repository for storing details of publications within the School.
  \item Easy web publishing of publication details, including downloadable copies of papers where appropriate.
  \item Elimination (or at least reduction) of duplicated publication data in multiple locations, thus enhancing consistency.
  \item A searchable database of publications spanning the entire School, accessible via the web. The repository will also be visible to major web search engines, such as Google and Yahoo.
  \item The ability for individual departments, research groups or staff members to generate web pages of their publications using whatever ``look and feel'' they desire.
  \item Improved workflow when forwarding publication details to Research, Enterprise and International (RE\&I) for inclusion in the annual list of University publications, and for PBRF.
\end{itemize}

The basic engine of such a system is currently being implemented, and work is progressing.

\section{Why would such a system be useful?}

There are several reasons why such a system would be useful.
First, it provides a single, consistent, flexible way of disseminating publication details via the web. Second, it will reduce the amount of duplication of publication details that currently exists. Third, it will improve the workflow associated with forwarding publication details to Research, Enterprise and International.

\subsection{Web access}

Consider a person from outside the University wanting to find all publications by a particular staff member in a particular department within the School. For most departments they will typically find a list of publications in chronological order, perhaps subdivided by publication type. To find all publications by a particular staff member, they will have to manually scan through all of the department's publications web pages to find what they want. If they are lucky, publication lists may be available on individual staff members' web pages, but this is by no means certain, and these lists are not usually comprehensive.

It would clearly be more effective to enter the author's name into a search field and quickly retrieve only the publications by that author. Doing so effectively requires an underlying database and associated software. However, of all the departments in the School, only the Department of Marketing has such a system in use. The remaining departments use static, manually created web pages that cannot easily be searched and are difficult to keep up to date. (The author of this document is the coordinator of the Department of Information Science Discussion Paper Series, and has first-hand experience of the issues associated with this approach.)

The typical state of affairs for most departments is illustrated in Figure~\ref{fig:current}. Considering only the left-hand side of the diagram for the moment, we see that authors produce publications, which are submitted to some publication venue.
Details of publications are typically forwarded to a ``Publications Person'' within the department, who organises placing those details on the department's web site. This is usually a manual process, and may only occur once or twice per year.

\begin{figure}[htb]
  \includegraphics[width=\columnwidth,keepaspectratio]{PublicationsCurrent}
  \caption{Typical state of affairs for publications in most departments.}
  \label{fig:current}
\end{figure}

Contrast this with the situation shown in Figure~\ref{fig:repository}. Authors load details of their publications directly into the new publications repository. Once these details are verified by the ``Publications Person'', the publication immediately becomes visible on the web. The whole process is streamlined considerably, and the ``Publications Person'' is spared the work of manually updating web pages. The web pages generated by the repository will be template-based, making it easy to customise web pages for specific purposes, and to quickly change the ``look and feel'' of the entire system.

\begin{figure}[htb]
  \includegraphics[width=\columnwidth,keepaspectratio]{PublicationsRepository}
  \caption{The proposed publications repository.}
  \label{fig:repository}
\end{figure}

Many departments currently provide downloadable versions of papers (where copyright allows), and this will obviously also be a feature of the proposed repository. With a static web site it can be difficult to determine whether a particular document has been downloaded, how many times it has been downloaded, and by whom. With a dynamic web site driven from the publications repository, it will be easy to track the number of downloads for each publication. The system can even ask the reader if they would like to enter their details, which will then be automatically emailed to the author, enabling them to contact readers of their publications and enhancing the possibilities for future collaborations.

The repository will also be made visible to the major Internet search engines such as Google and Yahoo, which will enhance the visibility of the School's research output. It should also be possible to automatically ``plug in'' to specialised publication search engines in various disciplines (for example, CiteSeer).

\subsection{Single point of storage}

Publication details often appear in multiple locations under the current regime (for example, in the department's full publication list and on the author's personal web page). This can lead to problems if some detail of a publication needs to be changed: you might change one entry but miss another, resulting in inconsistencies. The repository addresses this by creating a single point of storage for all publications within the School. Changing a publication's details in the database will change them everywhere they appear.

It is envisioned that the repository will be a central resource for the School, rather than being run on a department-by-department basis. It will be run on a central server and be accessible by all. Authors will be able to log in to the repository in order to enter their publications, and each department will have a designated ``Publications Person'' who verifies the details of new publications and makes them visible to the outside world (more on this person's responsibilities shortly).

\subsection{Publications workflow}

Referring again to Figure~\ref{fig:current}, we see that the major flow of data relating to publications is from authors to RE\&I. This flow is usually mediated by a ``Publications Person'' within a department. This person has access to the ResearchMaster database, and ensures that staff publications are entered into this database in the correct format, and with all required details. This is typically a manual process that might take place once or twice a year. The annual University publications list is produced directly from the ResearchMaster database.

PBRF has introduced a second parallel database: the Performer database, which stores details of staff members' research performance, including publication details. These details can be extracted from the existing ResearchMaster database, so no further consideration of the Performer database is required here.

Now consider Figure~\ref{fig:repository}. Once a new publication has been verified by the ``Publications Person'', the details of this publication will be immediately available for entry into the ResearchMaster database. There are at least four ways that this could occur, in roughly descending order of preference:
\begin{enumerate}
  \item The publication details are automatically loaded directly into ResearchMaster.
  \item RE\&I periodically query the publications repository for new publications.
  \item At the end of each year, the ``Publications Person'' generates a list of new publications in some suitable format, and forwards this list to RE\&I for entry into ResearchMaster.
  \item At the end of each year, the ``Publications Person'' generates a text file of new publications, and copies and pastes the details into the ResearchMaster web interface.
\end{enumerate}

The last option is probably only a slight variation on what happens at present (staff email publication details to the ``Publications Person'', and these are copied and pasted into the web interface). It is likely that more than one of these options will be implemented in the publications repository, but technical considerations to do with interfacing the two systems could potentially rule out the first option.

\subsection{Responsibilities of the ``Publications Person''}

The last thing anyone wants to do is to burden the ``Publications Person'' with any more work than they are undertaking at present. The publications repository is in fact intended to reduce the amount of work these people have to do, by streamlining and semi-automating many of the processes that currently exist.

At present, the ``Publications Person'' primarily acts as a combination of a publication information collator (ensuring that all required details have been collected, and querying authors for any information that is missing) and a data entry operator (manually entering these details into ResearchMaster, and also any departmental database that might exist). Some also manage the dissemination of publication details on the web, usually by manually editing web pages. Most also have other responsibilities, whether related or unrelated.

With the publications repository in place, this person's responsibilities would normally comprise the following:
\begin{itemize}
  \item Verifying new entries in the repository to ensure that the publication is valid and all important details have been included.
  \item Making verified publications visible to the outside world (this should just be a matter of checking a box on a web form).
  \item Possibly transferring data from the repository to ResearchMaster (depending on how this link is implemented, as noted earlier).
\end{itemize}

There are two important points to note here. First, the ``Publications Person'' does not enter new publications into the repository. Rather, this is done by authors directly. Entry of required details (which will vary according to the type of publication) will be enforced by the repository's web interface. Verification will therefore become more of a quality control process than an exercise in data gathering.

Second, the only thing that the ``Publications Person'' needs to do to make a publication visible on the web is to check a box to indicate that the publication has been verified. No manual editing of web pages is necessary.

The combination of getting authors to directly enter their own publications and automated web publishing should reduce the amount of work undertaken by the ``Publications Person''.
The only aspect of the process that might not change (as noted earlier) is the submission of publication details to RE\&I.

\section{System requirements}

The following are the original requirements as set forth by the School's IT Policy Committee in late 2002. They have been lightly edited for clarity and consistency, and additional comments have been included in [brackets].
\begin{enumerate}
  \item The repository will store electronically various research publications produced by staff (and students?) within the School of Business. [Obviously the repository does not have to be restricted to only research publications. Also, it will not be possible to store some publications in the database because of copyright constraints.]
  \item The repository content will be sortable by type (technical report or conference paper), author, department (Information Science, Marketing) and subject keyword (interesting to see inter-disciplinary research). [Date is another important criterion. Much of this requirement will be taken care of by the search feature of the repository. It should be possible to search on combinations of criteria (e.g., publications on ``data mining'' by Nigel Stanger published within the last three years).]
  \item Abstracts should be selectable.
  \item The repository should also be able to format a listing as required by the University's ``Publications'' document. [This could be as simple as including an output format selector on the search form. Multiple output formats could be supported: ResearchMaster, Otago CV, \BibTeX, Refer format (for import into EndNote), XML, plain text, etc.]
  \item The site should be accessible from every department's home page. [This should just be a matter of including a link on the home page that performs a search on ``department = `XXX'\,''. A similar principle can be applied to individuals and research groups.]
  \item Each time a paper is downloaded, the author(s) will be automatically and electronically (email?)
  notified of the event and of the paper downloaded and who downloaded it. This is to allow for the author to make contact with the person downloading the paper and to possibly develop collaborations with that person. [An obvious concern here is that authors of popular papers will be bombarded with an endless stream of download messages (download spam?). Given that there is no automatic way of determining who downloaded a paper, these messages would be essentially useless. We can solve the spam problem by limiting emails about ``anonymous'' downloads to a monthly report detailing which of an author's papers were downloaded and how many times. We can solve the anonymity problem by asking downloaders if they would like to send their contact details (at least their name and email address) to the author, and presenting them with a form to do so. These details could perhaps also be stored in the database for future reference. The inverse of this feature could also be useful: that is, the ability for visitors to place a ``watch'' on particular documents or authors, so that they can be automatically notified of updates. This would require some sort of registration subsystem, and is not currently considered a core requirement.]
  \item The system will have the capability for individuals to simply upload their papers directly from their desktop, using a process similar to that used by Blackboard for uploading documents. [Note that this is a standard feature provided by web browsers, and is not peculiar to Blackboard.]

  The system serves as a vehicle for distributing the School's research. It is not intended for verification that the paper is a published paper. If verification is required for, say, end-of-year reporting to RE\&I by the department, a secure field could be included in the database that allows an appointed member of staff [the ``Publications Person''] to verify that the papers have been published, etc.
  \item All Tech Reports and Discussion Papers should still go through a Department's own reviewing process before being uploaded to the site. [This is really a procedural rather than a technical issue.]
\end{enumerate}

An important point that also needs to be considered is that Marketing already have a publications database. Any new system should therefore be compatible with the database used by Marketing in order to ease the transfer of data between the two systems. Note that this is not meant to imply that the repository will necessarily replace Marketing's existing database; merely that the two should be compatible so that data can be moved in either direction as necessary.

The repository will be run on the School's existing servers and is being developed using freely available (open source) software, so no additional hardware or software will need to be purchased. The only costs that will be incurred are associated with system infrastructure development.

\section{Summary}

The School of Business IT Policy Committee has set forth the requirements for a publications repository for the School, and development work is currently under way. The proposed repository will streamline several processes associated with management of publication details. In particular, it will provide a single point of storage for details of all publications within the School. This will enhance the consistency of publication details on departmental web sites, and will automate the generation of publication web pages for departments, research groups and individuals. The repository will be able to produce output in multiple formats, and should also improve the workflow for submitting publication details to RE\&I.

The basic infrastructure for the repository has been completed, and a simple prototype system has been demonstrated to the committee. Work to further enhance the prototype is currently progressing.

\vspace*{1cm}

\noindent Nigel Stanger \\
Project Manager \\
Department of Information Science

\end{document}