% Digital_Repository / Old / Overview.tex
% nstanger on 12 Nov 2005 17 KB - Second attempt at importing!
\documentclass[a4paper]{article}

\usepackage{mathpple}
\usepackage[margin={1in,0.5in}]{geometry}
\usepackage{graphicx}

\title{School of Business Publications Repository \\
(DRAFT: not for circulation)}
\author{Nigel Stanger\thanks{Department of Information Science, email
\texttt{nstanger@infoscience.otago.ac.nz}.}}

\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em
T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}

\begin{document}

\maketitle

\section{Executive Summary}

A database-managed repository is currently in the early stages of
development, under the auspices of the School's Information Technology
Policy Committee. Its purpose is to store publications (primarily
research publications) authored by staff within the School of Business.
Such a repository provides several important benefits, including:
\begin{itemize}

\item A single, well-managed, flexible repository for storing
details of publications within the School.

\item Easy publishing of publication details on the web, including
downloadable copies of papers where appropriate.

\item Reduced (ideally eliminated) duplication of publication data
in multiple locations, thus enhancing consistency.

\item A searchable database of publications spanning the entire
School, accessible via the web. The repository will also be
available to major web search engines, such as Google and Yahoo.

\item The ability for individual departments, research groups or staff
members to generate web pages of their publications using whatever
``look and feel'' they desire.

\item Improved workflow when forwarding publication details to
Research, Enterprise and International (RE\&I) for inclusion in
the annual list of University publications, and for PBRF.

\end{itemize}

The basic engine of such a system is currently being implemented.


\section{Why would such a system be useful?}

There are several reasons why such a system would be useful. First, it
provides a single, consistent, flexible way of disseminating publication
details via the web. Second, it will reduce the amount of duplication of
publication details that currently exists. Third, it will improve the
workflow associated with forwarding publication details to Research,
Enterprise and International.


\subsection{Web access}

Consider a person from outside the University wanting to find all
publications by a particular staff member in a particular department
within the School. For most departments they will typically find a list
of publications in chronological order, perhaps subdivided by
publication type. To find all publications by a particular staff member,
they will have to scan manually through all the publications web pages
to find what they want. If they are lucky, publication lists may be
available on individual staff members' web pages, but this is by no
means certain, and these lists are not usually comprehensive.

It would clearly be more effective to enter the name of the author of
interest into a search field, and quickly retrieve only the publications
by that author. Doing so effectively requires an underlying database and
associated software. However, of all the departments in the School, only
the Department of Marketing has such a system in use. The remaining
departments use static, manually created web pages that cannot easily be
searched and are difficult to keep up to date. (The author of this
document is the coordinator of the Department of Information Science
Discussion Paper Series, and has first-hand experience of the issues
associated with this approach.)

The typical state of affairs for most departments is illustrated in
Figure~\ref{fig:current}. Considering only the left-hand side of the
diagram for the moment, we see that authors produce publications, which
are submitted to some publication venue. Details of publications are
typically forwarded to a ``Publications Person'' within the department,
who arranges for those details to be placed on the department's web
site. This is usually a manual process, and may only occur once or twice
per year.

\begin{figure}[htb]
\includegraphics[width=\columnwidth,keepaspectratio]{PublicationsCurrent}
\caption{Typical state of affairs for publications in most departments.}
\label{fig:current}
\end{figure}

Contrast this with the situation shown in Figure~\ref{fig:repository}.
Authors load details of their publications directly into the new
publications repository. Once these details are verified by the
``Publications Person'', the publication immediately becomes visible on
the web. The whole process is streamlined considerably, and the
``Publications Person'' is spared the work of manually updating web
pages. The web pages generated by the repository will be template-based,
making it easy to customise web pages for specific purposes, and to
quickly change the ``look and feel'' of the entire system.

\begin{figure}[htb]
\includegraphics[width=\columnwidth,keepaspectratio]{PublicationsRepository}
\caption{The proposed publications repository.}
\label{fig:repository}
\end{figure}

Many departments currently provide downloadable versions of papers
(where copyright allows), and this will obviously also be a feature of
the proposed repository. With a static web site it can be difficult to
determine whether a particular document has been downloaded, how many
times it has been downloaded, and by whom. With a dynamic web site
driven from the publications repository, it will be easy to track the
number of downloads for each publication. The system can even ask the
reader if they would like to enter their details, which will then be
automatically emailed to the author, enabling them to contact readers of
their publications and enhancing the possibilities for future
collaborations.

The repository will also be made visible to the major Internet search
engines such as Google and Yahoo, which will enhance the visibility of
the School's research output. It should also be possible to
automatically ``plug in'' to specialised publication search engines in
various disciplines (for example, CiteSeer).


\subsection{Single point of storage}

Publication details often appear in multiple locations under the current
regime (for example, in the department's full publication list and on
the author's personal web page). This can obviously lead to problems if
some detail of a publication needs to be changed: it is easy to change
one entry but miss another, resulting in inconsistencies. The repository
addresses this by creating a single point of storage for all
publications within the School. Changing a publication's details in the
database will change them everywhere that they appear.

It is envisaged that the repository will be a central resource for the
School, rather than being run on a department-by-department basis. It
will be run on a central server and be accessible by all. Authors will
be able to log in to the repository in order to enter their
publications, and each department will have a designated ``Publications
Person'' who verifies the details of new publications and makes them
visible to the outside world (more on this person's responsibilities
shortly).


\subsection{Publications workflow}

Referring again to Figure~\ref{fig:current}, we see that the major flow
of data relating to publications is from authors to RE\&I. This flow is
usually mediated by a ``Publications Person'' within a department. This
person has access to the ResearchMaster database, and ensures that staff
publications are entered into this database in the correct format, and
with all required details. This is typically a manual process that might
take place once or twice a year. The annual University publications list
is produced directly from the ResearchMaster database.

PBRF has introduced a second parallel database: the Performer database,
which stores details of staff members' research performance, including
publication details. These details can be extracted from the existing
ResearchMaster database, so no further consideration of the Performer
database is required here.

Now consider Figure~\ref{fig:repository}. Once a new publication has
been verified by the ``Publications Person'', the details of this
publication will be immediately available for entry into the
ResearchMaster database. There are at least four ways that this could
occur, in roughly descending order of preference:
\begin{enumerate}

\item The publication details are automatically loaded directly into
ResearchMaster.

\item RE\&I periodically query the publications repository for
new publications.

\item At the end of each year, the ``Publications Person'' generates
a list of new publications in some suitable format, and forwards
this list to RE\&I for entry into ResearchMaster.

\item At the end of each year, the ``Publications Person'' generates
a text file of new publications, and copies and pastes the details
into the ResearchMaster web interface.

\end{enumerate}
The last option is probably only a slight variation on what happens at
present (staff email publication details to the ``Publications Person'',
and these are copied and pasted into the web interface). It is likely
that more than one of these options will be implemented in the
publications repository, but technical considerations to do with
interfacing the two systems could potentially rule out the first option.


\subsection{Responsibilities of the ``Publications Person''}

The last thing anyone wants to do is to burden the ``Publications
Person'' with any more work than they are undertaking at present. The
publications repository is in fact intended to reduce the amount of work
these people have to do, by streamlining and semi-automating many of the
processes that currently exist.

At present, the ``Publications Person'' primarily acts as a combination
of a publication information collator (ensuring that all required
details have been collected, and querying authors for any information
that is missing) and a data entry operator (manually entering these
details into ResearchMaster, and also any departmental database that
might exist). Some also manage the dissemination of publication details
on the web, usually by manually editing web pages. Most also have other
responsibilities, whether related or unrelated.

With the publications repository in place, this person's
responsibilities would normally comprise the following:
\begin{itemize}

\item Verifying new entries into the repository to ensure that the
publication is valid and all important details have been included.

\item Making verified publications visible to the outside world
(this should just be a matter of checking a box on a web form).

\item Possibly transferring data from the repository to
ResearchMaster (depending on how this link is implemented, as noted
earlier).

\end{itemize}
There are two important points to note here. First, the ``Publications
Person'' does not enter new publications into the repository. Rather,
this is done by authors directly. Entry of required details (which will
vary according to the type of publication) will be enforced by the
repository's web interface. Verification will therefore become more of a
quality control process than an exercise in data gathering. Second, the
only thing that the ``Publications Person'' needs to do to make a
publication visible on the web is to check a box to indicate that the
publication has been verified. No manual editing of web pages is
necessary.

The combination of authors directly entering their own publications and
automated web publishing should reduce the amount of work undertaken by
the ``Publications Person''. The only aspect of the process that might
not change (as noted earlier) is the submission of publication details
to RE\&I.


\section{System requirements}

The following are the original requirements as set forth by the School's
IT Policy Committee in late 2002. They have been lightly edited for
clarity and consistency, and additional comments have been included in
[brackets].
\begin{enumerate}

\item The repository will electronically store various research
publications produced by staff (and students?) within the School of
Business.
[Obviously the repository does not have to be restricted to only
research publications. Also, it will not be possible to store some
publications in the database because of copyright constraints.]

\item The repository content will be sortable by type (technical
report or conference paper), author, department (Information
Science, Marketing) and subject keyword (interesting to see
inter-disciplinary research).
[Date is another important criterion. Much of this requirement will
be taken care of by the search feature of the repository. It should
be possible to search on combinations of criteria (e.g.,
publications on ``data mining'' by Nigel Stanger published within
the last three years).]

\item Abstracts should be selectable.

\item The repository should also be able to format a listing as
required by the University's ``Publications'' document.
[This could be as simple as including an output format selector on
the search form. Multiple output formats could be supported:
ResearchMaster, Otago CV, \BibTeX, Refer format (for import into
EndNote), XML, plain text, etc.]

\item The site should be accessible from every department's home
page.
[This should just be a matter of including a link on the home page
that performs a search on ``department = `XXX'\,''. A similar
principle can be applied to individuals and research groups.]

\item Each time a paper is downloaded, the author(s) will be
automatically and electronically (email?) notified of the event,
of which paper was downloaded, and of who downloaded it. This is to
allow the author to make contact with the person downloading the
paper and to possibly develop collaborations with that person.
[An obvious concern here is that authors of popular papers will be
bombarded with an endless stream of download messages (download
spam?). Given that there is no automatic way of determining who
downloaded a paper, these messages would be essentially useless. We
can solve the spam problem by limiting emails about ``anonymous''
downloads to a monthly report detailing which of an author's papers
were downloaded and how many times. We can solve the anonymity
problem by asking downloaders if they would like to send their
contact details (at least their name and email address) to the
author, and presenting them with a form to do so. These details
could perhaps also be stored in the database for future reference.
The inverse of this feature could also be useful. That is, the
ability for visitors to place a ``watch'' on particular documents or
authors, so that they can be automatically notified of updates. This
would require some sort of registration subsystem, and is not
currently considered a core requirement.]

\item The system will have the capability for individuals to simply
upload their papers directly from their desktop, using a process
similar to that used by Blackboard for uploading documents. [Note
that this is a standard feature provided by web browsers, and is not
peculiar to Blackboard.] The system serves as a vehicle for
distributing the School's research. It is not intended for
verification that the paper is a published paper. If verification is
required for, say, end-of-year reporting to RE\&I by the department,
a secure field could be included in the database that allows an
appointed member of staff [the ``Publications Person''] to verify
that the papers have been published, etc.

\item All Tech Reports and Discussion Papers should still go through
a Department's own reviewing process before being uploaded to the
site.
[This is really a procedural rather than a technical issue.]

\end{enumerate}

An important point that also needs to be considered is that Marketing
already have a publications database. Any new system should therefore be
compatible with the database used by Marketing in order to ease the
transfer of data between the two systems. Note that this is not meant to
imply that the repository will necessarily replace Marketing's existing
database; merely that the two should be compatible so that data can be
moved in either direction as necessary.

The repository will be run on the School's existing servers and is being
developed using freely available (open source) software, so no
additional hardware or software will need to be purchased. The only
costs that will be incurred are associated with system infrastructure
development.


\section{Summary}

The School of Business IT Policy Committee has set forth the
requirements for a publications repository for the School, and
development work is currently under way. The proposed repository will
streamline several processes associated with management of publication
details. In particular, it will provide a single point of storage for
details of all publications within the School. This will enhance the
consistency of publication details on departmental web sites, and will
automate the generation of publication web pages for departments,
research groups and individuals. The repository will be able to produce
output in multiple formats, and should also improve the workflow for
submitting publication details to RE\&I.

The basic infrastructure for the repository has been completed, and a
simple prototype system has been demonstrated to the committee. Work to
further enhance the prototype is currently progressing.


\vspace*{1cm}
\noindent Nigel Stanger \\
Project Manager \\
Department of Information Science


\end{document}