Merged in documentation and filename changes from master.

nigel.stanger / ouexam

Browse code Merged in documentation and filename changes from master. master OUEXAM_2_4_2 OUEXAM_2_4_1 OUEXAM_2_4
commit ed0f9085b458de72288446655a68c80aea83b64d 2 parents cda5b39 + 6c0120b Nigel Stanger authored on 30 Oct 2014

Patch

Showing 7 changed files

Ignore Space Show notes View MANIFEST
CreateCluster.pdf - images for the example document CreateIndex.pdf CreateObjectType.pdf CreateTable1.pdf CreateTable2.pdf HISTORY.md MANIFEST Makefile README.md TODO.md content.tex - example document source files example1.tex example2.tex lstlang0.sty - listings language definitions for the example ouexam.dtx - ouexam class and documentation ouexam.ins - ouexam installation driver CreateCluster.pdf - images for the example document CreateIndex.pdf CreateObjectType.pdf CreateTable1.pdf CreateTable2.pdf FAQ INSTALL MANIFEST Makefile README TODO example.tex - example document source files example1.tex example2.tex lstlang0.sty - listings language definitions for the example ouexam.dtx - ouexam class and documentation ouexam.ins - ouexam installation driver

Ignore Space Show notes View

MANIFEST

Ignore Space Show notes View Makefile
###################################################################### # # Makefile for ouexam document class. # ###################################################################### SHELL=/bin/sh .PHONY: class doc example install tidy clean # files EXAMPLE1_IMAGES = eg1-1.pdf eg1-2.pdf eg1-3.pdf EXAMPLE2_IMAGES = eg2-2.pdf eg2-3.pdf eg2-4.pdf eg2-5.pdf # Build everything. all: class doc example # Build the class file only. class: ouexam.cls ouexam.cls: ouexam.dtx latex ouexam.ins # Build the documentation only. doc: ouexam.pdf ouexam.pdf: ouexam.dtx $(EXAMPLE1_IMAGES) $(EXAMPLE2_IMAGES) pdflatex $< pdflatex $< pdflatex $< $(EXAMPLE1_IMAGES): example1.pdf pdfjam --outfile $@ $< `echo $@ \| cut -c 5` $(EXAMPLE2_IMAGES): example2.pdf pdfjam --outfile $@ $< `echo $@ \| cut -c 5` # Build the example only. example: example1.pdf example2.pdf example%.pdf: example%.tex content.tex ouexam.cls pdflatex $< pdflatex $< # Install files in the appropriate locations in the texmf tree rooted at # $(TEXMFHOME). Either define this as a shell variable or pass it as # command line argument, e.g., make install TEXMFHOME=/usr/local/texmf. install: class doc example mkdir -p $(TEXMFHOME)/tex/latex/ouexam cp ouexam.cls $(TEXMFHOME)/tex/latex/ouexam mkdir -p $(TEXMFHOME)/doc/latex/ouexam/example cp eg.pdf ouexam.pdf ouexam.dtx ouexam.ins \ HISTORY.md MANIFEST README.md \ $(TEXMFHOME)/doc/latex/ouexam cp Create.pdf example.tex example.pdf lstlang0.sty \ $(TEXMFHOME)/doc/latex/ouexam/example texhash # Clean up: remove temporary files. tidy: rm -f .tmp .aux .out .log .glo .toc .ps \ $(EXAMPLE1_IMAGES) $(EXAMPLE2_IMAGES) # Clean up: remove everything except the original source. clean: tidy rm -f ouexam.pdf eg.pdf example.pdf .cls ###################################################################### # # Makefile for ouexam document class. # ###################################################################### SHELL=/bin/sh .PHONY: class doc example install tidy clean # files EXAMPLE1_IMAGES = eg1-1.pdf eg1-2.pdf eg1-3.pdf EXAMPLE2_IMAGES = eg2-2.pdf eg2-3.pdf eg2-4.pdf eg2-5.pdf # Build everything. all: class doc example # Build the class file only. class: ouexam.cls ouexam.cls: ouexam.dtx latex ouexam.ins # Build the documentation only. doc: ouexam.pdf ouexam.pdf: ouexam.dtx $(EXAMPLE1_IMAGES) $(EXAMPLE2_IMAGES) pdflatex $< pdflatex $< pdflatex $< $(EXAMPLE1_IMAGES): example1.pdf pdfjam --outfile $@ $< `echo $@ \| cut -c 5` $(EXAMPLE2_IMAGES): example2.pdf pdfjam --outfile $@ $< `echo $@ \| cut -c 5` # Build the example only. example: example1.pdf example2.pdf example%.pdf: example%.tex example.tex ouexam.cls pdflatex $< pdflatex $< # Install files in the appropriate locations in the texmf tree rooted at # $(TEXMFHOME). Either define this as a shell variable or pass it as # command line argument, e.g., make install TEXMFHOME=/usr/local/texmf. install: class doc example mkdir -p $(TEXMFHOME)/tex/latex/ouexam cp ouexam.cls $(TEXMFHOME)/tex/latex/ouexam mkdir -p $(TEXMFHOME)/doc/latex/ouexam/example cp eg.pdf ouexam.pdf ouexam.dtx ouexam.ins \ HISTORY.md MANIFEST README.md \ $(TEXMFHOME)/doc/latex/ouexam cp Create.pdf example.tex example.pdf lstlang0.sty \ $(TEXMFHOME)/doc/latex/ouexam/example texhash # Clean up: remove temporary files. tidy: rm -f .tmp .aux .out .log .glo .toc .ps \ $(EXAMPLE1_IMAGES) $(EXAMPLE2_IMAGES) # Clean up: remove everything except the original source. clean: tidy rm -f ouexam.pdf eg.pdf example.pdf .cls

Ignore Space Show notes View

Makefile

######################################################################
#
# Makefile for ouexam document class.
#
######################################################################

SHELL=/bin/sh

.PHONY: class doc example install tidy clean

# files
EXAMPLE1_IMAGES = eg1-1.pdf eg1-2.pdf eg1-3.pdf
EXAMPLE2_IMAGES = eg2-2.pdf eg2-3.pdf eg2-4.pdf eg2-5.pdf

# Build everything.
all: class doc example

# Build the class file only.
class: ouexam.cls
ouexam.cls: ouexam.dtx
	latex ouexam.ins

# Build the documentation only.
doc: ouexam.pdf

ouexam.pdf:	ouexam.dtx $(EXAMPLE1_IMAGES) $(EXAMPLE2_IMAGES)
	pdflatex $<
	pdflatex $<
	pdflatex $<

$(EXAMPLE1_IMAGES): example1.pdf
	pdfjam --outfile $@ $< `echo $@ | cut -c 5`

$(EXAMPLE2_IMAGES): example2.pdf
	pdfjam --outfile $@ $< `echo $@ | cut -c 5`

# Build the example only.
example: example1.pdf example2.pdf

example%.pdf: example%.tex content.tex ouexam.cls
	pdflatex $<
	pdflatex $<

# Install files in the appropriate locations in the texmf tree rooted at
# $(TEXMFHOME). Either define this as a shell variable or pass it as
# command line argument, e.g., make install TEXMFHOME=/usr/local/texmf.
install: class doc example
	mkdir -p $(TEXMFHOME)/tex/latex/ouexam
	cp ouexam.cls $(TEXMFHOME)/tex/latex/ouexam
	mkdir -p $(TEXMFHOME)/doc/latex/ouexam/example
	cp eg*.pdf ouexam.pdf ouexam.dtx ouexam.ins \
		HISTORY.md MANIFEST README.md \
		$(TEXMFHOME)/doc/latex/ouexam
	cp Create*.pdf example*.tex example*.pdf lstlang0.sty \
		$(TEXMFHOME)/doc/latex/ouexam/example
	texhash

# Clean up: remove temporary files.
tidy:
	rm -f *.tmp *.aux *.out *.log *.glo *.toc *.ps \
		$(EXAMPLE1_IMAGES) $(EXAMPLE2_IMAGES)

# Clean up: remove everything except the original source.
clean: tidy
	rm -f ouexam.pdf eg*.pdf example*.pdf *.cls

######################################################################
#
# Makefile for ouexam document class.
#
######################################################################

SHELL=/bin/sh

.PHONY: class doc example install tidy clean

# files
EXAMPLE1_IMAGES = eg1-1.pdf eg1-2.pdf eg1-3.pdf
EXAMPLE2_IMAGES = eg2-2.pdf eg2-3.pdf eg2-4.pdf eg2-5.pdf

# Build everything.
all: class doc example

# Build the class file only.
class: ouexam.cls
ouexam.cls: ouexam.dtx
	latex ouexam.ins

# Build the documentation only.
doc: ouexam.pdf

ouexam.pdf:	ouexam.dtx $(EXAMPLE1_IMAGES) $(EXAMPLE2_IMAGES)
	pdflatex $<
	pdflatex $<
	pdflatex $<

$(EXAMPLE1_IMAGES): example1.pdf
	pdfjam --outfile $@ $< `echo $@ | cut -c 5`

$(EXAMPLE2_IMAGES): example2.pdf
	pdfjam --outfile $@ $< `echo $@ | cut -c 5`

# Build the example only.
example: example1.pdf example2.pdf

example%.pdf: example%.tex example.tex ouexam.cls
	pdflatex $<
	pdflatex $<

# Clean up: remove temporary files.
tidy:
	rm -f *.tmp *.aux *.out *.log *.glo *.toc *.ps \
		$(EXAMPLE1_IMAGES) $(EXAMPLE2_IMAGES)

# Clean up: remove everything except the original source.
clean: tidy
	rm -f ouexam.pdf eg*.pdf example*.pdf *.cls

Ignore Space Show notes View README.md
ouexam v2.3.2 =========== ouexam document class v2.3.2, 9 October 2014 Copyright 1999–2014 Nigel Stanger and University of Otago This LaTeX2ε document class enables the production of University of Otago formatted examination papers. It handles all the fiddly layout requirements (such as printing “TURN OVER” at the bottom of every page except the last), and also ensures that the actual number of marks for questions in the examination add up to the expected number of marks. Requirements ------------ You will need the `verbatim`, `fontenc`, `textcomp` and `lmodern` packages in order to use `ouexam`. These should all come standard with most TeX installations. To build the documentation and example files, you will need `graphicx`, `hyperref`, at least version 1.1 of the `listings` package, and the `pdfjam` tool. Installing ---------- To install the easy way: 1 Unpack the distribution archive and `cd` to the distribution directory. 2 `make` 3 `make install TEXMF_INSTALL=/path/to/texmf`. `/path/to/texmf` should be the root of your preferred texmf tree (e.g., `/usr/share/texmf`). You may need to do this as root depending on which texmf tree you are installing into. You can also define `TEXMF_INSTALL` as an environment variable then simply type make install. To install manually, unpack the distribution archive, cd to the distribution directory, and do the following (now you know why there’s a Makefile ☺). % latex ouexam.ins % pdflatex example1.tex % pdflatex example1.tex % pdfjam --outfile eg1-1.pdf example1.pdf 1 % pdfjam --outfile eg1-2.pdf example1.pdf 2 % pdfjam --outfile eg1-3.pdf example1.pdf 3 % pdflatex example2.tex % pdflatex example2.tex % pdfjam --outfile eg2-1.pdf example2.pdf 1 % pdfjam --outfile eg2-2.pdf example2.pdf 2 % pdfjam --outfile eg2-3.pdf example2.pdf 3 % pdfjam --outfile eg2-4.pdf example2.pdf 4 % pdflatex ouexam.dtx % pdflatex ouexam.dtx % pdflatex ouexam.dtx Put `ouexam.cls` in `/path/to/texmf/tex/latex/ouexam`. Put `eg.pdf`, `ouexam.pdf`, `ouexam.dtx`, `ouexam.ins`, `HISTORY`, `MANIFEST`, `README` and `TODO` in `/path/to/texmf/doc/latex/ouexam`. Put `Create.pdf`, `example.tex`, `example.pdf` and `lstlang0.sty` in `/path/to/texmf/doc/latex/ouexam/example`. You can also build the examples using GUI tools such as TeXMaker, TeXShop, etc. (but see the FAQ below). FAQ --- The number of marks for my questions are not appearing at the end of the line even though there is plenty of room for them to fit. You probably have a blank line between the end of your question text and the `\end{question}` macro that actually generates the number of marks. The line break gets processed by TeX before the number of marks is generated and effectively generates a new paragraph. Until I figure out how to stop this from happening, the workaround is to not leave blank lines between the end of the question text and the `\end{question}`. Why can’t I build content.tex? `content.tex` is an include file for both `example1.tex` and `example2.tex`. It doesn’t have a `\documentclass` and you therefore can’t `pdflatex` it! The only input files you should be building are `example1.tex` and `example2.tex`. ouexam v2.3.1 =========== ouexam document class v2.3.1, 15 September 2014 Copyright 1999–2014 Nigel Stanger and University of Otago This LaTeX2e document class enables the production of University of Otago formatted examination papers. It handles all the fiddly layout requirements (such as printing “TURN OVER” at the bottom of every page except the last), and also ensures that the actual number of marks for questions in the examination add up to the expected number of marks. Requirements ------------ You will need the `verbatim`, `fontenc`, `textcomp` and `lmodern` packages in order to use `ouexam`. These should all come standard with most TeX installations. To build the documentation and example files, you will need `graphicx`, `hyperref`, at least version 1.1 of the `listings` package, and the `pdfjam` tool. Installing ---------- To install the easy way: 1 Unpack the distribution archive and `cd` to the distribution directory. 2 `make` 3 `make install TEXMF_INSTALL=/path/to/texmf`. `/path/to/texmf` should be the root of your preferred texmf tree (e.g., `/usr/share/texmf`). You may need to do this as root depending on which texmf tree you are installing into. You can also define `TEXMF_INSTALL` as an environment variable then simply type make install. To install manually, unpack the distribution archive, cd to the distribution directory, and do the following (now you know why there’s a Makefile ☺). % latex ouexam.ins % pdflatex example1.tex % pdflatex example1.tex % pdfjam --outfile eg1-1.pdf example1.pdf 1 % pdfjam --outfile eg1-2.pdf example1.pdf 2 % pdfjam --outfile eg1-3.pdf example1.pdf 3 % pdflatex example2.tex % pdflatex example2.tex % pdfjam --outfile eg2-1.pdf example2.pdf 1 % pdfjam --outfile eg2-2.pdf example2.pdf 2 % pdfjam --outfile eg2-3.pdf example2.pdf 3 % pdfjam --outfile eg2-4.pdf example2.pdf 4 % pdflatex ouexam.dtx % pdflatex ouexam.dtx % pdflatex ouexam.dtx Put `ouexam.cls` in `/path/to/texmf/tex/latex/ouexam`. Put `eg.pdf`, `ouexam.pdf`, `ouexam.dtx`, `ouexam.ins`, `HISTORY`, `MANIFEST`, `README` and `TODO` in `/path/to/texmf/doc/latex/ouexam`. Put `Create.pdf`, `example.tex`, `example.pdf` and `lstlang0.sty` in `/path/to/texmf/doc/latex/ouexam/example`. FAQ --- The number of marks for my questions are not appearing at the end of the line even though there is plenty of room for them to fit. You probably have a blank line between the end of your question text and the `\end{question}` macro that actually generates the number of marks. The line break gets processed by TeX before the number of marks is generated and effectively generates a new paragraph. Until I figure out how to stop this from happening, the workaround is to not leave blank lines between the end of the question text and the `\end{question}`.

Ignore Space Show notes View

README.md

ouexam v2.3.2
===========

ouexam document class v2.3.2, 9 October 2014

This LaTeX2ε document class enables the production of University of Otago formatted examination papers. It handles all the fiddly layout requirements (such as printing “**TURN OVER**” at the bottom of every page except the last), and also ensures that the actual number of marks for questions in the examination add up to the expected number of marks.

Requirements
------------

You will need the `verbatim`, `fontenc`, `textcomp` and `lmodern` packages in order to use `ouexam`. These should all come standard with most TeX installations. To build the documentation and example files, you will need `graphicx`, `hyperref`, at least version 1.1 of the `listings` package, and the `pdfjam` tool.

Installing
----------

To install the easy way:

1 Unpack the distribution archive and `cd` to the distribution directory.

2 `make`

3 `make install TEXMF_INSTALL=/path/to/texmf`.
  `/path/to/texmf` should be the root of your preferred texmf tree
  (e.g., `/usr/share/texmf`). You may need to do this as root
  depending on which texmf tree you are installing into. You can
  also define `TEXMF_INSTALL` as an environment variable then simply
  type make install.

To install manually, unpack the distribution archive, cd to the distribution directory, and do the following (now you know why there’s a Makefile ☺).

% latex ouexam.ins
	% pdflatex example1.tex
	% pdflatex example1.tex
	% pdfjam --outfile eg1-1.pdf example1.pdf 1
	% pdfjam --outfile eg1-2.pdf example1.pdf 2
	% pdfjam --outfile eg1-3.pdf example1.pdf 3
	% pdflatex example2.tex
	% pdflatex example2.tex
	% pdfjam --outfile eg2-1.pdf example2.pdf 1
	% pdfjam --outfile eg2-2.pdf example2.pdf 2
	% pdfjam --outfile eg2-3.pdf example2.pdf 3
	% pdfjam --outfile eg2-4.pdf example2.pdf 4
	% pdflatex ouexam.dtx
	% pdflatex ouexam.dtx
	% pdflatex ouexam.dtx

Put `ouexam.cls` in `/path/to/texmf/tex/latex/ouexam`.

Put `eg*.pdf`, `ouexam.pdf`, `ouexam.dtx`, `ouexam.ins`, `HISTORY`, `MANIFEST`, `README` and `TODO` in `/path/to/texmf/doc/latex/ouexam`.

Put `Create*.pdf`, `example*.tex`, `example*.pdf` and `lstlang0.sty` in `/path/to/texmf/doc/latex/ouexam/example`.

You can also build the examples using GUI tools such as TeXMaker, TeXShop, etc. (but see the FAQ below).

FAQ
---

**The number of marks for my questions are not appearing at the end of the line even though there is plenty of room for them to fit.**

You probably have a blank line between the end of your question text and the `\end{question}` macro that actually generates the number of marks. The line break gets processed by TeX before the number of marks is generated and effectively generates a new paragraph. Until I figure out how to stop this from happening, the workaround is to not leave blank lines between the end of the question text and the `\end{question}`.

**Why can’t I build content.tex?**

`content.tex` is an include file for both `example1.tex` and `example2.tex`. It doesn’t have a `\documentclass` and you therefore *can’t* `pdflatex` it! The only input files you should be building are `example1.tex` and `example2.tex`.

ouexam v2.3.1
===========

ouexam document class v2.3.1, 15 September 2014

This LaTeX2e document class enables the production of University of Otago formatted examination papers. It handles all the fiddly layout requirements (such as printing “TURN OVER” at the bottom of every page except the last), and also ensures that the actual number of marks for questions in the examination add up to the expected number of marks.

Requirements
------------

Installing
----------

To install the easy way:

1 Unpack the distribution archive and `cd` to the distribution directory.

2 `make`

To install manually, unpack the distribution archive, cd to the distribution directory, and do the following (now you know why there’s a Makefile ☺).

Put `ouexam.cls` in `/path/to/texmf/tex/latex/ouexam`.

Put `eg*.pdf`, `ouexam.pdf`, `ouexam.dtx`, `ouexam.ins`, `HISTORY`, `MANIFEST`, `README` and `TODO` in `/path/to/texmf/doc/latex/ouexam`.

Put `Create*.pdf`, `example*.tex`, `example*.pdf` and `lstlang0.sty` in `/path/to/texmf/doc/latex/ouexam/example`.

FAQ
---

**The number of marks for my questions are not appearing at the end of the line even though there is plenty of room for them to fit.**

Ignore Space Show notes View content.tex 0 → 100644
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Note: No document class --- this is handled by the appropriate % "driver" file, which makes it easier to switch class options % such as markingschedule. Doing it this way means that you don't % wipe out the original exam paper when you want to generate a % marking schedule. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \usepackage{graphicx} \usepackage{listings} \examyear{2001} \department{Information Science} \papernumber{INFO 321} \papertitle{Database Systems} \instructions{Answer \underline{ALL} questions.} \allowcalculators[approved] \begin{document} \lstloadlanguages{ODL,OQL,[Oracle8]SQL,[Oracle8]PLSQL} \lstset% { basicstyle=\sffamily, showstringspaces=false, tabsize=2, xleftmargin=1cm, belowskip=0pt, commentstyle=\itshape, numbers=left, numberstyle=\scriptsize, numbersep=5pt, columns=fullflexible, } \maketitlepage %====================================================================== % PHYSICAL TUNING SECTION %====================================================================== \begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to physical database design and tuning.} %====================================================================== \begin{question}{10} A major U.S. mail order firm is having performance problems with queries on the \textsf{Customer} table in its Oracle8 order-processing database. Consider the following information about the \textsf{Customer} table, then answer the questions below: \\ \textsf{\textbf{Customer}(\underline{customer\_no}, name, address, city, state, country, phone, email, status)} \begin{itemize} \item \textsf{customer\_no} is the primary key and because this is an Oracle8 database, a unique B-tree index has been automatically created on this column. \item \textsf{status} is a single character column that holds one of the values `I' (inactive), `B' (bronze), `S' (silver) or `G' (gold). The value of this column is determined by a customer's order history. \item About 10\% of customers are inactive, about 50\% have bronze status, about 35\% have silver status and about 5\% have gold status. \item The average row size is about 200 bytes and there are approximately 5,000,000 rows. \item The \textsf{Customer} table is heavily queried (averaging thirty new queries per minute) by four different groups of employees. Group~1 queries only inactive customers, group~2 queries only bronze status customers and groups~3 and~4 both query only silver and gold status customers. Members of any group may query the table at any time and queries often overlap. \item UPDATE, INSERT and DELETE operations on the table are uncommon. \item Very few queries are for individual customers; rather, most queries are based on various combinations of values from \textsf{status}, \textsf{city}, \textsf{state} and \textsf{country}. \end{itemize} \medskip %---------------------------------------------------------------------- \begin{subquestion}{3} Describe and justify an \emph{index-based} physical tuning solution that will improve the performance of queries on the \textsf{Customer} table. (No code is required.) \begin{marking} Since queries tend to be on combinations of several low-cardinality columns [1] and the table is very large [1], the most effective solution would be to place bitmap indexes on status, city, state and country [1]. B-tree indexes on any of the individual columns would not be ideal because of their low cardinality. A possible alternative is a composite B-tree index on all four columns [1 for suggesting this if appropriate], but this is not ideal and would be much larger than the four bitmap indexes [1 for this explanation if appropriate]. Hashing is definitely \emph{not} an option, because very few queries are exact match. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{7} \begin{subsubquestion}{5} \label{nonindex} Assuming that there are two identical fast disks available, describe and justify two alternative \emph{non index-based} physical tuning solutions that will improve the performance of queries on the \textsf{Customer} table. For each solution include details of how the two disks will be used. (No code is required.) \begin{marking} Alternative 1: partition the table across the two disks based on status (four partitions, one for each status value) [1]. This will allow simultaneous parallel access to different parts of the table by the different groups [1]. Since two groups both access silver and gold customers, it makes sense to put the silver customers on disk one and the gold customers on disk two. Bronze customers account for 50\% of the table, so it makes sense to also put them on disk two (as the gold customers are the smallest group), and put the inactive customers on disk one with the silver customers. [1 for any sensible partitioning scheme] Alternative 2: replicate the table across both disks (one replica on each disk) [1]. This will also provide simultaneous parallel access to the data (hard to tell whether this would be better or worse than the partitioning above) [1]. Note that any solution must be capable of being implemented in Oracle8. \end{marking} \end{subsubquestion} \begin{subsubquestion}{2} Briefly discuss the relative advantages and/or disadvantages of the two alternative solutions you described in part~\ref{nonindex} above. \begin{marking} The partitioning solution given above doesn't really help with the bronze customer rows---we would need random horizontal partitioning to really help with this, which Oracle8 doesn't support. More generally, Oracle8 does not migrate rows into the correct partition if a customer's status changes, so the data will need to be reloaded occasionally [1]. For replication, we have the usual problem of keeping the replicas synchronised if they are both read/write. This is somewhat mitigated by the fact that there are only two copies [1]. \end{marking} \end{subsubquestion} \end{subquestion} %---------------------------------------------------------------------- \end{question} \newpage %====================================================================== \begin{question}{10} There are two generic parameters that affect the structure of a B-tree index. %---------------------------------------------------------------------- \begin{subquestion}{6} Identify these two parameters and describe what each parameter specifies. Use diagrams to illustrate your answer. \begin{marking} \emph{Node size} [1] specifies the maximum number of pointers in each index node [1]. Suitable diagram [1]. \emph{Percentage fill} [1] specifies the number of pointers in each index node that are allocated when a node is created [1]. In effect, it determines the amount of free space left in each node. Suitable diagram [1]. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{4} For each of the two parameters, discuss the practical effect(s) of: \begin{subsubquestion}{2} high values of the parameter; and \begin{marking} Large node size reduces the height of the B-tree, resulting in potentially fewer I/Os to access a leaf node [$\frac{1}{2}$]. It can however increase the number of rows locked in a single leaf node [$\frac{1}{2}$]. High percent fill means that most pointers in a node will be allocated when the node is created [$\frac{1}{2}$]. This leaves little free space in the B-tree for new key values or key values that change [$\frac{1}{2}$]. \end{marking} \end{subsubquestion} \begin{subsubquestion}{2} low values of the parameter. \begin{marking} Small node size increases the height of the B-tree, resulting in potentially more I/Os to access a leaf node [$\frac{1}{2}$]. The number of rows locked in a single leaf node will be reduced [$\frac{1}{2}$]. Low percent fill means that most pointers in a node will not be allocated when the node is created [$\frac{1}{2}$]. This leaves little free space in the B-tree for new key values or key values that change [$\frac{1}{2}$]. \end{marking} \end{subsubquestion} \end{subquestion} %---------------------------------------------------------------------- \end{question} %====================================================================== \begin{question}{5}\label{clusterq} %---------------------------------------------------------------------- \begin{subquestion}{2} Define what is meant by the term ``clustering'' in the context of physical database design and tuning, and explain why it is beneficial. \begin{marking} Clustering is the grouping of related rows physically near to each other on disk [1] so that they may be retrieved in as few I/Os as possible (ideally one)---beneficial because of the slowness of disk I/O compared to memory I/O [1]. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{3}\label{clusterqcode} Consider the following Oracle8 SQL table definition: \begin{lstlisting}[language={[Oracle8]SQL}]{} create table enrolment ( student_id char(7), paper_code char(7), enrol_year number(4), primary key (student_id, paper_code, enrol_year) ); \end{lstlisting} The \textsf{Enrolment} table is often queried for all rows relating to a particular student in a particular year (for example, student 1234567 in 1998, student 9876543 in 2000, etc.). Write appropriate Oracle8 SQL code that will cluster the rows of the table in such a way as to support these types of query (you do not need to calculate cluster sizes). Relevant Oracle8 syntax diagrams are given on the next page. \begin{marking} \begin{lstlisting}[language={[Oracle8]SQL}]{} create cluster enrolment_cluster (student_id char(7), year number(4)); create index enrol_cluster_index on cluster enrolment_cluster; create table enrolment ( student_id char(7), paper_code char(7), enrol_year number(4), primary key (student_id, paper_code, enrol_year) ) cluster enrolment_cluster(student_id, enrol_year); \end{lstlisting} \end{marking} [1] each for correct cluster, cluster index and table clustering. \end{subquestion} \end{question} \end{examsection} \newpage % CREATE CLUSTER (first part), p 4-207 \begin{center} \textbf{Oracle8 syntax diagrams for use in question~\ref{clusterq}(\ref{clusterqcode})} \bigskip \includegraphics{CreateCluster} \vfill % CREATE INDEX (parts 1 and 2), p. 4-238 \includegraphics{CreateIndex} % CREATE TABLE (parts 1 and 2), p. 4-308 \vfill \includegraphics{CreateTable1} \bigskip \includegraphics{CreateTable2} \end{center} %====================================================================== % OBJECT DBMS SECTION %====================================================================== \begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to object database management systems and the Object Data Management Group's object model (ODMG 3.0).} %====================================================================== \begin{question}{10}\label{odlq} Examine the following Object Definition Language (ODL) schema, then answer the questions below. \begin{lstlisting}[language=ODL]{} module ExamODL { interface Person { attribute string name; attribute date birth_date; attribute struct Address { string street; string city; string country; } home_address; }; class Employee : Person (extent Staff key emp_ID) { attribute string emp_ID; attribute string position; attribute string department; attribute unsigned long salary; }; }; \end{lstlisting} %---------------------------------------------------------------------- \begin{subquestion}{2} How many attributes will instances of the \textsf{Employee} class have? Explain your answer. \begin{marking} This depends on your interpretation of what constitutes an attribute. I prefer seven [1]. It has four of its own and three inherited from the \textsf{Person} class (\textsf{home\_address} being counted as a single embedded literal attribute) [1]. Nine is a possible alternative answer [1], because you could count \textsf{home\_address} as three attributes, giving four local and five inherited attributes [1]. No marks for any other number \emph{unless} it has a sensible explanation, in which case award marks as appropriate. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{3} Examine the clause ``\lstinline[language={[Oracle8]SQL}]+(extent Staff key emp_ID)+'' on line 14 of the ODL listing above. \begin{subsubquestion}{2} Explain the effect(s) of this clause. \begin{marking} \lstinline[language={[Oracle8]SQL}]+extent Staff+ defines the name of the \emph{extent} (collection of all instances) of the \textsf{Employee} class [1]. \lstinline[language={[Oracle8]SQL}]+key emp_ID+ defines \textsf{emp\_ID} to be an \emph{object key} for the \textsf{Employee} class [1]. \end{marking} \end{subsubquestion} \begin{subsubquestion}{1} Identify the relational equivalents of the items specified in this clause. \begin{marking} Extent = relation [$\frac{1}{2}$] and object key = primary key [$\frac{1}{2}$]. \end{marking} \end{subsubquestion} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{5}\label{oracleodl} Write \emph{all} the code required to implement an Oracle8 object type that is as similar as possible to the \textsf{Employee} class. The syntax diagram for the \textsf{\textbf{create type}} statement is given on the next page. \begin{marking} \begin{lstlisting}[language={[Oracle8]SQL}]{} create type address_type as object ( street varchar2(50), -- the data types aren't that important city varchar2(30), -- as long as they roughly match country varchar2(30) ); create type employee_type as object ( emp_id char(7), name varchar2(50), birth_date date, home_address address_type, position char(10), department char(20), salary number ); \end{lstlisting} [--1] for major errors, or for not including \textsf{address\_type} as a separate object type. [--$\frac{1}{2}$] for each attribute missing or in the wrong place. \end{marking} \end{subquestion} \end{question} \newpage \begin{center} \textbf{Oracle8 syntax diagram for use in question~\ref{odlq}(\ref{oracleodl})} \bigskip \includegraphics{CreateObjectType} \bigskip\bigskip \end{center} %====================================================================== \begin{question}{15} %---------------------------------------------------------------------- \begin{subquestion}{4} Compare in detail the ODMG concept of an \emph{object name} with the relational concept of a \emph{primary key}. \begin{marking} An object name is a meaningful user-generated identifier [$\frac{1}{2}$] for a single, specific object instance [$\frac{1}{2}$]. It is not part of the object's state [$\frac{1}{2}$] and a particular instance can have multiple object names associated with it [$\frac{1}{2}$]. In comparison, a primary key is also user-defined (but may not always be meaningful) [$\frac{1}{2}$]. While a primary key uniquely identifies individual rows (which are the equivalents of object instances), it is associated with \emph{all} rows in a table, not just a single row [$\frac{1}{2}$]. A primary key comprises attributes of the relation, so it is part of the relation's ``state'' [$\frac{1}{2}$]. An individual row has only one primary key associated with it [$\frac{1}{2}$]. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{4} Compare in detail the ODMG concept of a \emph{relationship} with the relational concept of a \emph{foreign key}, with particular emphasis on any differences between the two concepts. Use an appropriate diagram or diagrams to illustrate your answer. \begin{marking} An ODMG relationship is a binary association between two object type, comprising a pair of \emph{traversal paths}, which are effectively pointers linking the two related instances [1]. ODMG relationships may be one-to-one, one-to-many or many-to-many, but we cannot specify participation [1]. A relational foreign key is also a binary association between two relations, comprising an attribute stored in one of the relations that refers to the primary key of the other relation; in this respect, it is really equivalent to a traversal path rather than a full ODMG relationship [1]. Relationships defined by foreign keys cannot be many-to-many, but we can partially specify participation [1]. Useful diagram(s). \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{6} Define in detail the ODMG concepts of \emph{interface}, \emph{class} and \emph{literal}. \begin{marking} An interface is an ODMG object type that defines behaviour only [1]. You cannot create instances of interfaces (cf. abstract classes) [1]. A class is an ODMG object type that can define both state and behaviour [1]. You can create instances of classes [1]. A literal is an ODMG type (non-object) that defines state only [1]. They have no identity and are effectively just structured data types [1]. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{1} Date and Darwen claim in their \emph{Third Manifesto} that the relational model can support objects if it is implemented correctly. Briefly explain how they claim this can be achieved. \begin{marking} If we implement domains correctly/fully then they are effectively the equivalent of an object class [1]. \end{marking} \end{subquestion} \end{question} \end{examsection} %====================================================================== % DISTRIBUTED DBMS SECTION %====================================================================== \begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to distributed database systems.} %====================================================================== \begin{question}{4} Describe what is meant by the terms ``distributed data'' and ``distributed database'' and explain the key difference(s) between the two. Use an example to illustrate the difference(s). \begin{marking} Distributed data are spread across multiple nodes on a network but are neither centrally managed nor highly integrated [1]. A distributed database combines distributed data with distributed database processing [1] (including management, access and transaction processing)---this is the key difference [1]. Reasonable example [1]. \end{marking} \end{question} %====================================================================== \begin{question}{10} Date's ``objective zero'' for distributed database management systems states ``a distributed database looks and behaves like a non-distributed database to any user''. Describe two \emph{different} ways in which this objective may be achieved, and for each discuss in detail how well it is implemented in Oracle8. \begin{marking} Basically a detailed description of any two of Date's objectives. Example: Location transparency is the ability to query data without needing to know on which network node the data are physically located. It is implemented reasonably well in Oracle8 by way of synonyms and views, after some initial setup work. [2 marks for description, 3 marks for implementation] \end{marking} \end{question} %====================================================================== \begin{question}{5} A company currently has a distributed database spread across two sites. It is necessary to move some data from one site to the other. %---------------------------------------------------------------------- \begin{subquestion}{3} Describe the general process that allows us to complete distributed transactions such as that described above with reasonable safety. \begin{marking} This is a distributed transaction, so the process involved is two-phase commit. In the first (\emph{prepare}) phase, the transaction coordinator tells all sites involved in the transaction to prepare to commit (by flushing buffers, locking data, etc.). Each site then responds ``OK'' or ``abort''. In the second (\emph{commit}) phase, if all sites respond ``OK'' the coordinator tells all sites to commit, otherwise it tells all sites to roll back. [1] for identifying 2PC, [1] each for both phases. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{2} Assuming an Oracle8 distributed database, give an example of a situation in which it may not be possible to automatically recover from a failed distributed transaction, and explain how such a situation might be dealt with. \begin{marking} If a site fails for a short period, in-doubt transactions will automatically complete when the site comes back up. If however the outage is for a significant period of time, then the DBMS cannot automatically commit in-doubt transactions as the data context may have changed considerably in the interim [1]. The DBA can commit or roll back in-doubt transactions manually, but this is risky [1]. \end{marking} \end{subquestion} \end{question} %====================================================================== \begin{question}{6} %---------------------------------------------------------------------- \begin{subquestion}{2} Consider the following Oracle8 SQL statement: \begin{lstlisting}[language={[Oracle8]SQL},numbers=none]{} create synonym exam_synonym for sales.customer@db_chch.world; \end{lstlisting} Explain the purpose of this statement. \begin{marking} It creates (in the local schema) an ``alias'' to the \textsf{customer} table in the \textsf{sales} schema of the \textsf{db\_chch.world} database [1]. This synonym can be queried as if it were a local copy of the table (i.e., it provides location transparency) [1]. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{4} Consider the following Oracle8 SQL statement: \begin{lstlisting}[language={[Oracle8]SQL}]{} create snapshot exam_snapshot refresh fast start with sysdate next sysdate + 1/12 as select * from sales.customer@db_chch.world; \end{lstlisting} Explain in detail what this statement does. \begin{marking} It creates a read only replica [1] of the \textsf{customer} table from the \textsf{sales} schema in the \textsf{db\_chch.world} database [1]. Changes to the original table only [1] are copied across every two hours starting now [1]. \end{marking} \end{subquestion} \end{question} %====================================================================== \end{examsection} %====================================================================== % DECISION SUPPORT SYSTEMS SECTION %====================================================================== \begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to data warehouses and online analytical processing (OLAP) tools.} %====================================================================== \begin{question}{8} Compare in detail the characteristics of operational data and decision support data with respect to the following four areas: \begin{itemize} \item the types of decisions they are used to support; \item their timespan; \item their granularity; and \item their dimensionality. \end{itemize} \begin{marking} (Note: ``in detail'' implies more than just a table of brief points---only award half marks for such answers.) \begin{itemize} \item \textbf{Decision type:} operational data support short term day-to-day decision making [1] whereas decision support data support tactical and strategic decision making (longer term) [1]. \item \textbf{Timespan:} operational data tend to focus on \emph{now} (current transactions over maybe a few days) [1] whereas DS data range over much larger time scales (months, years, decades), both backwards and forwards [1]. \item \textbf{Granularity:} operational data are generally very fine-grained (individual transactions) [1], whereas DS data are aggregated at varying levels of summarisation [1]. \item \textbf{Dimensionality:} operational data tend to be ``flat'' (only one variable of interest associated with a given data value) [1], whereas DS data can have many dimensions (variables of interest) associated with a data value [1]. \end{itemize} \end{marking} \end{question} %====================================================================== \begin{question}{6} Discuss in detail the key differences between databases designed for online transaction processing and databases designed for data warehousing, in terms of both: \begin{subquestion}{2} the pattern of data use; and \begin{marking} OLTP databases are typically not queried heavily and the queries tend to be relatively simple [$\frac{1}{2}$]; they also have a high rate of online concurrent updates [$\frac{1}{2}$]. Conversely, data warehouses typically have very high query rates and the queries are very complex [$\frac{1}{2}$]; there are usually few updates (often read-only) and those updates that do occur are often carried out as batch operations [$\frac{1}{2}$]. ([$\frac{1}{2}$] for saying that OLTP typically have many concurrent users, or that data warehouses typically have few users.) \end{marking} \end{subquestion} \begin{subquestion}{4} physical design and tuning strategy. \begin{marking} OLTP databases are queried and updated by many users simultaneously, so we need to trade off between optimising for concurrency, updates or query performance [1]. The usual tuning techniques may be applied, but we should be particularly careful with the use of indexes (typically B-tree) because of the relatively high update rates [1]. ([$\frac{1}{2}$] for saying that the database is often relatively small.) In a data warehouse, updates are few but queries are many and complex, so we need to optimise aggessively for query performance [1]. The usual tuning techniques may be applied, but in particular we can heavily index the data (bitmap indexes and index-organised tables really come into their own here) [1].([$\frac{1}{2}$] for saying that the database is typically very large.) \end{marking} \end{subquestion} \end{question} %====================================================================== \begin{question}{4} Describe in detail the steps involved in the process of importing data into a data warehouse from external sources. \begin{marking} \begin{itemize} \item Extraction: obtaining data from various sources (electronically or otherwise) [1]. The data are often loaded into a special \emph{staging database} for further processing [1]. \item Transformation: data are cleaned up and manipulated into standard formats, so that data may be queried consistently regardless of source [1]. \item Loading: the transformed data are loaded into the data warehouse by some form of bulk loading tool [1]. \end{itemize} \end{marking} \end{question} %====================================================================== \begin{question}{1} Explain what is meant by the term ``data mart''. \begin{marking} A data mart is a small \emph{specialised} data warehouse, often a subset of a larger enterprise-wide data warehouse [1]. \end{marking} \end{question} %====================================================================== \begin{question}{6} Describe in detail two ways in which an OLAP tool can be used to change the way in which data are presented. Use diagrams and/or examples to illustrate your answer. \begin{marking} Any two of: \begin{itemize} \item Drill down/roll up allows us to move from more summarised data to more specific data and vice versa. \item ``Slice and dice'' allows us to reduce our view of a multidimensional data set to specific values of one or more dimensions. \item Pivoting allows us to change the layout of the data being output to give a different perspective on the same data set, e.g., viewing sales of products by region vs. sales in a region by product. \end{itemize} [1] each for technique, [1] each for description, [1] each for useful diagrams or examples. Any of the following are worth half marks: \begin{itemize} \item graphs/visualisation tools \item highlighting exceptions \end{itemize} \end{marking} \end{question} \end{examsection} \end{document}

Ignore Space Show notes View

content.tex 0 → 100644

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Note: No document class --- this is handled by the appropriate
% "driver" file, which makes it easier to switch class options
% such as markingschedule. Doing it this way means that you don't
% wipe out the original exam paper when you want to generate a
% marking schedule.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\usepackage{graphicx}
\usepackage{listings}

\examyear{2001}
\department{Information Science}
\papernumber{INFO 321}
\papertitle{Database Systems}
\instructions{Answer \underline{ALL} questions.}
\allowcalculators[approved]

\begin{document}

\lstloadlanguages{ODL,OQL,[Oracle8]SQL,[Oracle8]PLSQL}
\lstset%
{	basicstyle=\sffamily,
	showstringspaces=false,
	tabsize=2,
	xleftmargin=1cm,
	belowskip=0pt,
	commentstyle=\itshape,
	numbers=left,
	numberstyle=\scriptsize,
	numbersep=5pt,
	columns=fullflexible,
}

\maketitlepage

%======================================================================
% PHYSICAL TUNING SECTION
%======================================================================
\begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to
physical database design and tuning.}

%======================================================================

\begin{question}{10}

A major U.S. mail order firm is having performance problems with queries on
	the \textsf{Customer} table in its Oracle8 order-processing database.
	Consider the following information about the \textsf{Customer} table, then
	answer the questions below:	\\

\textsf{\textbf{Customer}(\underline{customer\_no}, name, address, city,
	state, country, phone, email, status)}

\begin{itemize}

\item \textsf{customer\_no} is the primary key and because this is an
		Oracle8 database, a unique B-tree index has been automatically created
		on this column.

\item \textsf{status} is a single character column that holds one of
		the values `I' (inactive), `B' (bronze), `S' (silver) or `G' (gold).
		The value of this column is determined by a customer's order history.

\item About 10\% of customers are inactive, about 50\% have bronze
		status, about 35\% have silver status and about 5\% have gold status.

\item The average row size is about 200 bytes and there are
		approximately 5,000,000 rows.

\item The \textsf{Customer} table is heavily queried (averaging thirty
		new queries per minute) by four different groups of employees. Group~1
		queries only inactive customers, group~2 queries only bronze status
		customers and groups~3 and~4 both query only silver and gold status
		customers. Members of any group may query the table at any time and
		queries often overlap.

\item UPDATE, INSERT and DELETE operations on the table are uncommon.

\item Very few queries are for individual customers; rather, most
		queries are based on various combinations of values from
		\textsf{status}, \textsf{city}, \textsf{state} and \textsf{country}.

\end{itemize}
	\medskip

%----------------------------------------------------------------------

\begin{subquestion}{3}
		Describe and justify an \emph{index-based} physical tuning solution
		that will improve the performance of queries on the \textsf{Customer}
		table. (No code is required.)
		\begin{marking}
			Since queries tend to be on combinations of several low-cardinality
			columns [1] and the table is very large [1], the most effective
			solution would be to place bitmap indexes on status, city, state
			and country [1].

B-tree indexes on any of the individual columns would not be ideal
			because of their low cardinality. A possible alternative is a
			composite B-tree index on all four columns [1 for suggesting this
			if appropriate], but this is not ideal and would be much larger
			than the four bitmap indexes [1 for this explanation if
			appropriate].

Hashing is definitely \emph{not} an option, because very few
			queries are exact match.
		\end{marking}
	\end{subquestion}
		
%----------------------------------------------------------------------

\begin{subquestion}{7}
		\begin{subsubquestion}{5}
			\label{nonindex}
			Assuming that there are two identical fast disks available,
			describe and justify two alternative \emph{non index-based}
			physical tuning solutions that will improve the performance of
			queries on the \textsf{Customer} table. For each solution include
			details of how the two disks will be used. (No code is required.)
			\begin{marking}
				Alternative 1: partition the table across the two disks based
				on status (four partitions, one for each status value) [1].
				This will allow simultaneous parallel access to different parts
				of the table by the different groups [1]. Since two groups both
				access silver and gold customers, it makes sense to put the
				silver customers on disk one and the gold customers on disk
				two. Bronze customers account for 50\% of the table, so it
				makes sense to also put them on disk two (as the gold customers
				are the smallest group), and put the inactive customers on disk
				one with the silver customers. [1 for any sensible partitioning
				scheme]
	
				Alternative 2: replicate the table across both disks (one
				replica on each disk) [1]. This will also provide simultaneous
				parallel access to the data (hard to tell whether this would be
				better or worse than the partitioning above) [1].
	
				Note that any solution must be capable of being implemented in
				Oracle8.
			\end{marking}
		\end{subsubquestion}

\begin{subsubquestion}{2}
			Briefly discuss the relative advantages and/or disadvantages of the
			two alternative solutions you described in part~\ref{nonindex}
			above.
			\begin{marking}
				The partitioning solution given above doesn't really help with
				the bronze customer rows---we would need random horizontal
				partitioning to really help with this, which Oracle8 doesn't
				support. More generally, Oracle8 does not migrate rows into the
				correct partition if a customer's status changes, so the data
				will need to be reloaded occasionally [1].

For replication, we have the usual problem of keeping the
				replicas synchronised if they are both read/write. This is
				somewhat mitigated by the fact that there are only two copies
				[1].
			\end{marking}
		\end{subsubquestion}
	\end{subquestion}
	
%----------------------------------------------------------------------

\end{question}

\newpage

%======================================================================

\begin{question}{10}
	There are two generic parameters that affect the structure of a B-tree
	index.

%----------------------------------------------------------------------

\begin{subquestion}{6}
		Identify these two parameters and describe what each parameter
		specifies. Use diagrams to illustrate your answer.
		\begin{marking}
			\emph{Node size} [1] specifies the maximum number of pointers in
			each index node [1]. Suitable diagram [1].

\emph{Percentage fill} [1] specifies the number of pointers in each
			index node that are allocated when a node is created [1]. In
			effect, it determines the amount of free space left in each node.
			Suitable diagram [1].
		\end{marking}
	\end{subquestion}
	
%----------------------------------------------------------------------

\begin{subquestion}{4}
		For each of the two parameters, discuss the practical effect(s) of:
		\begin{subsubquestion}{2}
			high values of the parameter; and
			\begin{marking}
				Large node size reduces the height of the B-tree, resulting in
				potentially fewer I/Os to access a leaf node [$\frac{1}{2}$].
				It can however increase the number of rows locked in a single
				leaf node [$\frac{1}{2}$].
	
				High percent fill means that most pointers in a node will be
				allocated when the node is created [$\frac{1}{2}$]. This leaves
				little free space in the B-tree for new key values or key
				values that change [$\frac{1}{2}$].
			\end{marking}
		\end{subsubquestion}
		\begin{subsubquestion}{2}
			low values of the parameter.
			\begin{marking}
				Small node size increases the height of the B-tree, resulting
				in potentially more I/Os to access a leaf node [$\frac{1}{2}$].
				The number of rows locked in a single leaf node will be reduced
				[$\frac{1}{2}$].
	
				Low percent fill means that most pointers in a node will not be
				allocated when the node is created [$\frac{1}{2}$]. This leaves
				little free space in the B-tree for new key values or key
				values that change [$\frac{1}{2}$].
			\end{marking}
		\end{subsubquestion}
	\end{subquestion}
	
%----------------------------------------------------------------------

\end{question}
	
%======================================================================

\begin{question}{5}\label{clusterq}

%----------------------------------------------------------------------

\begin{subquestion}{2}
		Define what is meant by the term ``clustering'' in the context of
		physical database design and tuning, and explain why it is beneficial.
		\begin{marking}
			Clustering is the grouping of related rows physically near to each
			other on disk [1] so that they may be retrieved in as few I/Os as
			possible (ideally one)---beneficial because of the slowness of disk
			I/O compared to memory I/O [1].
		\end{marking}
	\end{subquestion}
	
%----------------------------------------------------------------------

\begin{subquestion}{3}\label{clusterqcode}
		Consider the following Oracle8 SQL table definition:
		
		\begin{lstlisting}[language={[Oracle8]SQL}]{}
create table enrolment
(	student_id char(7),
	paper_code char(7),
	enrol_year number(4),
  
	primary key (student_id, paper_code, enrol_year)
);
		\end{lstlisting}

The \textsf{Enrolment} table is often queried for all rows relating to
		a particular student in a particular year (for example, student 1234567
		in 1998, student 9876543 in 2000, etc.). Write appropriate Oracle8 SQL
		code that will cluster the rows of the table in such a way as to
		support these types of query (you do not need to calculate cluster
		sizes). Relevant Oracle8 syntax diagrams are given on the next page.
		\begin{marking}
			\begin{lstlisting}[language={[Oracle8]SQL}]{}
create cluster enrolment_cluster (student_id char(7), year number(4));

create index enrol_cluster_index on cluster enrolment_cluster;

create table enrolment
(	student_id char(7),
	paper_code char(7),
	enrol_year number(4),
  
	primary key (student_id, paper_code, enrol_year)
) cluster enrolment_cluster(student_id, enrol_year);
			\end{lstlisting}
		\end{marking}
		[1] each for correct cluster, cluster index and table clustering.
	\end{subquestion}

\end{question}

\end{examsection}

\newpage

% CREATE CLUSTER (first part), p 4-207

\begin{center}
	\textbf{Oracle8 syntax diagrams for use in
	question~\ref{clusterq}(\ref{clusterqcode})}
	
	\bigskip

\includegraphics{CreateCluster}

\vfill

% CREATE INDEX (parts 1 and 2), p. 4-238

\includegraphics{CreateIndex}

% CREATE TABLE (parts 1 and 2), p. 4-308

\vfill

\includegraphics{CreateTable1}
	
	\bigskip	
	
	\includegraphics{CreateTable2}
\end{center}

%======================================================================
% OBJECT DBMS SECTION
%======================================================================
\begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to object
database management systems and the Object Data Management Group's object model
(ODMG 3.0).}

%======================================================================

\begin{question}{10}\label{odlq}
	Examine the following Object Definition Language (ODL) schema, then answer
	the questions below.

\begin{lstlisting}[language=ODL]{}
module ExamODL
{
	interface Person
	{
		attribute string name;
		attribute date birth_date;
		attribute struct Address
		{	string street;
			string city;
			string country;
		} home_address;
	};

class Employee : Person (extent Staff key emp_ID)
	{
		attribute string emp_ID;
		attribute string position;
		attribute string department;
		attribute unsigned long salary;
	};
};
	\end{lstlisting}
	
%----------------------------------------------------------------------

\begin{subquestion}{2}
		How many attributes will instances of the \textsf{Employee} class have?
		Explain your answer.
		\begin{marking}
			This depends on your interpretation of what constitutes an
			attribute. I prefer seven [1]. It has four of its own and three
			inherited from the \textsf{Person} class (\textsf{home\_address}
			being counted as a single embedded literal attribute) [1].

Nine is a possible alternative answer [1], because you could count
			\textsf{home\_address} as three attributes, giving four local and
			five inherited attributes [1].

No marks for any other number \emph{unless} it has a sensible
			explanation, in which case award marks as appropriate.
		\end{marking}
	\end{subquestion}
	
%----------------------------------------------------------------------

\begin{subquestion}{3}
		Examine the clause
		``\lstinline[language={[Oracle8]SQL}]+(extent Staff key emp_ID)+''
		on line 14 of the ODL listing above.

\begin{subsubquestion}{2}
			Explain the effect(s) of this clause.
			\begin{marking}
				\lstinline[language={[Oracle8]SQL}]+extent Staff+ defines the
				name of the \emph{extent} (collection of all instances) of the
				\textsf{Employee} class [1].

\lstinline[language={[Oracle8]SQL}]+key emp_ID+ defines
				\textsf{emp\_ID} to be an \emph{object key} for the
				\textsf{Employee} class [1].
			\end{marking}
		\end{subsubquestion}
		
		\begin{subsubquestion}{1}
			Identify the relational equivalents of the items specified in this
			clause.
			\begin{marking}
				Extent = relation [$\frac{1}{2}$] and object key = primary key
				[$\frac{1}{2}$].
			\end{marking}
		\end{subsubquestion}
	\end{subquestion}

%----------------------------------------------------------------------

\begin{subquestion}{5}\label{oracleodl}
		Write \emph{all} the code required to implement an Oracle8 object type
		that is as similar as possible to the \textsf{Employee} class. The
		syntax diagram for the \textsf{\textbf{create type}} statement is given
		on the next page.
		\begin{marking}
			\begin{lstlisting}[language={[Oracle8]SQL}]{}
create type address_type as object
(	street varchar2(50),	-- the data types aren't that important
	city varchar2(30),		-- as long as they roughly match
	country varchar2(30)
);

create type employee_type as object
(	emp_id char(7),
	name varchar2(50),
	birth_date date,
	home_address address_type,
	position char(10),
	department char(20),
	salary number
);
			\end{lstlisting}
			[--1] for major errors, or for not including \textsf{address\_type}
			as a separate object type. [--$\frac{1}{2}$] for each attribute
			missing or in the wrong place.
		\end{marking}
	\end{subquestion}
	
\end{question}

\newpage

\begin{center}
	\textbf{Oracle8 syntax diagram for use in
	question~\ref{odlq}(\ref{oracleodl})}
	
	\bigskip
	\includegraphics{CreateObjectType}
	\bigskip\bigskip
\end{center}

%======================================================================

\begin{question}{15}

%----------------------------------------------------------------------

\begin{subquestion}{4}
		Compare in detail the ODMG concept of an \emph{object name} with the
		relational concept of a \emph{primary key}.
		\begin{marking}
			An object name is a meaningful user-generated identifier
			[$\frac{1}{2}$] for a single, specific object instance
			[$\frac{1}{2}$]. It is not part of the object's state
			[$\frac{1}{2}$] and a particular instance can have multiple object
			names associated with it [$\frac{1}{2}$].

In comparison, a primary key is also user-defined (but may not
			always be meaningful) [$\frac{1}{2}$]. While a primary key uniquely
			identifies individual rows (which are the equivalents of object
			instances), it is associated with \emph{all} rows in a table, not
			just a single row [$\frac{1}{2}$]. A primary key comprises
			attributes of the relation, so it is part of the relation's
			``state'' [$\frac{1}{2}$]. An individual row has only one primary
			key associated with it [$\frac{1}{2}$].
		\end{marking}
	\end{subquestion}
	
%----------------------------------------------------------------------

\begin{subquestion}{4}
		Compare in detail the ODMG concept of a \emph{relationship} with the
		relational concept of a \emph{foreign key}, with particular emphasis on
		any differences between the two concepts. Use an appropriate diagram or
		diagrams to illustrate your answer.
		\begin{marking}
			An ODMG relationship is a binary association between two object
			type, comprising a pair of \emph{traversal paths}, which are
			effectively pointers linking the two related instances [1]. ODMG
			relationships may be one-to-one, one-to-many or many-to-many, but
			we cannot specify participation [1].

A relational foreign key is also a binary association between two
			relations, comprising an attribute stored in one of the relations
			that refers to the primary key of the other relation; in this
			respect, it is really equivalent to a traversal path rather than a
			full ODMG relationship [1]. Relationships defined by foreign keys
			cannot be many-to-many, but we can partially specify participation
			[1].

Useful diagram(s).
		\end{marking}
	\end{subquestion}
	
%----------------------------------------------------------------------

\begin{subquestion}{6}
		Define in detail the ODMG concepts of \emph{interface}, \emph{class}
		and \emph{literal}.
		\begin{marking}
			An interface is an ODMG object type that defines behaviour only
			[1]. You cannot create instances of interfaces (cf. abstract
			classes) [1].

A class is an ODMG object type that can define both state and
			behaviour [1]. You can create instances of classes [1].

A literal is an ODMG type (non-object) that defines state only [1].
			They have no identity and are effectively just structured data
			types [1].
		\end{marking}
	\end{subquestion}
	
%----------------------------------------------------------------------

\begin{subquestion}{1}
		Date and Darwen claim in their \emph{Third Manifesto} that the
		relational model can support objects if it is implemented correctly.
		Briefly explain how they claim this can be achieved.
		\begin{marking}
			If we implement domains correctly/fully then they are effectively
			the equivalent of an object class [1].
		\end{marking}
	\end{subquestion}

\end{question}

\end{examsection}

%======================================================================
% DISTRIBUTED DBMS SECTION
%======================================================================
\begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to
distributed database systems.}

%======================================================================

\begin{question}{4}
	Describe what is meant by the terms ``distributed data'' and ``distributed
	database'' and explain the key difference(s) between the two. Use an
	example to illustrate the difference(s).
	\begin{marking}
		Distributed data are spread across multiple nodes on a network but are
		neither centrally managed nor highly integrated [1]. A distributed
		database combines distributed data with distributed database processing
		[1] (including management, access and transaction processing)---this is
		the key difference [1]. Reasonable example [1].
	\end{marking}
\end{question}

%======================================================================

\begin{question}{10}
	Date's ``objective zero'' for distributed database management systems
	states ``a distributed database looks and behaves like a non-distributed
	database to any user''. Describe two \emph{different} ways in which this
	objective may be achieved, and for each discuss in detail how well it is
	implemented in Oracle8.
	\begin{marking}
		Basically a detailed description of any two of Date's objectives.
		Example: Location transparency is the ability to query data without
		needing to know on which network node the data are physically located.
		It is implemented reasonably well in Oracle8 by way of synonyms and
		views, after some initial setup work. [2 marks for description, 3 marks
		for implementation]
	\end{marking}
\end{question}

%======================================================================

\begin{question}{5}
	A company currently has a distributed database spread across two sites. It
	is necessary to move some data from one site to the other.

%----------------------------------------------------------------------

\begin{subquestion}{3}
		Describe the general process that allows us to complete distributed
		transactions such as that described above with reasonable safety.
		\begin{marking}
			This is a distributed transaction, so the process involved is
			two-phase commit. In the first (\emph{prepare}) phase, the
			transaction coordinator tells all sites involved in the transaction
			to prepare to commit (by flushing buffers, locking data, etc.).
			Each site then responds ``OK'' or ``abort''. In the second
			(\emph{commit}) phase, if all sites respond ``OK'' the coordinator
			tells all sites to commit, otherwise it tells all sites to roll
			back.

[1] for identifying 2PC, [1] each for both phases.
		\end{marking}
	\end{subquestion}
	
%----------------------------------------------------------------------

\begin{subquestion}{2}
		Assuming an Oracle8 distributed database, give an example of a
		situation in which it may not be possible to automatically recover from
		a failed distributed transaction, and explain how such a situation
		might be dealt with.
		\begin{marking}
			If a site fails for a short period, in-doubt transactions will
			automatically complete when the site comes back up. If however the
			outage is for a significant period of time, then the DBMS cannot
			automatically commit in-doubt transactions as the data context may
			have changed considerably in the interim [1]. The DBA can commit or
			roll back in-doubt transactions manually, but this is risky [1].
		\end{marking} 
	\end{subquestion}

\end{question}

%======================================================================

\begin{question}{6}

%----------------------------------------------------------------------

\begin{subquestion}{2}
		Consider the following Oracle8 SQL statement:
		\begin{lstlisting}[language={[Oracle8]SQL},numbers=none]{}
create synonym exam_synonym for sales.customer@db_chch.world;
		\end{lstlisting}
		Explain the purpose of this statement.
		\begin{marking}
			It creates (in the local schema) an ``alias'' to the
			\textsf{customer} table in the \textsf{sales} schema of the
			\textsf{db\_chch.world} database [1]. This synonym can be queried
			as if it were a local copy of the table (i.e., it provides location
			transparency) [1].
		\end{marking}
	\end{subquestion}

%----------------------------------------------------------------------

\begin{subquestion}{4}
		Consider the following Oracle8 SQL statement:
		\begin{lstlisting}[language={[Oracle8]SQL}]{}
create snapshot exam_snapshot
	refresh fast
		start with sysdate
		next sysdate + 1/12
	as select * from sales.customer@db_chch.world;
		\end{lstlisting}
		Explain in detail what this statement does.
		\begin{marking}
			It creates a read only replica [1] of the \textsf{customer} table
			from the \textsf{sales} schema in the \textsf{db\_chch.world}
			database [1]. Changes to the original table only [1] are copied
			across every two hours starting now [1].
		\end{marking}
	\end{subquestion}

\end{question}
	
%======================================================================

\end{examsection}

%======================================================================
% DECISION SUPPORT SYSTEMS SECTION
%======================================================================
\begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to data
warehouses and online analytical processing (OLAP) tools.}

%======================================================================

\begin{question}{8}
	Compare in detail the characteristics of operational data and decision
	support data with respect to the following four areas:
	\begin{itemize}
		\item the types of decisions they are used to support;
		\item their timespan;
		\item their granularity; and
		\item their dimensionality.
	\end{itemize}
	\begin{marking}
		(Note: ``in detail'' implies more than just a table of brief
		points---only award half marks for such answers.)
		\begin{itemize}

\item \textbf{Decision type:} operational data support short term
			day-to-day decision making [1] whereas decision support data
			support tactical and strategic decision making (longer term) [1].

\item \textbf{Timespan:} operational data tend to focus on
			\emph{now} (current transactions over maybe a few days) [1] whereas
			DS data range over much larger time scales (months, years,
			decades), both backwards and forwards [1].

\item \textbf{Granularity:} operational data are generally very
			fine-grained (individual transactions) [1], whereas DS data are
			aggregated at varying levels of summarisation [1].

\item \textbf{Dimensionality:} operational data tend to be ``flat''
			(only one variable of interest associated with a given data value)
			[1], whereas DS data can have many dimensions (variables of
			interest) associated with a data value [1].

\end{itemize}
	\end{marking}
\end{question}

%======================================================================

\begin{question}{6}
	Discuss in detail the key differences between databases designed for online
	transaction processing and databases designed for data warehousing, in
	terms of both:

\begin{subquestion}{2}
		the pattern of data use; and
		\begin{marking}
			OLTP databases are typically not queried heavily and the queries
			tend to be relatively simple [$\frac{1}{2}$]; they also have a high
			rate of online concurrent updates [$\frac{1}{2}$]. Conversely, data
			warehouses typically have very high query rates and the queries are
			very complex [$\frac{1}{2}$]; there are usually few updates (often
			read-only) and those updates that do occur are often carried out as
			batch operations [$\frac{1}{2}$]. ([$\frac{1}{2}$] for saying that
			OLTP typically have many concurrent users, or that data warehouses
			typically have few users.)
		\end{marking}
	\end{subquestion}

\begin{subquestion}{4}
		physical design and tuning strategy.
		\begin{marking}
			OLTP databases are queried and updated by many users
			simultaneously, so we need to trade off between optimising for
			concurrency, updates or query performance [1]. The usual tuning
			techniques may be applied, but we should be particularly careful
			with the use of indexes (typically B-tree) because of the
			relatively high update rates [1]. ([$\frac{1}{2}$] for saying that
			the database is often relatively small.)

In a data warehouse, updates are few but queries are many and
			complex, so we need to optimise aggessively for query performance
			[1]. The usual tuning techniques may be applied, but in particular
			we can heavily index the data (bitmap indexes and index-organised
			tables really come into their own here) [1].([$\frac{1}{2}$] for
			saying that the database is typically very large.)
		\end{marking}
	\end{subquestion}
\end{question}

%======================================================================

\begin{question}{4}
	Describe in detail the steps involved in the process of importing data into
	a data warehouse from external sources.
	\begin{marking}
		\begin{itemize}

\item Extraction: obtaining data from various sources
			(electronically or otherwise) [1]. The data are often loaded into a
			special \emph{staging database} for further processing [1].

\item Transformation: data are cleaned up and manipulated into
			standard formats, so that data may be queried consistently
			regardless of source [1].

\item Loading: the transformed data are loaded into the data
			warehouse by some form of bulk loading tool [1].

\end{itemize}
	\end{marking}
\end{question}

%======================================================================

\begin{question}{1}
	Explain what is meant by the term ``data mart''.
	\begin{marking}
		A data mart is a small \emph{specialised} data warehouse, often a
		subset of a larger enterprise-wide data warehouse [1].
	\end{marking}
\end{question}

%======================================================================

\begin{question}{6}
	Describe in detail two ways in which an OLAP tool can be used to change the
	way in which data are presented. Use diagrams and/or examples to illustrate
	your answer.
	\begin{marking}
		Any two of:
		\begin{itemize}

\item Drill down/roll up allows us to move from more summarised
			data to more specific data and vice versa.

\item ``Slice and dice'' allows us to reduce our view of a
			multidimensional data set to specific values of one or more
			dimensions.

\item Pivoting allows us to change the layout of the data being
			output to give a different perspective on the same data set, e.g.,
			viewing sales of products by region vs. sales in a region by
			product.

\end{itemize}
		[1] each for technique, [1] each for description, [1] each for useful
		diagrams or examples. Any of the following are worth half marks:
		\begin{itemize}
			\item graphs/visualisation tools
			\item highlighting exceptions
		\end{itemize}
	\end{marking}
\end{question}

\end{examsection}

\end{document}

Show notes View example.tex 100644 → 0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Note: No document class --- this is handled by the appropriate % "driver" file, which makes it easier to switch class options % such as markingschedule. Doing it this way means that you don't % wipe out the original exam paper when you want to generate a % marking schedule. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \usepackage{graphicx} \usepackage{listings} \examyear{2001} \department{Information Science} \papernumber{INFO 321} \papertitle{Database Systems} \instructions{Answer \underline{ALL} questions.} \allowcalculators[approved] \begin{document} \lstloadlanguages{ODL,OQL,[Oracle8]SQL,[Oracle8]PLSQL} \lstset% { basicstyle=\sffamily, showstringspaces=false, tabsize=2, xleftmargin=1cm, belowskip=0pt, commentstyle=\itshape, numbers=left, numberstyle=\scriptsize, numbersep=5pt, columns=fullflexible, } \maketitlepage %====================================================================== % PHYSICAL TUNING SECTION %====================================================================== \begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to physical database design and tuning.} %====================================================================== \begin{question}{10} A major U.S. mail order firm is having performance problems with queries on the \textsf{Customer} table in its Oracle8 order-processing database. Consider the following information about the \textsf{Customer} table, then answer the questions below: \\ \textsf{\textbf{Customer}(\underline{customer\_no}, name, address, city, state, country, phone, email, status)} \begin{itemize} \item \textsf{customer\_no} is the primary key and because this is an Oracle8 database, a unique B-tree index has been automatically created on this column. \item \textsf{status} is a single character column that holds one of the values `I' (inactive), `B' (bronze), `S' (silver) or `G' (gold). The value of this column is determined by a customer's order history. \item About 10\% of customers are inactive, about 50\% have bronze status, about 35\% have silver status and about 5\% have gold status. \item The average row size is about 200 bytes and there are approximately 5,000,000 rows. \item The \textsf{Customer} table is heavily queried (averaging thirty new queries per minute) by four different groups of employees. Group~1 queries only inactive customers, group~2 queries only bronze status customers and groups~3 and~4 both query only silver and gold status customers. Members of any group may query the table at any time and queries often overlap. \item UPDATE, INSERT and DELETE operations on the table are uncommon. \item Very few queries are for individual customers; rather, most queries are based on various combinations of values from \textsf{status}, \textsf{city}, \textsf{state} and \textsf{country}. \end{itemize} \medskip %---------------------------------------------------------------------- \begin{subquestion}{3} Describe and justify an \emph{index-based} physical tuning solution that will improve the performance of queries on the \textsf{Customer} table. (No code is required.) \begin{marking} Since queries tend to be on combinations of several low-cardinality columns [1] and the table is very large [1], the most effective solution would be to place bitmap indexes on status, city, state and country [1]. B-tree indexes on any of the individual columns would not be ideal because of their low cardinality. A possible alternative is a composite B-tree index on all four columns [1 for suggesting this if appropriate], but this is not ideal and would be much larger than the four bitmap indexes [1 for this explanation if appropriate]. Hashing is definitely \emph{not} an option, because very few queries are exact match. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{7} \begin{subsubquestion}{5} \label{nonindex} Assuming that there are two identical fast disks available, describe and justify two alternative \emph{non index-based} physical tuning solutions that will improve the performance of queries on the \textsf{Customer} table. For each solution include details of how the two disks will be used. (No code is required.) \begin{marking} Alternative 1: partition the table across the two disks based on status (four partitions, one for each status value) [1]. This will allow simultaneous parallel access to different parts of the table by the different groups [1]. Since two groups both access silver and gold customers, it makes sense to put the silver customers on disk one and the gold customers on disk two. Bronze customers account for 50\% of the table, so it makes sense to also put them on disk two (as the gold customers are the smallest group), and put the inactive customers on disk one with the silver customers. [1 for any sensible partitioning scheme] Alternative 2: replicate the table across both disks (one replica on each disk) [1]. This will also provide simultaneous parallel access to the data (hard to tell whether this would be better or worse than the partitioning above) [1]. Note that any solution must be capable of being implemented in Oracle8. \end{marking} \end{subsubquestion} \begin{subsubquestion}{2} Briefly discuss the relative advantages and/or disadvantages of the two alternative solutions you described in part~\ref{nonindex} above. \begin{marking} The partitioning solution given above doesn't really help with the bronze customer rows---we would need random horizontal partitioning to really help with this, which Oracle8 doesn't support. More generally, Oracle8 does not migrate rows into the correct partition if a customer's status changes, so the data will need to be reloaded occasionally [1]. For replication, we have the usual problem of keeping the replicas synchronised if they are both read/write. This is somewhat mitigated by the fact that there are only two copies [1]. \end{marking} \end{subsubquestion} \end{subquestion} %---------------------------------------------------------------------- \end{question} \newpage %====================================================================== \begin{question}{10} There are two generic parameters that affect the structure of a B-tree index. %---------------------------------------------------------------------- \begin{subquestion}{6} Identify these two parameters and describe what each parameter specifies. Use diagrams to illustrate your answer. \begin{marking} \emph{Node size} [1] specifies the maximum number of pointers in each index node [1]. Suitable diagram [1]. \emph{Percentage fill} [1] specifies the number of pointers in each index node that are allocated when a node is created [1]. In effect, it determines the amount of free space left in each node. Suitable diagram [1]. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{4} For each of the two parameters, discuss the practical effect(s) of: \begin{subsubquestion}{2} high values of the parameter; and \begin{marking} Large node size reduces the height of the B-tree, resulting in potentially fewer I/Os to access a leaf node [$\frac{1}{2}$]. It can however increase the number of rows locked in a single leaf node [$\frac{1}{2}$]. High percent fill means that most pointers in a node will be allocated when the node is created [$\frac{1}{2}$]. This leaves little free space in the B-tree for new key values or key values that change [$\frac{1}{2}$]. \end{marking} \end{subsubquestion} \begin{subsubquestion}{2} low values of the parameter. \begin{marking} Small node size increases the height of the B-tree, resulting in potentially more I/Os to access a leaf node [$\frac{1}{2}$]. The number of rows locked in a single leaf node will be reduced [$\frac{1}{2}$]. Low percent fill means that most pointers in a node will not be allocated when the node is created [$\frac{1}{2}$]. This leaves little free space in the B-tree for new key values or key values that change [$\frac{1}{2}$]. \end{marking} \end{subsubquestion} \end{subquestion} %---------------------------------------------------------------------- \end{question} %====================================================================== \begin{question}{5}\label{clusterq} %---------------------------------------------------------------------- \begin{subquestion}{2} Define what is meant by the term ``clustering'' in the context of physical database design and tuning, and explain why it is beneficial. \begin{marking} Clustering is the grouping of related rows physically near to each other on disk [1] so that they may be retrieved in as few I/Os as possible (ideally one)---beneficial because of the slowness of disk I/O compared to memory I/O [1]. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{3}\label{clusterqcode} Consider the following Oracle8 SQL table definition: \begin{lstlisting}[language={[Oracle8]SQL}]{} create table enrolment ( student_id char(7), paper_code char(7), enrol_year number(4), primary key (student_id, paper_code, enrol_year) ); \end{lstlisting} The \textsf{Enrolment} table is often queried for all rows relating to a particular student in a particular year (for example, student 1234567 in 1998, student 9876543 in 2000, etc.). Write appropriate Oracle8 SQL code that will cluster the rows of the table in such a way as to support these types of query (you do not need to calculate cluster sizes). Relevant Oracle8 syntax diagrams are given on the next page. \begin{marking} \begin{lstlisting}[language={[Oracle8]SQL}]{} create cluster enrolment_cluster (student_id char(7), year number(4)); create index enrol_cluster_index on cluster enrolment_cluster; create table enrolment ( student_id char(7), paper_code char(7), enrol_year number(4), primary key (student_id, paper_code, enrol_year) ) cluster enrolment_cluster(student_id, enrol_year); \end{lstlisting} \end{marking} [1] each for correct cluster, cluster index and table clustering. \end{subquestion} \end{question} \end{examsection} \newpage % CREATE CLUSTER (first part), p 4-207 \begin{center} \textbf{Oracle8 syntax diagrams for use in question~\ref{clusterq}(\ref{clusterqcode})} \bigskip \includegraphics{CreateCluster} \vfill % CREATE INDEX (parts 1 and 2), p. 4-238 \includegraphics{CreateIndex} % CREATE TABLE (parts 1 and 2), p. 4-308 \vfill \includegraphics{CreateTable1} \bigskip \includegraphics{CreateTable2} \end{center} %====================================================================== % OBJECT DBMS SECTION %====================================================================== \begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to object database management systems and the Object Data Management Group's object model (ODMG 3.0).} %====================================================================== \begin{question}{10}\label{odlq} Examine the following Object Definition Language (ODL) schema, then answer the questions below. \begin{lstlisting}[language=ODL]{} module ExamODL { interface Person { attribute string name; attribute date birth_date; attribute struct Address { string street; string city; string country; } home_address; }; class Employee : Person (extent Staff key emp_ID) { attribute string emp_ID; attribute string position; attribute string department; attribute unsigned long salary; }; }; \end{lstlisting} %---------------------------------------------------------------------- \begin{subquestion}{2} How many attributes will instances of the \textsf{Employee} class have? Explain your answer. \begin{marking} This depends on your interpretation of what constitutes an attribute. I prefer seven [1]. It has four of its own and three inherited from the \textsf{Person} class (\textsf{home\_address} being counted as a single embedded literal attribute) [1]. Nine is a possible alternative answer [1], because you could count \textsf{home\_address} as three attributes, giving four local and five inherited attributes [1]. No marks for any other number \emph{unless} it has a sensible explanation, in which case award marks as appropriate. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{3} Examine the clause ``\lstinline[language={[Oracle8]SQL}]+(extent Staff key emp_ID)+'' on line 14 of the ODL listing above. \begin{subsubquestion}{2} Explain the effect(s) of this clause. \begin{marking} \lstinline[language={[Oracle8]SQL}]+extent Staff+ defines the name of the \emph{extent} (collection of all instances) of the \textsf{Employee} class [1]. \lstinline[language={[Oracle8]SQL}]+key emp_ID+ defines \textsf{emp\_ID} to be an \emph{object key} for the \textsf{Employee} class [1]. \end{marking} \end{subsubquestion} \begin{subsubquestion}{1} Identify the relational equivalents of the items specified in this clause. \begin{marking} Extent = relation [$\frac{1}{2}$] and object key = primary key [$\frac{1}{2}$]. \end{marking} \end{subsubquestion} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{5}\label{oracleodl} Write \emph{all} the code required to implement an Oracle8 object type that is as similar as possible to the \textsf{Employee} class. The syntax diagram for the \textsf{\textbf{create type}} statement is given on the next page. \begin{marking} \begin{lstlisting}[language={[Oracle8]SQL}]{} create type address_type as object ( street varchar2(50), -- the data types aren't that important city varchar2(30), -- as long as they roughly match country varchar2(30) ); create type employee_type as object ( emp_id char(7), name varchar2(50), birth_date date, home_address address_type, position char(10), department char(20), salary number ); \end{lstlisting} [--1] for major errors, or for not including \textsf{address\_type} as a separate object type. [--$\frac{1}{2}$] for each attribute missing or in the wrong place. \end{marking} \end{subquestion} \end{question} \newpage \begin{center} \textbf{Oracle8 syntax diagram for use in question~\ref{odlq}(\ref{oracleodl})} \bigskip \includegraphics{CreateObjectType} \bigskip\bigskip \end{center} %====================================================================== \begin{question}{15} %---------------------------------------------------------------------- \begin{subquestion}{4} Compare in detail the ODMG concept of an \emph{object name} with the relational concept of a \emph{primary key}. \begin{marking} An object name is a meaningful user-generated identifier [$\frac{1}{2}$] for a single, specific object instance [$\frac{1}{2}$]. It is not part of the object's state [$\frac{1}{2}$] and a particular instance can have multiple object names associated with it [$\frac{1}{2}$]. In comparison, a primary key is also user-defined (but may not always be meaningful) [$\frac{1}{2}$]. While a primary key uniquely identifies individual rows (which are the equivalents of object instances), it is associated with \emph{all} rows in a table, not just a single row [$\frac{1}{2}$]. A primary key comprises attributes of the relation, so it is part of the relation's ``state'' [$\frac{1}{2}$]. An individual row has only one primary key associated with it [$\frac{1}{2}$]. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{4} Compare in detail the ODMG concept of a \emph{relationship} with the relational concept of a \emph{foreign key}, with particular emphasis on any differences between the two concepts. Use an appropriate diagram or diagrams to illustrate your answer. \begin{marking} An ODMG relationship is a binary association between two object type, comprising a pair of \emph{traversal paths}, which are effectively pointers linking the two related instances [1]. ODMG relationships may be one-to-one, one-to-many or many-to-many, but we cannot specify participation [1]. A relational foreign key is also a binary association between two relations, comprising an attribute stored in one of the relations that refers to the primary key of the other relation; in this respect, it is really equivalent to a traversal path rather than a full ODMG relationship [1]. Relationships defined by foreign keys cannot be many-to-many, but we can partially specify participation [1]. Useful diagram(s). \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{6} Define in detail the ODMG concepts of \emph{interface}, \emph{class} and \emph{literal}. \begin{marking} An interface is an ODMG object type that defines behaviour only [1]. You cannot create instances of interfaces (cf. abstract classes) [1]. A class is an ODMG object type that can define both state and behaviour [1]. You can create instances of classes [1]. A literal is an ODMG type (non-object) that defines state only [1]. They have no identity and are effectively just structured data types [1]. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{1} Date and Darwen claim in their \emph{Third Manifesto} that the relational model can support objects if it is implemented correctly. Briefly explain how they claim this can be achieved. \begin{marking} If we implement domains correctly/fully then they are effectively the equivalent of an object class [1]. \end{marking} \end{subquestion} \end{question} \end{examsection} %====================================================================== % DISTRIBUTED DBMS SECTION %====================================================================== \begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to distributed database systems.} %====================================================================== \begin{question}{4} Describe what is meant by the terms ``distributed data'' and ``distributed database'' and explain the key difference(s) between the two. Use an example to illustrate the difference(s). \begin{marking} Distributed data are spread across multiple nodes on a network but are neither centrally managed nor highly integrated [1]. A distributed database combines distributed data with distributed database processing [1] (including management, access and transaction processing)---this is the key difference [1]. Reasonable example [1]. \end{marking} \end{question} %====================================================================== \begin{question}{10} Date's ``objective zero'' for distributed database management systems states ``a distributed database looks and behaves like a non-distributed database to any user''. Describe two \emph{different} ways in which this objective may be achieved, and for each discuss in detail how well it is implemented in Oracle8. \begin{marking} Basically a detailed description of any two of Date's objectives. Example: Location transparency is the ability to query data without needing to know on which network node the data are physically located. It is implemented reasonably well in Oracle8 by way of synonyms and views, after some initial setup work. [2 marks for description, 3 marks for implementation] \end{marking} \end{question} %====================================================================== \begin{question}{5} A company currently has a distributed database spread across two sites. It is necessary to move some data from one site to the other. %---------------------------------------------------------------------- \begin{subquestion}{3} Describe the general process that allows us to complete distributed transactions such as that described above with reasonable safety. \begin{marking} This is a distributed transaction, so the process involved is two-phase commit. In the first (\emph{prepare}) phase, the transaction coordinator tells all sites involved in the transaction to prepare to commit (by flushing buffers, locking data, etc.). Each site then responds ``OK'' or ``abort''. In the second (\emph{commit}) phase, if all sites respond ``OK'' the coordinator tells all sites to commit, otherwise it tells all sites to roll back. [1] for identifying 2PC, [1] each for both phases. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{2} Assuming an Oracle8 distributed database, give an example of a situation in which it may not be possible to automatically recover from a failed distributed transaction, and explain how such a situation might be dealt with. \begin{marking} If a site fails for a short period, in-doubt transactions will automatically complete when the site comes back up. If however the outage is for a significant period of time, then the DBMS cannot automatically commit in-doubt transactions as the data context may have changed considerably in the interim [1]. The DBA can commit or roll back in-doubt transactions manually, but this is risky [1]. \end{marking} \end{subquestion} \end{question} %====================================================================== \begin{question}{6} %---------------------------------------------------------------------- \begin{subquestion}{2} Consider the following Oracle8 SQL statement: \begin{lstlisting}[language={[Oracle8]SQL},numbers=none]{} create synonym exam_synonym for sales.customer@db_chch.world; \end{lstlisting} Explain the purpose of this statement. \begin{marking} It creates (in the local schema) an ``alias'' to the \textsf{customer} table in the \textsf{sales} schema of the \textsf{db\_chch.world} database [1]. This synonym can be queried as if it were a local copy of the table (i.e., it provides location transparency) [1]. \end{marking} \end{subquestion} %---------------------------------------------------------------------- \begin{subquestion}{4} Consider the following Oracle8 SQL statement: \begin{lstlisting}[language={[Oracle8]SQL}]{} create snapshot exam_snapshot refresh fast start with sysdate next sysdate + 1/12 as select * from sales.customer@db_chch.world; \end{lstlisting} Explain in detail what this statement does. \begin{marking} It creates a read only replica [1] of the \textsf{customer} table from the \textsf{sales} schema in the \textsf{db\_chch.world} database [1]. Changes to the original table only [1] are copied across every two hours starting now [1]. \end{marking} \end{subquestion} \end{question} %====================================================================== \end{examsection} %====================================================================== % DECISION SUPPORT SYSTEMS SECTION %====================================================================== \begin{examsection}{25}{}{Questions in this section (total 25 marks) relate to data warehouses and online analytical processing (OLAP) tools.} %====================================================================== \begin{question}{8} Compare in detail the characteristics of operational data and decision support data with respect to the following four areas: \begin{itemize} \item the types of decisions they are used to support; \item their timespan; \item their granularity; and \item their dimensionality. \end{itemize} \begin{marking} (Note: ``in detail'' implies more than just a table of brief points---only award half marks for such answers.) \begin{itemize} \item \textbf{Decision type:} operational data support short term day-to-day decision making [1] whereas decision support data support tactical and strategic decision making (longer term) [1]. \item \textbf{Timespan:} operational data tend to focus on \emph{now} (current transactions over maybe a few days) [1] whereas DS data range over much larger time scales (months, years, decades), both backwards and forwards [1]. \item \textbf{Granularity:} operational data are generally very fine-grained (individual transactions) [1], whereas DS data are aggregated at varying levels of summarisation [1]. \item \textbf{Dimensionality:} operational data tend to be ``flat'' (only one variable of interest associated with a given data value) [1], whereas DS data can have many dimensions (variables of interest) associated with a data value [1]. \end{itemize} \end{marking} \end{question} %====================================================================== \begin{question}{6} Discuss in detail the key differences between databases designed for online transaction processing and databases designed for data warehousing, in terms of both: \begin{subquestion}{2} the pattern of data use; and \begin{marking} OLTP databases are typically not queried heavily and the queries tend to be relatively simple [$\frac{1}{2}$]; they also have a high rate of online concurrent updates [$\frac{1}{2}$]. Conversely, data warehouses typically have very high query rates and the queries are very complex [$\frac{1}{2}$]; there are usually few updates (often read-only) and those updates that do occur are often carried out as batch operations [$\frac{1}{2}$]. ([$\frac{1}{2}$] for saying that OLTP typically have many concurrent users, or that data warehouses typically have few users.) \end{marking} \end{subquestion} \begin{subquestion}{4} physical design and tuning strategy. \begin{marking} OLTP databases are queried and updated by many users simultaneously, so we need to trade off between optimising for concurrency, updates or query performance [1]. The usual tuning techniques may be applied, but we should be particularly careful with the use of indexes (typically B-tree) because of the relatively high update rates [1]. ([$\frac{1}{2}$] for saying that the database is often relatively small.) In a data warehouse, updates are few but queries are many and complex, so we need to optimise aggessively for query performance [1]. The usual tuning techniques may be applied, but in particular we can heavily index the data (bitmap indexes and index-organised tables really come into their own here) [1].([$\frac{1}{2}$] for saying that the database is typically very large.) \end{marking} \end{subquestion} \end{question} %====================================================================== \begin{question}{4} Describe in detail the steps involved in the process of importing data into a data warehouse from external sources. \begin{marking} \begin{itemize} \item Extraction: obtaining data from various sources (electronically or otherwise) [1]. The data are often loaded into a special \emph{staging database} for further processing [1]. \item Transformation: data are cleaned up and manipulated into standard formats, so that data may be queried consistently regardless of source [1]. \item Loading: the transformed data are loaded into the data warehouse by some form of bulk loading tool [1]. \end{itemize} \end{marking} \end{question} %====================================================================== \begin{question}{1} Explain what is meant by the term ``data mart''. \begin{marking} A data mart is a small \emph{specialised} data warehouse, often a subset of a larger enterprise-wide data warehouse [1]. \end{marking} \end{question} %====================================================================== \begin{question}{6} Describe in detail two ways in which an OLAP tool can be used to change the way in which data are presented. Use diagrams and/or examples to illustrate your answer. \begin{marking} Any two of: \begin{itemize} \item Drill down/roll up allows us to move from more summarised data to more specific data and vice versa. \item ``Slice and dice'' allows us to reduce our view of a multidimensional data set to specific values of one or more dimensions. \item Pivoting allows us to change the layout of the data being output to give a different perspective on the same data set, e.g., viewing sales of products by region vs. sales in a region by product. \end{itemize} [1] each for technique, [1] each for description, [1] each for useful diagrams or examples. Any of the following are worth half marks: \begin{itemize} \item graphs/visualisation tools \item highlighting exceptions \end{itemize} \end{marking} \end{question} \end{examsection} \end{document}

Show notes View

example.tex 100644 → 0

\usepackage{graphicx}
\usepackage{listings}

\examyear{2001}
\department{Information Science}
\papernumber{INFO 321}
\papertitle{Database Systems}
\instructions{Answer \underline{ALL} questions.}
\allowcalculators[approved]

\begin{document}

\maketitlepage

%======================================================================

\begin{question}{10}

\textsf{\textbf{Customer}(\underline{customer\_no}, name, address, city,
	state, country, phone, email, status)}

\begin{itemize}

\item \textsf{customer\_no} is the primary key and because this is an
		Oracle8 database, a unique B-tree index has been automatically created
		on this column.

\item About 10\% of customers are inactive, about 50\% have bronze
		status, about 35\% have silver status and about 5\% have gold status.

\item The average row size is about 200 bytes and there are
		approximately 5,000,000 rows.

\item UPDATE, INSERT and DELETE operations on the table are uncommon.

\item Very few queries are for individual customers; rather, most
		queries are based on various combinations of values from
		\textsf{status}, \textsf{city}, \textsf{state} and \textsf{country}.

\end{itemize}
	\medskip

%----------------------------------------------------------------------

Hashing is definitely \emph{not} an option, because very few
			queries are exact match.
		\end{marking}
	\end{subquestion}
		
%----------------------------------------------------------------------

\end{question}

\newpage

%======================================================================

\begin{question}{10}
	There are two generic parameters that affect the structure of a B-tree
	index.

%----------------------------------------------------------------------

\end{question}
	
%======================================================================

\begin{question}{5}\label{clusterq}

%----------------------------------------------------------------------

create index enrol_cluster_index on cluster enrolment_cluster;

\end{question}

\end{examsection}

\newpage

% CREATE CLUSTER (first part), p 4-207

\begin{center}
	\textbf{Oracle8 syntax diagrams for use in
	question~\ref{clusterq}(\ref{clusterqcode})}
	
	\bigskip

\includegraphics{CreateCluster}

\vfill

% CREATE INDEX (parts 1 and 2), p. 4-238

\includegraphics{CreateIndex}

% CREATE TABLE (parts 1 and 2), p. 4-308

\vfill

\includegraphics{CreateTable1}
	
	\bigskip	
	
	\includegraphics{CreateTable2}
\end{center}

%======================================================================

\begin{question}{10}\label{odlq}
	Examine the following Object Definition Language (ODL) schema, then answer
	the questions below.

Nine is a possible alternative answer [1], because you could count
			\textsf{home\_address} as three attributes, giving four local and
			five inherited attributes [1].

\begin{subquestion}{3}
		Examine the clause
		``\lstinline[language={[Oracle8]SQL}]+(extent Staff key emp_ID)+''
		on line 14 of the ODL listing above.

%----------------------------------------------------------------------

\newpage

\begin{center}
	\textbf{Oracle8 syntax diagram for use in
	question~\ref{odlq}(\ref{oracleodl})}
	
	\bigskip
	\includegraphics{CreateObjectType}
	\bigskip\bigskip
\end{center}

%======================================================================

\begin{question}{15}

%----------------------------------------------------------------------

Useful diagram(s).
		\end{marking}
	\end{subquestion}
	
%----------------------------------------------------------------------

A class is an ODMG object type that can define both state and
			behaviour [1]. You can create instances of classes [1].

\end{question}

\end{examsection}

%======================================================================

\begin{question}{5}
	A company currently has a distributed database spread across two sites. It
	is necessary to move some data from one site to the other.

%----------------------------------------------------------------------

[1] for identifying 2PC, [1] each for both phases.
		\end{marking}
	\end{subquestion}
	
%----------------------------------------------------------------------

\end{question}

%======================================================================

\begin{question}{6}

%----------------------------------------------------------------------

\end{question}
	
%======================================================================

\end{examsection}

%======================================================================

\item \textbf{Decision type:} operational data support short term
			day-to-day decision making [1] whereas decision support data
			support tactical and strategic decision making (longer term) [1].

\item \textbf{Granularity:} operational data are generally very
			fine-grained (individual transactions) [1], whereas DS data are
			aggregated at varying levels of summarisation [1].

\end{itemize}
	\end{marking}
\end{question}

%======================================================================

\begin{question}{6}
	Discuss in detail the key differences between databases designed for online
	transaction processing and databases designed for data warehousing, in
	terms of both:

%======================================================================

\begin{question}{4}
	Describe in detail the steps involved in the process of importing data into
	a data warehouse from external sources.
	\begin{marking}
		\begin{itemize}

\item Extraction: obtaining data from various sources
			(electronically or otherwise) [1]. The data are often loaded into a
			special \emph{staging database} for further processing [1].

\item Transformation: data are cleaned up and manipulated into
			standard formats, so that data may be queried consistently
			regardless of source [1].

\item Loading: the transformed data are loaded into the data
			warehouse by some form of bulk loading tool [1].

\end{itemize}
	\end{marking}
\end{question}

%======================================================================

\item Drill down/roll up allows us to move from more summarised
			data to more specific data and vice versa.

\item ``Slice and dice'' allows us to reduce our view of a
			multidimensional data set to specific values of one or more
			dimensions.

\end{examsection}

\end{document}

Ignore Space Show notes View example1.tex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Generates a plain examination paper. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \documentclass{ouexam} \input{content} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Generates a plain examination paper. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \documentclass{ouexam} \input{example}

Ignore Space Show notes View

example1.tex

Ignore Space Show notes View example2.tex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Generates an examination paper with marking schedule information. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \documentclass[markingschedule]{ouexam} \input{content} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Generates an examination paper with marking schedule information. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \documentclass[markingschedule]{ouexam} \input{example}

Ignore Space Show notes View

example2.tex

Show line notes below