Newer
Older
Publications / Koli_2017 / Koli_2017_Stanger.tex
\documentclass[sigconf, authordraft]{acmart}

\usepackage{tcolorbox}
\usepackage{listings}

\lstloadlanguages{PHP}
\lstset{language=PHP,basicstyle=\footnotesize\ttfamily}


% \title{(Mis)using unit testing to semi-automatically grade SQL schemas}
\title{Semi-automated grading of SQL schemas \\ by (mis)use of database unit testing}
\author{Nigel Stanger}
\orcid{orcid.org/0000-0003-3450-7443}
\affiliation{
    \institution{University of Otago}
    \department{Department of Information Science}
    \city{Dunedin}
    \country{New Zealand}
}
\email{nigel.stanger@otago.ac.nz}

\begin{document}

\begin{abstract}
    abstract
\end{abstract}

\maketitle

\cite{Bhangdiya.A-2015a-XDa-TA,Chandra.B-2015a-Data,Chandra.B-2016a-Partial,Dekeyser.S-2007a-Computer,Kearns.R-1997a-A-teaching,Prior.J-2004a-Backwash,Russell.G-2005a-Online,Gong.A-2015a-CS-121-Automation,Farre.C-2008a-SVTe,Dietrich.S-1997a-WinRDBI,Binnig.C-2008a-Multi-RQP,Chays.D-2008a-Query-based,Marcozzi.M-2012a-Test,Haller.K-2010a-Test,Vatanawood.W-2004a-Formal,Lukovic.I-2003a-Proceedings,Bench-Capon.T-1998a-Report,Spivey.J-1989a-An-introduction,Choppella.V-2006a-Constructing,Ambler.S-2006a-Database}

\section{Introduction}

Any introductory database course needs to cover several core concepts, including what is a database, what is a logical data model, and how to create and interact with a database. Typically such courses will focus on the Relational Model and its embodiment in SQL database management systems (DBMSs). This is partly because the Relational Model provides a sound theoretical framework for discussing key database concepts [cite], and partly because SQL DBMSs are still widely used. The shadow of SQL is so strong that even non-relational systems have adopted some form of SQL-like language in order to leverage existing knowledge (e.g., OQL \cite{Cattell.R-2000a-ODMG3}, HiveQL \cite{Apache-2017a-Hive}, and CQL \cite{Apache-2017a-CQL}).

Courses that teach SQL usually include one or more assessments that test students' SQL skills. These test students' ability to 
create a database using SQL data definition (DDL) statements, and to interact with the database using SQL data manipulation (DML) statements. Manually grading such code can be a slow, tedious, and potentially error-prone process. Automating the grading process enables faster turnaround times and greater consistency [cite]. If the grading can be done in real time, the grading tool could become part of a larger, interactive SQL learning environment \cite{Kenny.C-2005a-Automated,Kleiner.C-2013a-Automated,Mitrovic.A-1998a-Learning,Russell.G-2004a-Improving,Sadiq.S-2004a-SQLator}.

There have been many prior efforts to automatically grade SQL DML (see Section~\ref{sec-literature}), we have been unable to find any similar systems for automatically grading SQL DDL.

In our department, we offered typical introductory papers on database systems. INFO 212 was offered from 1997(?) to 2011, and was a dedicated semester-long course (13 weeks). It was replaced by INFO 214 in 2012, which included 6\(\frac{1}{2}\) weeks of core database material (the remainder of the paper covered data communications and networking). It was discontinued at the end of 2016.

Over the years that these two papers were offered, we tried several different approaches to formulating and grading SQL DDL assessments. The three most significant were:
\begin{enumerate}
    \item Allow students to choose and code their own scenario. It could be argued that this could boost student interest in the assessment, as they can work on a problem domain that interests them. It does however mean that every student's submission is different, and makes the grading process harder.
    
    \item Assign a standard scenario, but leave some elements under-specified. This improves the grading experience, but there is still the possibility of variation among student submissions, as they may interpret the under-specified elements in different ways. This is particularly problematic to automate if they choose different names for tables and columns, or implement a different structure.
    
    \item Provide a detailed specification of a standard scenario. This can be presented as the detailed output from the requirements analysis phase. Students are told that they need to adhere closely to the specification, as other developers will be independently using the same specification to implement end-user applications. Students still have some room to alter things, but such changes cannot affect the view of the database seen by clients. This approach tests both the studnets' ability to write SQL DDL, and to interpret and correctly convert a written database specification into a corresponding SQL schema.
\end{enumerate}

The third approach was used from 2009 until 2016 (?dates), and was what inspired the work discussed in this paper. The third approach is also the most amenable to autmoation, as much of the assessment specification is fixed in advance, with less room for deviation.

Prior approaches to grading SQL DDL have focused on the \texttt{CREATE TABLE} syntax, but we have taken a different approach, where we verify that the implemented schema conforms to the behaviour expected from the original specification. If the student achieves this, then by definition the DDL syntax must be correct (weakness: we do not consider coding style). This enables us to focus less on the specifics of the syntax and more on whether students have implemnted the requirements correctly.

The requirements specification for the assessment is tightly defined, which means it can be readily codified in machine-readable form. Rather than attempt to parse and check the \texttt{CREATE TABLE} statements directly, we instead issue queries against the schema's metadata (catalog), and compare the results of these queries against the machine-readable version of the specification. The process then effectively becomes one of unit testing the schema against the original requirements. In our implementation, we used the PHPunit database unit testing framework to carry out this process, albeit in a somewhat unorthodox way (see Section~\ref{sec-architecture}).

% original schema is codified in machine-readable form
% rather than attempt to parse CREATE TABLE statements, simply execute the DDL code to generate the database schema in the target DBMS, then run queries against the schema's metadata
% use a database unit testing framework (PHPUnit) to automate

\section{Prior work}
\label{sec-literature}

\section{Architecture}
\label{sec-architecture}

% System was implmented in PHP in order to speed development of web interface. Also because of ready availability of database unit testing framework PHPunit (a PHP implementation of the DBunit testing framework for Java).
% Main program can be launched from either a console program or a web application. Console application uses a single database user: student's schema loaded into DBMS (assuming error-free), then console app is run. Web application: students supply their DBMS credentials and the system connect directly to their schema, with output appearing in the web browser.
% Project specification is encoded as a collection of PHP classes, one per table. These classes encode the expected name, a range of possible data types, minimum and maximum lengths, nullability, etc., of the table and its columns. It also includes specification of simple constraints such as minimum and maximum values. Valid and invalid values can also be supplied.

% ANSI terminal colours for Terminal.app; see https://en.wikipedia.org/wiki/ANSI_escape_code#Colors
% grey 203, 204, 205
% green 37 188 36
% red 194, 54, 33
\definecolor{test grey}{rgb}{0.796,0.800,0.804}
\definecolor{test green}{rgb}{0.145,0.737,0.141}
\definecolor{test red}{rgb}{0.761,0.212,0.129}

\tcbset{boxsep=0pt, boxrule=0pt, arc=0pt, left=0pt, right=0pt, top=0.5pt, bottom=0.5pt}

\newlength{\dothskip}
\setlength{\dothskip}{0.72cm}
\newlength{\dotvskip}
\setlength{\dotvskip}{-1.25ex}
\newlength{\codeskip}
\setlength{\codeskip}{-0.5ex}

\begin{table}
    \ttfamily\scriptsize
    \hrule
    \begin{tabbing}
        0123\=\kill
        \tcbox[colback=test grey]{NOTE: Checking structure of table Product.} \\[\codeskip]
        TEST: [[ Product ]]  \\
        \>  \textcolor{test green}{+ OK} \\
        \tcbox[colback=test green]{+++ PASSED: Table Product exists.} \\[\codeskip]
        TEST: [[ Product.Product\_code ]]  \\
        \>  \textcolor{test green}{+ OK} \\[\dotvskip]
        \hspace*{\dothskip}\vdots \\
        \tcbox[colback=test green]{+++ PASSED: Table Product contains all the expected columns.} \\[\codeskip]
        TEST: [[ Product.Product\_code: data type is NUMBER | INTEGER ]]  \\
        \>  \textcolor{test green}{+ OK} \\[\dotvskip]
        \hspace*{\dothskip}\vdots \\
        \tcbox[colback=test green]{+++ PASSED: All columns of table Product have data types compatible with the}\\[\codeskip]
        \tcbox[colback=test green]{specification.} \\[\codeskip]
        TEST: [[ Product.Product\_code precision and scale = 8 (with scale 0) ]]  \\
        \>  \textcolor{test green}{+ OK} \\[\dotvskip]
        \hspace*{\dothskip}\vdots \\
        \tcbox[colback=test green]{+++ PASSED: All columns of table Product have lengths compatible with the} \\[\codeskip]
        \tcbox[colback=test green]{specification.} \\[\codeskip]
        TEST: [[ Product PK ]]  \\
        \>  \textcolor{test green}{+ OK} \\
        \tcbox[colback=test green]{+++ PASSED: Primary key of table Product exists.} \\[\codeskip]
        TEST: [[ Product PK: Product\_code ]]  \\
        \>  \textcolor{test green}{+ OK} \\
        \tcbox[colback=test green]{+++ PASSED: Primary key of table Product includes (only) the expected} \\[\codeskip]
        \tcbox[colback=test green]{columns.} \\[\dotvskip]
        \hspace*{\dothskip}\vdots \\
        \tcbox[colback=test grey]{NOTE: Testing constraints of table Product.} \\[\codeskip]
        TEST: [[ Product.Stock\_count accepts ``0'' ]]  \\
        \>  \textcolor{test green}{+ OK} \\
        TEST: [[ Product.Stock\_count accepts ``99999'' ]]  \\
        \>  \textcolor{test green}{+ OK} \\
        TEST: [[ Product.Restock\_level accepts ``0'' ]]  \\
        \>  \textcolor{test red}{- FAILED! Column Product.Restock\_level won't accept legal value 0.} \\
        \textcolor{test red}{Failed asserting that false is true.} \\
        TEST: [[ Product.Restock\_level accepts ``99999'' ]]  \\
        \>  \textcolor{test green}{+ OK} \\
        TEST: [[ Product.Minimum\_level accepts ``0'' ]]  \\
        \>  \textcolor{test red}{- FAILED! Column Product.Minimum\_level won't accept legal value 0.} \\
        \textcolor{test red}{Failed asserting that false is true.} \\
        TEST: [[ Product.Minimum\_level accepts ``653'' ]]  \\
        \>  \textcolor{test green}{+ OK} \\[\dotvskip]
        \hspace*{\dothskip}\vdots \\
        \tcbox[colback=test red, coltext=test grey]{--- FAILED: 2 of 8 legal values tested were rejected by a CHECK constraint.}
    \end{tabbing}
    \hrule
    \caption{Example of output}
\end{table}

\begin{table}
    \footnotesize
    \hrule
    \begin{verbatim}
public function getTableName()
{
    return 'PRODUCT';
}

public function getColumnList()
{
    return array(
        'PRODUCT_CODE' => array(
            'generic_type' => 'NUMBER',
            'sql_type' => array('NUMBER', 'INTEGER'),
            'min_length' => 8, 'max_length' => 8, 'decimals' => 0,
            'test_value' => 87654321, 'nullable' => false),
        'DESCRIPTION' => array( ... ),
        'STOCK_COUNT' => array(
            'generic_type' => 'NUMBER',
            'sql_type' => array('NUMBER', 'INTEGER'),
            'min_length' => 5, 'max_length' => 6, 'decimals' => 0,
            'underflow' => -1, 'overflow' => 100000,
            'legal_values' => array(0, 99999), 'test_value' => 456,
            'nullable' => false),
        'RESTOCK_LEVEL' => array( ... ),
        'MINIMUM_LEVEL' => array( ... ),
        'LIST_PRICE' => array(
            'generic_type' => 'NUMBER',
            'sql_type' => array('NUMBER', 'INTEGER'),
            'min_length' => 7, 'max_length' => 8, 'decimals' => 2,
            'underflow' => -0.01, 'overflow' => 100000.00,
            'legal_values' => array(0, 99999.99), 'test_value' => 123.99,
            'nullable' =>  false),
        'ASSEMBLY_MANUAL' => array(
            'generic_type' => 'BINARY',
            'sql_type' => array('BLOB'),
            'test_value'    =>  "NULL",
            'nullable' => true),
        'ASSEMBLY_PROGRAM' => array( ... )
    );

}

public function getPKColumnList()
{
    return array( 'PRODUCT_CODE' );
}

public function getFKColumnList()
{
    return array();
}
    \end{verbatim}
    \hrule
    \caption{Example of table specification}
\end{table}

\section{Evaluation}
\label{sec-evaluation}

\section{Conclusions \& future work}
\label{sec-conclusion}

\newpage
\bibliographystyle{ACM-Reference-Format}
\bibliography{Koli_2017_Stanger}

\end{document}