diff --git a/APCCM2017_Stanger.tex b/APCCM2017_Stanger.tex index 5ac3e85..c5cc166 100644 --- a/APCCM2017_Stanger.tex +++ b/APCCM2017_Stanger.tex @@ -186,7 +186,7 @@ \begin{document} \setcopyright{none} -\conferenceinfo{APCCM 2017}{January 31--February 3, 2017, Geelong, VIC, Australia} +% \conferenceinfo{APCCM 2017}{January 31--February 3, 2017, Geelong, VIC, Australia} \title{Characterising relational view updates using relative information capacity} @@ -206,7 +206,7 @@ \begin{abstract} -In 2012, Chris Date published a book about the view update problem in the context of the Relational Model. He presented several detailed examples of different varieties of view updates and characterised their behaviour, in the process deriving a set of principles for updating views based around the notion of \emph{information equivalence} of view schemas. In this paper we examine how this notion can be more formally characterised using Hull's concept of \emph{relative information capacity}. Specifically, we use Miller's \emph{schema intension graph} formalism to characterise the information equivalence of Date's restriction view example, and show that this supports his operational definition of information equivalence. +In 2012, Chris Date published a book about the view update problem in the context of the Relational Model. He presented several detailed examples of different varieties of view updates and characterised their behaviour, in the process deriving a set of principles for updating views based on the notion of \emph{information equivalence} of view schemas. In this paper we discuss work in progress that examines how Date's operational definition of information equivalence can be formally characterised using Hull's concept of \emph{relative information capacity}. As a proof of concept, we use an extension of Miller's \emph{schema intension graph} formalism to characterise the information equivalence of Date's restriction view example. \end{abstract} @@ -215,13 +215,13 @@ \section{Introduction} -\noindent The view update problem is a well known and long standing problem of great practical significance in the database field. Indeed, Date \cite{Date.C-2013a-View} likens it to significant unsolved problems in other fields such as the Riemann Hypothesis in mathematics or ``\(P = \mathit{NP}\)?'' in computational complexity theory. In his 2012 book \cite{Date.C-2013a-View} Date developed a comprehensive set of principles for updating relational views, based on the notion of \emph{information equivalence}. However, that notion was defined more operationally than by a formal theoretical definition. +\noindent The view update problem is a well known and long standing problem of great practical significance in the database field. Indeed, Date \cite{Date.C-2013a-View} likens it to significant unsolved problems in other fields such as the Riemann Hypothesis in mathematics or ``\(P = \mathit{NP}\)?'' in computational complexity theory. In his 2012 book \cite{Date.C-2013a-View} Date developed a comprehensive set of principles for updating relational views, based on the notion of \emph{information equivalence}. However, that notion was defined more operationally than formally. -View updating can be considered a special case of schema translation: we effectively have one schema comprising a set of views and another schema comprising a set of base relations, and need to translate updates from one schema to the other.\footnote{These schemas may be actually separate, or more likely may be sub-schemas of a larger database schema, with effective separation achieved via the access privilege mechanism.} It therefore makes sense that formalisms developed to assist with characterising sche\-ma translations would also be applicable to view updates. To that end, we propose to use Hull's notion of \emph{relative information capacity} \cite{Hull.R-1986a-Relative} as a means to characterise the equivalence of the schemas involved in a given view configuration. +View updating can be considered a special case of schema translation: we effectively have one schema comprising a set of views and another schema comprising a set of base relations, and need to translate updates from one schema to the other.\footnote{These schemas may be actually separate, or more likely may be sub-schemas of a larger database schema, with effective separation achieved via the access privilege mechanism.} It therefore makes sense that formalisms developed to characterise sche\-ma translations would also be applicable to view updates. We therefore propose to use Hull's notion of \emph{relative information capacity} \cite{Hull.R-1986a-Relative} as a means to characterise the information equivalence of the schemas that make up a given view configuration. -In particular, we will use Miller's \emph{schema intension graph} (SIG) formalism, which facilitates the comparison of relative information capacity across schemas. In this paper, we show how SIGs can be used to characterise the equivalence of schemas within a view configuration. To facilitate this, we extend the SIG formalism with additional concepts, including relation and tuple types, key constraints and functional dependencies. We also show how the formalism can be practically used to characterise the information equivalence of a view configuration, by applying it to Date's example of disjoint restriction views \cite{Date.C-2013a-View}. The results our formal analysis are shown to be consistent with Date's operational analysis of information equivalence in this example. +In particular, we will use an extension of Miller's \emph{schema intension graph} (SIG) formalism, which facilitates the comparison of relative information capacity across schemas. We extend the SIG formalism with concepts such as relation and tuple types, key constraints, and functional dependencies, then show how these extended SIGs can be used to characterise the equivalence of schemas within a view configuration. We also show a proof of concept of how the extended formalism can be used to characterise the information equivalence of a view configuration, by applying it to Date's example of disjoint restriction views \cite{Date.C-2013a-View}. The results our analysis are shown to be consistent with Date's operational analysis of information equivalence in this example. -In Section~\ref{sec-view-update-problem}, we briefly review the view update problem and relevant research, and in Section~\ref{sec-relations-types} we introduce some core concepts that are used throughout the remainder of the paper. In Section~\ref{sec-info-capacity} we provide a detailed explanation of relative information capacity and the SIG formalism, then explain in Section~\ref{sec-relational-sigs} how to construct a SIG that represents a relational schema. In Section~\ref{sec-characterising} we analyse Date's restriction view example \cite{Date.C-2013a-View} in detail, and characterise the information capacity of the various sub-schemas involved. We discuss conclusions and future work in Section~\ref{sec-conclusion}. +In Section~\ref{sec-view-update-problem}, we briefly review the view update problem and relevant research, and in Section~\ref{sec-relations-types} we introduce some core concepts that are used throughout the remainder of the paper. In Section~\ref{sec-info-capacity} we provide a detailed explanation of relative information capacity and the SIG formalism, then explain in Section~\ref{sec-relational-sigs} how to extend the SIG formalism to more precisely model a relational schema. In Section~\ref{sec-characterising} we analyse Date's restriction view example \cite{Date.C-2013a-View} in detail, and characterise the information capacity of the various sub-schemas involved. We discuss conclusions and future work in Section~\ref{sec-conclusion}. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -276,15 +276,15 @@ %%%%%%%%%%%%%%%%%%%% -Keller noted that semantic information is essential in order to choose an appropriate translation \cite{Keller.A-1986a-Role}, and suggested this could be collected at view definition time from the database administrator (DBA) \cite{Keller.A-1985a-Algorithms}. Masunaga argued, however, that this would be insufficient to completely resolve ambiguity, and suggested collecting further semantic information from end users at update time \cite{Masunaga.Y-1984a-Relational}. Shu \cite{Shu.H-2000a-Using} discussed an interesting approach to this kind of information gathering, by treating the problem of generating suitable view translations as a constraint satisfaction problem. This enables constraints to be added incrementally at view update time. However, while collecting such information from the DBA seems reasonable, collecting it from the end user at update time could be seen as too burdensome, and could impact the usability of query interfaces. It is also unclear which information needs to be captured, and how best to capture it. +Keller noted that semantic information is essential in order to choose an appropriate translation \cite{Keller.A-1986a-Role}, and suggested that the da\-ta\-base administrator (DBA) could provide this at view definition time \cite{Keller.A-1985a-Algorithms}. Masunaga argued, however, that this would be insufficient to completely resolve ambiguity, and suggested collecting further semantic information from end users at update time \cite{Masunaga.Y-1984a-Relational}. Shu \cite{Shu.H-2000a-Using} discussed an interesting approach to this kind of information gathering, by treating the problem of generating suitable view translations as a constraint satisfaction problem. This enables constraints to be added incrementally at view update time. However, while collecting such information from the DBA seems reasonable, collecting it from the end user at update time could be seen as too burdensome, and could impact the usability of query interfaces. It is also unclear which information needs to be captured, and how best to capture it. -Kotidis et al.\ \cite{Kotidis.Y-2006a-Updates} discussed a different approach to avoiding side effects in view translations. They proposed creating a materialised copy of the view that could be updated directly, rather than mapping updates from the (virtual) view to the base tables. Clearly this approach enables the view to updated without translation side effects, but does imply that the physical manifestation of the view may no longer match the original view query (i.e., its extension may not match its intension). +Kotidis et al.\ \cite{Kotidis.Y-2006a-Updates} discussed a different approach to avoiding side effects in view translations. They proposed creating a materialised copy of the view that could be updated directly, rather than mapping updates from the (virtual) view to the base tables. Clearly this enables the view to be updated without translation side effects, but does imply that the physical manifestation (extension) of the view may no longer match the original view query (intension). Finally, Date has published several books on the theoretical underpinnings of the Relational Model over the last decade, including one specifically targeting the view update problem \cite{Date.C-2013a-View} (indeed, the book is subtitled ``Solving the View Update Problem''). He developed a comprehensive set of rules for updating views in a relational context, supported by a wide array of worked examples. Date advocated explicitly stating all relevant constraints at view creation time, which is similar to the information gathering approaches discussed above \cite{Keller.A-1985a-Algorithms,Masunaga.Y-1984a-Relational,Shu.H-2000a-Using}. He argued that if a complete set of constraints was available, then ambiguity would be diminished or eliminated, and it would be possible to automatically generate a set of appropriate \emph{compensatory rules} to automatically manage the view update process. Compensatory rules are similar in principle to database triggers, except that they are intended to be declarative so that their definitions can made explicitly visible to end users. -Date's work is interesting and compelling, and raises several questions worthy of further investigation. However, there are still some situations where the correct view translation may be ambiguous, e.g., views based on certain types of join, as previously noted by Keller \cite{Keller.A-1985a-Algorithms}. Somewhat disappointingly, Date took a pragmatic approach in these cases, effectively saying ``this feels like the correct solution''. His justification was well argued and logical, but it still falls somewhat short of the ideal of a fully automated solution. +Date's work is interesting and compelling, and raises several questions worthy of further investigation. However, there are still some situations where the correct view translation may be ambiguous, e.g., views based on certain types of join, as previously noted by Keller \cite{Keller.A-1985a-Algorithms}. Somewhat disappointingly, Date took a pragmatic approach in these cases, effectively saying ``this feels like the correct solution''. His justification was well argued and logical, but it still falls short of the ideal of a fully automated solution. -As noted previously, Date characterises the ability to define an appropriate and effective view translation using the notion of \emph{information equivalence}. Two schemas \(\SC{1}\) and \(\SC{2}\) are said to be information equivalent ``if and only if the constraints that apply to [\(\SC{1}\)] and [\(\SC{2}\)] are such that every proposition that can be represented by [\(\SC{1}\)] can be represented by [\(\SC{2}\)] and vice versa'' \cite{Date.C-2013a-View}. While information equivalence is clearly a formal concept, the definition and characterisation in \cite{Date.C-2013a-View} are more operational and pragmatic in nature. This is what we explore further in this paper. +Date characterises the ability to define an appropriate and effective view translation using the notion of \emph{information equivalence}. Two schemas \(\SC{1}\) and \(\SC{2}\) are said to be information equivalent ``if and only if the constraints that apply to [\(\SC{1}\)] and [\(\SC{2}\)] are such that every proposition that can be represented by [\(\SC{1}\)] can be represented by [\(\SC{2}\)] and vice versa'' \cite{Date.C-2013a-View}. While information equivalence is a formal concept, its definition and characterisation in \cite{Date.C-2013a-View} are more operational and pragmatic in nature. This is what we explore further in this paper. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -293,7 +293,7 @@ \section{Relations and types} \label{sec-relations-types} -\noindent We generally follow Date's \cite{Date.C-2012a-SQL-and-Relational} notation and terminology regarding the Relational Model. The main exception is that we do not use Tutorial D notation Date uses in his examples, preferring instead a more classical algebraic style notation. +\noindent We generally follow Date's \cite{Date.C-2012a-SQL-and-Relational} notation and terminology regarding the Relational Model. The main exception is that we do not use the Tutorial D notation Date uses in his examples, preferring instead a more classical algebraic style notation. In this paper, \(A\) represents an attribute (comprising a name:type pair), \(R\) represents a relation variable (\emph{relvar}), and \(v\) represents a value (of an attribute, tuple, relation, etc.). In the following discussion it will generally be clear which attribute a value is associated with, but where necessary, the value will be prefixed with the attribute name, i.e., \(\mathit{name}\colon\mathit{value}\). @@ -307,9 +307,9 @@ \begin{equation}\label{eqn-tuple-type} \TT{R} = \T{A_{1}} \times \T{A_{2}} \times\dotsb\times \T{A_{n}}\text{.} \end{equation} -By definition any given value of \(R\) must comprise a subset (possibly empty) of tuples \(\{t_{1}, t_{2}, \dotsc, t_{m}\} \subseteq \TT{R}\). \(\T{R}\) therefore represents every possible \(k\)-combination of the elements of \TT{R}, for \(0 \le k \le \left|\TT{R}\right|\). +By definition any given value of \(R\) must comprise a set (possibly empty) of tuples \(\{t_{1}, t_{2}, \dotsc, t_{m}\} \subseteq \TT{R}\). \(\T{R}\) therefore represents every possible \(k\)-combination of the elements of \TT{R}, for \(0 \le k \le \left|\TT{R}\right|\). -For example, Date's usual suppliers-and-parts schema \cite{Date.C-2013a-View} includes a relvar \(S\!\) with heading \(\{\Sno, \Sname, \Status, \City\}\), which contains details of suppliers. As per equation~\ref{eqn-tuple-type}, the tuples of \(S\!\) have type \(\TT{S} = \T{\Sno} \times \TSSC\). +For example, Date's usual suppliers-and-parts schema \cite{Date.C-2013a-View} includes a relvar \(S\!\) with heading \(\{\Sno, \Sname, \Status, \City\}\), which contains details of suppliers. From equation~\ref{eqn-tuple-type}, the tuples of \(S\!\) have type \(\TT{S} = \T{\Sno} \times \TSSC\). This notion of type as a set of possible values is synonymous with Codd's original definition of domains in the Relational Model \cite{Codd.E-1970a-Relational,Date.C-2012a-SQL-and-Relational}, and in particular with the notion of domains as ``sets of values'' in schema intension graphs \cite{Miller.R-1994a-Schema,Miller.R-1994a-SIG}. The implications of the latter point will be explored further in Section~\ref{sec-relational-sigs}, after we have discussed the concepts of relative information capacity and schema intension graphs in Section~\ref{sec-info-capacity}. @@ -332,13 +332,13 @@ \noindent A schema conveys information about the phenomenon that it models. Hull \cite{Hull.R-1986a-Relative} defined the concept of \emph{relative information capacity} to describe the information content of different schemas. The relative information capacity of a schema determines the set of all valid instances of that schema \cite{Hull.R-1986a-Relative,Miller.R-1994a-PhD,Miller.R-1993a-Information}. Relative information capacity, as its name implies, is not a quantitative measure; rather it is an indicator of the relative information content of different schemas. -Relative information capacity can be used as a basis for identifying the differences between schema instances, and can therefore be used as an aid to schema integration and translation \cite{Miller.R-1994a-PhD}. As noted earlier, view updating is a special case of this, so it follows that information capacity can also be used to characterise view updates, or more precisely, to characterise the information content of views compared to the base relvar(s) from which they are derived. +Relative information capacity can be used as a basis for identifying the differences between schema instances, and can therefore be used as an aid to schema integration and translation \cite{Miller.R-1994a-PhD}. As noted earlier, view updating is a special case of this, so it follows that information capacity can also be used to characterise the information content of views compared to the base relvar(s) from which they are derived. -Miller et al.\ \cite{Miller.R-1993a-Information} defined relative information capacity in terms of two types of mapping. An \emph{information (capacity) preserving mapping} from schema \(\SC{1}\) to schema \(\SC{2}\) maps every element of \(\SC{1}\) to an element of \(\SC{2}\), but only some elements of \(\SC{2}\) to elements of \(\SC{1}\). \(\SC{2}\) has an information capacity greater than \(\SC{1}\), and is said to \emph{dominate} \(\SC{1}\), denoted \Dominates{\SC{2}}{\SC{1}}. Compare this to an \emph{equivalence preserving mapping} from \(\SC{1}\) to \(\SC{3}\), which maps every element of \(\SC{1}\) to an element \(\SC{3}\) and vice versa (i.e., both the mapping and its inverse are information preserving). The information capacities of \(\SC{1}\) and \(\SC{3}\) are identical, and the schemas are said to be \emph{equivalent}, denoted \Equivalent{\SC{1}}{\SC{3}}. +Miller et al.\ \cite{Miller.R-1993a-Information} defined relative information capacity in terms of two types of mapping. An \emph{information (capacity) preserving mapping} from schema \(\SC{1}\) to schema \(\SC{2}\) maps every element of \(\SC{1}\) to an element of \(\SC{2}\), but only some elements of \(\SC{2}\) to elements of \(\SC{1}\). \(\SC{2}\) has an information capacity greater than \(\SC{1}\), and is said to \emph{dominate} \(\SC{1}\), denoted \Dominates{\SC{2}}{\SC{1}}. An \emph{equivalence preserving mapping} from \(\SC{1}\) to \(\SC{3}\) maps every element of \(\SC{1}\) to an element \(\SC{3}\) and vice versa (i.e., both the mapping and its inverse are information preserving). The information capacities of \(\SC{1}\) and \(\SC{3}\) are identical, and the schemas are said to be \emph{equivalent}, denoted \Equivalent{\SC{1}}{\SC{3}}. (Note that Miller et al.\ \cite{Miller.R-1994a-Schema} use stricter definitions of equivalence and dominance than those used here; strictly speaking, the discussion of equivalence and dominance in this section should properly refer to \emph{absolute} dominance and equivalence, whereas the following discussion on SIG transformations should properly refer to \emph{internal} dominance and equivalence \cite{Miller.R-1994a-Schema}. The distinction, however, is not crucial to the discussion in this paper.) -Since relative information capacity is a relative measure, it can only be effectively used in a comparative manner. To this end, Miller et al.\ \cite{Miller.R-1994a-SIG} defined the \emph{schema intension graph} formalism to provide a tool for comparing the information capacity of schemas. A schema intension graph (SIG) describes the associations between domains in a schema, and provides a way to ``reason about constraints on \emph{collections of entities} in an instance of the entity set, rather than about the internal structure of a single entity'' \cite{Miller.R-1994a-SIG}. In particular, SIGs provide a useful abstract formalism within which to perform schema transformations, as transforming the SIG for a schema is effectively the same as transforming the schema itself. +Relative information capacity can only be effectively used in a comparative manner. To this end, Miller et al.\ \cite{Miller.R-1994a-SIG} defined the \emph{schema intension graph} formalism to provide a tool for comparing the information capacity of schemas. A schema intension graph (SIG) describes the associations between domains in a schema, and provides a way to ``reason about constraints on \emph{collections of entities} in an instance of the entity set, rather than about the internal structure of a single entity'' \cite{Miller.R-1994a-SIG}. In particular, SIGs provide a useful abstract formalism within which to perform schema transformations, as transforming the SIG for a schema is effectively the same as transforming the schema itself. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -349,21 +349,21 @@ \noindent A SIG is a graph \(G = (N, E)\) \cite{Miller.R-1994a-Schema}. The nodes \(n_{i} \in N\) represent typed domains (i.e., sets of values). Nodes may be \emph{simple} (atomic) or \emph{constructed} (composite). Simple nodes are denoted by the name of the domain that they represent. Constructed nodes represent a collection of domains and are built using the \(\times\) (cartesian product) and \(+\) (union) operators. -The edges \(e_{i} \in E\) represent binary relations (in the mathematical sense) between domains. Edges are denoted by plain lines with an optional label, and may be composed to form a \emph{path} between two nodes (the notation \(e_{2} \circ e_{1}\) means ``edge \(e_{1}\) followed by edge \(e_{2}\)''). Some edges may be designated as \emph{selection} edges, labelled \(\RelRestrict\), or \emph{projection} edges, labelled \(\RelProject\). These correspond to the relational algebra operators of the same name \cite{Codd.E-1972a-Completeness}. +The edges \(e_{i} \in E\) represent binary relations (in the mathematical sense) between domains. Edges are denoted by lines with an optional label, and may be composed to form a \emph{path} between two nodes (the notation \(e_{2} \circ e_{1}\) means ``edge \(e_{1}\) followed by edge \(e_{2}\)''). Some edges may be designated as \emph{selection} edges, labelled \(\RelRestrict\), or \emph{projection} edges, labelled \(\RelProject\). These correspond to the relational algebra operators of the same name \cite{Codd.E-1972a-Completeness}. Edges may also be annotated with up to four properties denoting simple constraints on the binary relation represented by the edge: totality, surjectivity, functionality and injectivity. These properties are defined as follows \cite{Miller.R-1994a-Schema}: \begin{quotation} - \noindent Let \(A\) and \(B\) be two sets. A binary relation \(r : A - B\) is \emph{total} (denoted \(e : A \Total B\)) if it is defined for all elements of \(A\); \emph{surjective} (\(e : A \Surjective B\)) if it is defined for all elements of \(B\); \emph{functional} (\(e : A \Functional B\)) if an element of \(A\) determines at most one element of \(B\); and \emph{injective} (\(e : A \Injective B\)) if an element of \(B\) determines at most one element of \(A\). + \noindent Let \(A\) and \(B\) be two sets. A binary relation \(r : A \sigedge{00} B\) is \emph{total} (denoted \(e : A \Total B\)) if it is defined for all elements of \(A\); \emph{surjective} (\(e : A \Surjective B\)) if it is defined for all elements of \(B\); \emph{functional} (\(e : A \Functional B\)) if an element of \(A\) determines at most one element of \(B\); and \emph{injective} (\(e : A \Injective B\)) if an element of \(B\) determines at most one element of \(A\). \end{quotation} -Note that the correct usage of these properties is determined by the direction in which the binary relation is read. That is, the binary relation \(A \Functional B\) is functional if read from left to right, but injective if read from right to left. A \emph{bijection} is a binary relation that has all four properties \cite{Miller.R-1994a-SIG}. +Note that the correct usage of these properties is sensitive to the direction in which the binary relation is read. That is, \(A \Functional B\) is functional, whereas \(B \Injective A\) is injective. A \emph{bijection} is a binary relation that has all four properties \cite{Miller.R-1994a-SIG}. -Essentially, \(A \Total B\)) means that every element of \(A\) (conversely, \(B\) for \(A \Surjective B\)) must participate in the binary relation, while \(A \Functional B\) means that a given element of \(A\) (conversely, \(B\) for \(A \Injective B\)) may appear in at most one instance of the binary relation. +In essence, \(A \Total B\) means that every element of \(A\) (conversely, \(B\) for \(A \Surjective B\)) must participate in the binary relation. \(A \Functional B\) means that a given element of \(A\) (conversely, \(B\) for \(A \Injective B\)) may appear in at most one instance of the binary relation. -Note that use of the term ``relation'' here, while mathematically correct, clearly has potential to cause confusion in any discussion that also involves the Relational Model. From now on, we shall therefore refer to the binary relations represented by SIG edges as ``associations''. +Note that the term ``relation'' here, while mathematically correct, is potentially confusing in any discussion that also involves the Relational Model. From now on, we shall therefore refer to the binary relations represented by SIG edges as ``associations''. -A selection edge \(e : A \LabelledEdge{\sigedge{00}}{\RelRestrict} B\) indicates that the elements of \(B\) comprise a subset (or specialisation) of the elements of \(A\). If \(B = A\), which we term a \emph{trivial} selection, every element of B must appear in A and vice versa. Similarly, every element of \(A\) can be associated with at most one element of \(B\), and vice versa. Trivial selection edges are therefore bijective, i.e., \(A \LabelledEdge{\Bijective}{\RelRestrict} B\). If \(B \subset A\), which we term a \emph{non-trivial} selection, every element of \(B\) must appear in \(A\), but the converse no longer holds. The mapping remains one-to-one, as before. Non-trivial selection edges are therefore surjective (from the perspective of the supertype), functional, and injective, i.e., \(A \SelectionEdge B\). +A selection edge \(e : A \LabelledEdge{\sigedge{00}}{\RelRestrict} B\) indicates that the elements of \(B\) comprise a subset (or specialisation) of the elements of \(A\). If \(B = A\), which we term a \emph{trivial} selection, every element of B must appear in A and vice versa. Similarly, every element of \(A\) can be associated with at most one element of \(B\), and vice versa. Trivial selection edges are therefore bijective, i.e., \(A \LabelledEdge{\Bijective}{\RelRestrict} B\). If \(B \subset A\), which we term a \emph{non-trivial} selection, every element of \(B\) must appear in \(A\), but not vice versa. The mapping remains one-to-one, as before. From the perspective of the supertype, non-trivial selection edges are therefore surjective, functional, and injective, i.e., \(A \SelectionEdge B\). -A projection edge \(e : A \LabelledEdge{\sigedge{00}}{\RelProject} B\), indicates that the elements of \(B\) are a projection of \(A\). This implies that \(A\) is a constructed node of the form \(B \times C \times \dotsb\). Trivial projection edges (\(B = A\)) are bijective, i.e., \(A \LabelledEdge{\Bijective}{\RelProject} B\), for the same reasons as trivial selection edges. For non-trivial projections (i.e., \(B \ne A\)), every element of \(A\) must have a corresponding element in \(B\), and vice versa. Similarly, every element of \(A\) can be associated with at most one element of \(B\), but \emph{not} vice versa. Thus, non-trivial projection edges are total, surjective, and functional, i.e., \(A \ProjectionEdge B\).\footnote{Allowing nulls could potentially alter which of these properties apply, but all non-trivial projection edges would still share the same properties regardless.} +A projection edge \(e : A \LabelledEdge{\sigedge{00}}{\RelProject} B\), indicates that the elements of \(B\) are a projection of \(A\). This implies that \(A\) is a constructed node of the form \(B \times C \times \dotsb\). Trivial projection edges (\(B = A\)) are bijective, i.e., \(A \LabelledEdge{\Bijective}{\RelProject} B\), for the same reasons as trivial selection edges. For non-trivial projections (i.e., \(B \ne A\)), every element of \(A\) must have a corresponding element in \(B\), and vice versa. Similarly, every element of \(A\) can be associated with at most one element of \(B\), but not vice versa. Thus, non-trivial projection edges are total, surjective, and functional, i.e., \(A \ProjectionEdge B\).\footnote{Allowing nulls could potentially alter which of these properties apply, but all non-trivial projection edges would still share the same properties regardless.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -372,7 +372,7 @@ \subsection{Comparing SIGs} \label{sec-sig-compare} -\noindent It is possible to determine the relationship between the relative information capacities of two schemas by comparing SIGs for the two schemas. Consider two schemas \(\SC{1}\) and \(\SC{2}\), with corresponding SIGs \(G_{\SC{1}}\) and \(G_{\SC{2}}\). Intuitively, if \(G_{\SC{1}}\) and \(G_{\SC{2}}\) are isomorphic, then \(\SC{1}\) and \(\SC{2}\) are equivalent \cite{Miller.R-1994a-SIG}. An isomorphism occurs between two SIGs when the graphs match both structurally and semantically. That is, \(G_{\SC{1}}\) and \(G_{\SC{2}}\) must be more than just structurally identical---the semantics of the SIG nodes and edges must also be compatible. +\noindent It is possible to determine the relationship between the relative information capacities of two schemas by comparing SIGs for the two schemas. Consider two schemas \(\SC{1}\) and \(\SC{2}\), with corresponding SIGs \(G_{\SC{1}}\) and \(G_{\SC{2}}\). Intuitively, if \(G_{\SC{1}}\) and \(G_{\SC{2}}\) are isomorphic, then \(\SC{1}\) and \(\SC{2}\) are equivalent \cite{Miller.R-1994a-SIG}. \(G_{\SC{1}}\) and \(G_{\SC{2}}\) are considered isomorphic if they have identical structures and the corresponding nodes and edges in both SIGs have compatible semantics. Miller et al.\ \cite{Miller.R-1994a-SIG} identify three possible results from comparing two SIGs \(G_{\SC{1}}\) and \(G_{\SC{2}}\): \begin{enumerate} @@ -389,11 +389,11 @@ Miller et al.\ \cite{Miller.R-1994a-Schema} describe the following equivalence preserving and information capacity augmenting transformations on SIGs, which are summarised in Table~\ref{Tab.Viewpoints.SIGTransformations}: \begin{description} - \item[Annotation transformations] allow the removal of annotations, or projection and selection constraints from edges (e.g., \(n_{1} \sigedge{23} n_{2} \Rightarrow n_{1} \sigedge{02} n_{2}\)). If we can transform \(\SC{1}\) into \(\SC{2}\) by such a transformation, then the information capacity of \(\SC{2}\) dominates that of \(\SC{1}\) (i.e., \Dominates{\SC{2}}{\SC{1}}). + \item[Annotation transformations] allow the removal of annotations or projection and selection constraints from edges (e.g., \(n_{1} \sigedge{23} n_{2} \Rightarrow n_{1} \sigedge{03} n_{2}\)). If we can transform \(\SC{1}\) into \(\SC{2}\) by such a transformation, then the information capacity of \(\SC{2}\) dominates that of \(\SC{1}\) (i.e., \Dominates{\SC{2}}{\SC{1}}). - \item[Composition transformations] allow edges to be moved and copied around the graph. Consider an edge \(e\) between nodes \(n_{1}\) and \(n_{2}\), as shown in Figure~\ref{fig-sig-composition-original}. If there is a path \(P = e_{m} \circ \dotsb \circ e_{k}\) of functional edges from some other node \(n_{k}\) to \(n_{2}\) (i.e., \(n_{k} \Functional \dotsb \Functional n_{2}\)), \(e\) may be copied to a new edge \(e'\) between nodes \(n_{1}\) and \(n_{k}\) as shown in Figure~\ref{fig-sig-composition-copy}. The original \(e\) can then be deleted if desired, effectively moving \(e\) to \(e'\) as shown in Figure~\ref{fig-sig-composition-move}. + \item[Composition transformations] allow edges to be moved and copied around the graph. Consider an edge \(e : n_{1} \sigedge{00} n_{2}\), as shown in Figure~\ref{fig-sig-composition-original}. If there is a path \(P = e_{m} \circ \dotsb \circ e_{k}\) of functional edges from some other node \(n_{k}\) to \(n_{2}\) (i.e., \(n_{k} \Functional \dotsb \Functional n_{2}\)), \(e\) may be copied to a new edge \(e' : n_{1} \sigedge{00} n_{k}\), as shown in Figure~\ref{fig-sig-composition-copy}. The original \(e\) can then be deleted if desired, effectively moving \(e\) to \(e'\) as shown in Figure~\ref{fig-sig-composition-move}. - The annotation of \(e'\) is determined by the composition of the edges of \(P\!\). If an edge with a particular annotation (such as surjectivity) is composed with an edge without that annotation, the resulting edge does not have the annotation. For example, the composition of the edges \(\Functional\) and \(\Surjective\) would be an edge with no annotations (\(\Edge\)), whereas the composition of \(\Functional\) and \(\sigedge{03}\) would be \(\Functional\). If we can transform \(\SC{1}\) into \(\SC{2}\) by such a transformation, then \Dominates{\SC{2}}{\SC{1}}. + The annotation of \(e'\) is determined by the composition of the edges of \(P\!\). If an edge with a particular annotation (such as surjectivity) is composed with an edge without that annotation, the resulting edge also does not have the annotation. For example, the composition of the edges \(\Functional\) and \(\Surjective\) would be an edge with no annotations (\(\Edge\)), whereas the composition of \(\Functional\) and \(\sigedge{03}\) would be \(\Functional\). If we can transform \(\SC{1}\) into \(\SC{2}\) by such a transformation, then \Dominates{\SC{2}}{\SC{1}}. %%%%%%%%%%%%%%%%%%%% @@ -442,7 +442,7 @@ \end{tikzpicture} } - \subfigure[Copy edge \(e\) to \(e'\)\label{fig-sig-composition-copy}]{ + \subfigure[Copy edge \(e\) to \(e'\) across path \(P\)\label{fig-sig-composition-copy}]{ \begin{tikzpicture} \node (n1) {\(n_{1}\)}; \node (n2) [below=1cm of n1] {\(n_{2}\)}; @@ -458,7 +458,7 @@ \end{tikzpicture} } - \subfigure[Move edge \(e\) to \(e'\)\label{fig-sig-composition-move}]{ + \subfigure[Move edge \(e\) to \(e'\) across path \(P\)\label{fig-sig-composition-move}]{ \begin{tikzpicture} \node (n1) {\(n_{1}\)}; \node (n2) [below=1cm of n1] {\(n_{2}\)}; @@ -483,7 +483,7 @@ \item An existing node \(n\) may be duplicated to create a new node \(n'\!\). A trivial selection edge \(e:n \LabelledEdge{\Bijective}{\RelRestrict} n'\) is added to enforce the restriction that \(n'\) is a duplicate of \(n\). - \item A node \(n_{1}\) may be deleted if it is attached to exactly one other node \(n_{2}\) by a single trivial selection edge \(e:n_{1} \LabelledEdge{\Bijective}{\RelRestrict} n_{2}\) (implying \(n_{1}\) is a duplicate of \(n_{2}\)). This also implies deleting the attached trivial selection edge. + \item A node \(n_{1}\) may be deleted if it is associated with exactly one other node \(n_{2}\) by a single trivial selection edge \(e:n_{1} \LabelledEdge{\Bijective}{\RelRestrict} n_{2}\) (implying \(n_{1}\) is a duplicate of \(n_{2}\)). This also implies deleting the trivial selection edge. \item A trivial selection edge \(e:n \LabelledEdge{\Bijective}{\RelRestrict} n\) may be created that connects a node \(n\) to itself. @@ -491,7 +491,7 @@ \end{enumerate} - If we can transform \(\SC{1}\) into \(\SC{2}\) using any of the first three selection transformations, then \Equivalent{\SC{2}}{\SC{1}}. If we can transform \(\SC{1}\) into \(\SC{2}\) using the fourth selection transformation, then \Dominates{\SC{2}}{\SC{1}}. The latter seems somewhat counter-intuitive \cite{Qian.X-1995a-Correct}, but is nonetheless correct \cite{Miller.R-1994a-Schema}. This is because a trivial selection edge \(e:n_{1} \LabelledEdge{\Bijective}{\RelRestrict} n_{2}\) denotes that \(n_{1}\) and \(n_{2}\) correspond to the same domain, so removing the edge also removes this constraint. + If we can transform \(\SC{1}\) into \(\SC{2}\) using any of the first three selection transformations, then \Equivalent{\SC{2}}{\SC{1}}. If we can transform \(\SC{1}\) into \(\SC{2}\) using the fourth selection transformation, then \Dominates{\SC{2}}{\SC{1}}. The latter seems somewhat counter-intuitive \cite{Qian.X-1995a-Correct}, but is nonetheless correct \cite{Miller.R-1994a-Schema}. This is because a trivial selection edge \(e:n_{1} \LabelledEdge{\Bijective}{\RelRestrict} n_{2}\) denotes that \(n_{1}\) and \(n_{2}\) correspond to the same domain, so removing the edge also removes this constraint. The information capacity of \(\SC{2}\) must therefore be greater than \(\SC{1}\). \end{description}