Discussion_Papers/Papers/2000/2000-10/Source/dp2000-10.tex at ed025f923b1c474f775b1202505ca645c03c9ebd

nigel.stanger / Discussion_Papers
Find file
Newer
Older
Discussion_Papers / Papers / 2000 / 2000-10 / Source / dp2000-10.tex
Nigel Stanger on 6 May 2013 18 KB - Added .gitignore.
Raw Blame History
\documentclass{llncs}
\input{epsf}
%
%\usepackage{makeidx}  % allows for indexgeneration
%
\begin{document}
%
\title{A Shape Metric for Evolving Time Series Models}

\titlerunning{Fitness Functions and Time Series}

\author{P.A. Whigham\inst{1} \and C.H. Aldridge\inst{2}}

\authorrunning{P.A. Whigham}   % abbreviated author list (for running head)

%%% modified list of authors for the TOC (add the affiliations)
\tocauthor{Peter A. Whigham}

\institute{Department of Information Science, University of Otago,
New Zealand,\\ \email{pwhigham@infoscience.otago.ac.nz},\\
\texttt{http://divcom.otago.ac.nz/sirc/Peterw/} \and Department of
Information Science, University of Otago, New Zealand,\\
\email{caldridge@infoscience.otago.ac.nz},\\
\texttt{http://divcom.otago.ac.nz/infosci/Staff/ColinA.htm}}

\maketitle              % typeset the title of the contribution

\pagestyle{plain}
\thispagestyle{plain}

\begin{abstract}
Most applications of Genetic Programming to time series modeling
use a fitness measure for comparing potential solutions that treat
each point in the time series independently.  This non-temporal
approach can lead to some potential solutions being given a
relatively high fitness measure even though they do not correspond
to the training data when the overall shape of the series is taken
into account.  This paper develops two fitness measures which
emphasize the concept of shape when measuring the similarity
between a training and evolved time series.  One approach extends
the root mean square error to higher dimensional derivatives of
the series.  The second approach uses a simplified derivative
concept that describes shape in terms of positive, negative and
zero slope.
\end{abstract}
\bibliographystyle{apalike}
\section{Introduction}

Multivariate time series data exists in many domains, including
business, environmental, medical and engineering. Machine learning
techniques have often been applied to these forms of data, either
to explore the underlying properties of the described system or to
produce models for forecasting.  Population-based search methods,
such as genetic algorithms (GAs), genetic programming (GP) and
evolving neural networks have been successfully applied to a
variety of time series tasks due to their ability to handle
complex, non-linear problems \cite{zhang001,GYLee,Whighamaigp99}.

Evolutionary learning systems are fundamentally driven by the use
of a fitness measure that biases the selection of individuals for
reproduction. The metaphor of a fitness landscape
\cite{eldredgeevol,jones_fitness95}, originally defined by
evolutionary biologists, has been used to describe how the fitness
function changes the shape and difficulty for searching through
this landscape\cite{Manderick91}.  Although it is clear that the
appropriate selection of a fitness function is essential for the
successful application of evolutionary learning techniques, little
research has been done on determining the appropriate measures to
be used for time series applications.

This paper is organised as follows.  Section \ref{mot} presents
the motivation for considering a revised form of fitness for time
series analysis.  Section \ref{timetemp} commences by reviewing
several standard measures of error used when evolving time series
predictions.  Two fitness measures that account for shape are then
developed. Section \ref{refdisc} briefly gives some discussion and
future work based on this preliminary study.  Section \ref{conc}
summarises the basic conclusions regarding these measures.

\section{Motivation}\label{mot}

In relation to time series, the fitness measure is intended to
reflect how similar each member of the population is to the
training data. The fitness measure allows a probabilistic
selection of the population members for reproduction such that the
fitter individuals are more likely to propagate into future
generations. The field of data mining also uses the concept of
similarity in relation to the mining of large temporal databases
(see \cite{Tanselltimedm} for examples). One goal of data mining
is to extract objects from a database which have similar or
dissimilar properties from some defined object. In relation to
time series this translates into finding sequences that exhibit
similar behaviour for a large subset of their length. Additional
problems are often encountered in temporal data mining due to
scaling, translation and outliers that make the problem of
measuring similarity more difficult. This problem has been
addressed by Keogh and Pazzani \cite{KeoghTimeSeries} where they
argue that similarity is a subjective measure not easily captured
by a simple metric based on treating each point in a time series
independently.

\begin{figure} \center
  \epsfysize=205pt
  \epsfbox{avdiffeqn.eps}
  \caption{The Straight Line (Average) has a lower RMSE than the Predicted curve.}
  \label{rmsefail}
\end{figure}


This paper will argue that point-based metrics are not always the
appropriate fitness measure when using evolutionary methods for
developing models of time series behaviour because they ignore
global information in terms of the form or structure of the
complete time series.  This lack of global information may bias
the population evolution in such a way that intermediate
structures that do capture the underlying patterns in the data
have low fitness and therefore do not propagate.  For example,
Fig. \ref{rmsefail} shows a time series (labeled Actual) and two
other time series; one which appears quite similar in shape
(Predicted) and the other (Average) which is a straight line
representing the average of the Actual series. The Average series
has a lower root mean square error (RMSE) than the Predicted
sequence, whereas it seems clear from observation that this is not
the correct ordering of similarity between the sequences. Further
examples showing the unintuitive measure of similarity based on
RMSE are given in \cite{KeoghTimeSeries}.

Consider the consequences of this mismatch when a population is
evolving; a predicted sequence that has the correct shape but some
outliers or scaling mismatch will not be selected even though it
clearly has many of the features of the sequence being modeled.
Given that the driving force behind evolutionary learning systems
is selection presure based on the fitness measure, an
inappropriate metric will bias the search away from useful partial
solutions.

\section{What makes a time series temporal?}\label{timetemp}

By its very nature a time series has an explicit ordering. This
ordering extends the possible relationships between elements that
do not exist with other (non-temporal) tables of data.  It is this
ordering which seems to be overlooked when measures of
fitness(similarity) are considered with evolutionary systems. This
can lead to population members being given inappropriate
comparative fitness measures which in turn influence the future
population dynamics and, ultimately, the resulting solution.

\subsection{Non-Temporal Fitness Measures}

This section describes several time series fitness measures often
used in assessing forecasting techniques
\cite{makridakisforecasting} and with GP approaches to time series
prediction. The measures defined all have the property that they
equal zero when there is a perfect fit between the predicted and
actual series. The main disadvantage with each measure is that no
account is taken of the inherent temporal nature of the series.
Essentially the temporal ordering is discarded and the series is
treated as a table of unrelated data items. There are many
standard measures of time series error, however they are all basic
variations of the following. Let $P_{t}$ be the predicted value at
time $t$. Let $A_{t}$ be the actual value at time $t$. Let $n$ be
the number of data points in the time series. Let $\bar{A}$ be the
mean of the actual values.

\begin{equation}\label{EQNRMSE}
  RMSE(A,P) = \sqrt{\frac{\sum_{t=1}^n (P_{t}-A_{t})^2}{n}}
\end{equation}

\begin{equation}\label{EQNMAPE}
MAPE(A,P) = \frac{\sum_{t=1}^n \mid(P_{t}-A_{t})\div A_{t}\mid}{n}
\end{equation}

\begin{equation}\label{EQNCOE}
  COE(A,P) = \frac{\sum_{t=1}^n (P_{t}-A_{t})^2}{\sum_{t=1}^n (A_{t}-\bar{A})^2}
\end{equation}

The root mean square error (RMSE) is commonly used as a fitness
measure when evolving time series models. RMSE tends to be used
because large errors between points in the predicted and actual
series are accentuated (due to squaring). This tends to produce
partial solutions that model the larger values in the time series.

The mean absolute percentage error (MAPE) is a relative measure
and is often used for comparison between different time series.
MAPE gives more emphasis towards smaller values in the sequence
since the absolute error between the predicted and actual values
in the time series is divided by the actual value.

The coefficient of equation (COE) is similar to the RMSE however
it is scaled based on the deviation of each value about the mean.
COE also emphasises large values in the time sequence due to the
squared error term between the actual and predicted points in the
series.

\subsection{Temporal Fitness Measures}

The previous measures treat the time series as a table of
unrelated data items. They do not consider the importance of
neighboring values in the sequence. To explicitly take advantage
of the temporal nature of time series the error metric must
include information that is represented by the 'shape' of the time
sequence. Differencing, a type of filter often used for removing
trends in a time series, is an appropriate starting point for
incorporating temporal relationships into an error (fitness)
metric.  For a time series $\{x_{1},\ldots,x_{n}\}$ a new series
$\{y_{1},\ldots,y_{n-1}\}$ is formed using the difference operator
\cite{chatfieldtime}:
\begin{equation}\label{diff1}
y_{t} = x_{t+1}-x_{t} = \nabla x_{t+1}
\end{equation}
Equation \ref{diff1} represents the absolute change in consecutive
time steps; in other words, it is the first derivative of the time
series curve for consecutive time steps. The second difference is
defined as:
\begin{equation}\label{diff2}
\nabla^2 x_{t+2} = \nabla x_{t+2}-\nabla
x_{t+1}=x_{t+2}-2x_{t+1}+x_{t}
\end{equation}
The concept of differencing can now be extended to develop a
similarity measure $F(A,P)$ between two time series $A$ and $P$.
So that the measure is appropriate for use with a GP system two
properties are required; firstly, $F(A,P)\geq 0$ and, secondly,
$F(A,P)=0$ when the sequences $A$ and $P$ are identical. The
fitness measure, $F(A,P)$, will be based on the non-temporal RMSE
(Eqn. \ref{EQNRMSE}) and extended using k-th order differencing.
Measures based on MAPE or COE could also be developed in a similar
manner.

Let the predicted and actual sequences be represented as
$\{P_1,P_2,\ldots,P_n\}$ and $\{A_1,A_2,\ldots,A_n\}$. Let $P_{t}$
and $A_{t}$ be the predicted and actual values at time $t$. Define
the zeroth order difference between sequences $P$ and $A$ at time
$t$ as:
\begin{equation}\label{zerok}
\nabla^0(A,P)_{t}=A_{t}-P_{t}
\end{equation}
The first order difference between $P$ and $A$ between the points
$t$ and $t+1$ can now be defined as:
\begin{equation}\label{firstk}
\nabla^1(A,P)_{t}=\nabla^0(A,P)_{t+1}-\nabla^0(A,P)_{t}
\end{equation}
Eqn. \ref{firstk} represents the difference between the slopes of
the actual and predicted curves at $t$ and $t+1$, shown by
expanding Eqn. \ref{firstk} as follows:
\begin{equation}\label{onek}
\nabla^1(A,P)_{t}=(A_{t+1}-P_{t+1})-(A_{t}-P_{t})
                 =(A_{t+1}-A_{t})-(P_{t+1}-P_{t})
\end{equation}
In general, the $kth$ order difference between $A$ and $P$ at time
$t$ may be stated as:
\begin{equation}\label{kk}
\nabla^k(A,P)_{t}=\nabla^{k-1}(A,P)_{t+k} -
\nabla^{k-1}(A,P)_{t+k-1} \ \ \ \ \ 1 \leq k \leq n-1
\end{equation}
The previous difference equations can now be extended to
incorporate the RMSE format.  The RMSE for $k=0$ is defined as:
\begin{equation}\label{RMSE0}
RMSE^0(A,P) = \sqrt{\frac{\sum_{t=1}^n \nabla^0(A,P)^2_t}{n}}
\end{equation}
The RMSE for the $kth$ difference is defined as:
\begin{equation}\label{RMSEk}
RMSE^k(A,P) = \sqrt{\frac{\sum_{t=1}^{n-k}
\nabla^k(A,P)^2_t}{n-k}}
\end{equation}
A fitness function $F^w(A,P)$, which measures the similarity
between the time series $\{A_1,A_2,\ldots,A_n \}$ and
$\{P_1,P_2,\ldots,P_n \}$ using a differencing window of length
$w$, can now be defined as:
\begin{equation}\label{F1}
F^w(A,P)=\sqrt{\sum_{k=0}^{w}\sum_{t=1}^{n-k}\frac{\nabla^k(A,P)^2_t}{n-k}}
\ \ \ \ \ \ \ \ 0 \leq w \leq n-1
\end{equation}
Note that $F^0(A,P)$ is equal to the original RMSE definition of
Eqn. \ref{EQNRMSE}. Allowing $w$ to increase produces a measure
that incorporates more information about the shape of each
sequence and therefore gives a more unique and informative measure
of similarity between $A$ and $P$.

\begin{figure} \center
  \epsfysize=205pt
  \epsfbox{w020.eps}
  \caption{$F^w(A,P)$ for predicted and straight line (average) series}
  \label{w020}
\end{figure}


\subsection{Problems associated with $F^w(A,P)$}

The original motivation for this work came about by recognising
the inappropriate ordering of "fit" time series when using RMSE.
This is clearly shown in Fig. \ref{rmsefail}.  To explore whether
$F^w(A,P)$ can offer a more appropriate similarity measure the
fitness measure was calculated for the sequences shown in Fig.
\ref{rmsefail} for $0 \leq w < 20$. The resulting error for
increasing $w$ is shown in Fig. \ref{w020}. There are several
concerns with these results. One is the computational cost when
calculating $F^w(A,P)$. The second point is that $F^w(A,P)$
rapidly rises for increasing $w$. Finally, Fig. \ref{w020} shows
that the straight line continues to have a lower overall error
then the predicted curve.  Hence, although the notion of shape has
been explicitly captured by $F^w(A,P)$, the squaring of the
difference for each $\nabla^k(A,P)$ is rapidly dominated by one or
more errors caused by a poor value for the predicted series.


\subsection{An alternative $F^w(A,P)$ based on slope direction}

The goal of this work was to develop a fitness measure that
allowed global information about shape to be incorporated.
Simplifying the error measurement can be achieved by classifying
each value of the $kth$ difference into one of three values;
positive slope, negative slope and zero slope.  The error for each
difference is then the number of times that the sign of the
difference does not match.  This value is normalised by the number
of points in the $kth$ sequence to give a value between zero and
one.  A perfect match with shape for the $kth$ difference will
equal zero, while a complete mismatch with shape will equal one.
Define this error measurement for the $kth$ difference as
$\triangle^k(A,P)$.  Fig. \ref{normalisedk} shows the normalised
shape error for increasing $k$ for the predicted and average
curves related to the actual curve of Fig. \ref{rmsefail}.  Note
that the average prediction (straight line) is now penalised due
to its poor correlation with the slope of the actual curve.  The
predicted curve, that by observation appears to capture many of
the shape aspects of the actual curve, has a lower error for most
of the $k$ differences.

The main objection to this approach is that it appears rather ad
hoc compared with the original definition of error.  For example,
how should this error be combined with the original RMSE?  Since
the fitness measure should only go to zero when the two time
series are identical one possible measure would be:
\begin{equation}\label{shapermse}
S^w(A,P)=F^0(A,P) + F^0(A,P)* \frac{1}{w}\sum_{k=1}^w
\Delta^k(A,P) \ \ \ \ 0 < w \leq n-1
\end{equation}
Using Eqn. \ref{shapermse} the error $S^{n-1}(A,P)$ for the
predicted curve is $43.9$ versus the average (straight line) error
of $67.5$.

\begin{figure} \center
  \epsfysize=205pt
  \epsfbox{normalisedk.eps}
  \caption{Normalised error in slope for predicted and straight line (average) series}
  \label{normalisedk}
\end{figure}

\section{Discussion}\label{refdisc}
This paper has presented two alternative approaches to
incorporating shape knowledge into a fitness measure when evolving
time series models.  $F^w(A,P)$ has been shown to be too complex
and produces fitness values that are too large when $w$ approaches
the total length of the series.  The simplified shape estimate,
$S^w(A,P)$, appears to have some advantages in that it is less
computationally demanding and produces values that are more
amenable to use as a fitness measure.

This preliminary work indicates several lines of possible
research.  Based on Fig. \ref{normalisedk} it is worth determining
what properties of the time series are causing the predicted shape
error to level out to approximately $0.30$ when $k > 90$.  This
may have some relationship to seasonal patterns in the data, or to
some high order correlations in the data.  At present, work is
being undertaken to use $S^w(A,P)$ to evolve time series models
for varying values of $w$.  The results of this work are expected
to confirm that using a shape metric as part of the fitness
measure will allow more accurate time series models to be
developed.

\section{Conclusion}\label{conc}
This paper presents two fitness measures for comparing time series
predictions. Problems associated with the calculation of
$F^w(A,P)$ for large $w$ led to a second metric that simplified
the concept of shape to a positive, negative or zero slope
measurement. This measure $S^w(A,P)$ has been shown to produce
relative errors between sequences that align more with our
intuitive notion of similarity. The obvious difficulty with this
work is proving that $F^w(A,P)$ or $S^w(A,P)$ for $w > 0$ will in
general help evolve better solutions then the original
non-temporal error measurements. Each new time series will have
different properties and be more suited to one of the temporal or
non-temporal measures that have been presented. The most
persuasive evidence of an improvement by using shape for fitness
would be to evolve a comprehensively better model for a well-known
problem than previously discovered. It is recommended that the
fitness measures introduced here be applied by researchers in time
series modelling and GP to ascertain whether this is the case.


\bibliography{main}
\end{document}