\documentclass{llncs} \input{epsf} % %\usepackage{makeidx} % allows for indexgeneration % \begin{document} % \title{A Shape Metric for Evolving Time Series Models} \titlerunning{Fitness Functions and Time Series} \author{P.A. Whigham\inst{1} \and C.H. Aldridge\inst{2}} \authorrunning{P.A. Whigham} % abbreviated author list (for running head) %%% modified list of authors for the TOC (add the affiliations) \tocauthor{Peter A. Whigham} \institute{Department of Information Science, University of Otago, New Zealand,\\ \email{pwhigham@infoscience.otago.ac.nz},\\ \texttt{http://divcom.otago.ac.nz/sirc/Peterw/} \and Department of Information Science, University of Otago, New Zealand,\\ \email{caldridge@infoscience.otago.ac.nz},\\ \texttt{http://divcom.otago.ac.nz/infosci/Staff/ColinA.htm}} \maketitle % typeset the title of the contribution \pagestyle{plain} \thispagestyle{plain} \begin{abstract} Most applications of Genetic Programming to time series modeling use a fitness measure for comparing potential solutions that treat each point in the time series independently. This non-temporal approach can lead to some potential solutions being given a relatively high fitness measure even though they do not correspond to the training data when the overall shape of the series is taken into account. This paper develops two fitness measures which emphasize the concept of shape when measuring the similarity between a training and evolved time series. One approach extends the root mean square error to higher dimensional derivatives of the series. The second approach uses a simplified derivative concept that describes shape in terms of positive, negative and zero slope. \end{abstract} \bibliographystyle{apalike} \section{Introduction} Multivariate time series data exists in many domains, including business, environmental, medical and engineering. Machine learning techniques have often been applied to these forms of data, either to explore the underlying properties of the described system or to produce models for forecasting. Population-based search methods, such as genetic algorithms (GAs), genetic programming (GP) and evolving neural networks have been successfully applied to a variety of time series tasks due to their ability to handle complex, non-linear problems \cite{zhang001,GYLee,Whighamaigp99}. Evolutionary learning systems are fundamentally driven by the use of a fitness measure that biases the selection of individuals for reproduction. The metaphor of a fitness landscape \cite{eldredgeevol,jones_fitness95}, originally defined by evolutionary biologists, has been used to describe how the fitness function changes the shape and difficulty for searching through this landscape\cite{Manderick91}. Although it is clear that the appropriate selection of a fitness function is essential for the successful application of evolutionary learning techniques, little research has been done on determining the appropriate measures to be used for time series applications. This paper is organised as follows. Section \ref{mot} presents the motivation for considering a revised form of fitness for time series analysis. Section \ref{timetemp} commences by reviewing several standard measures of error used when evolving time series predictions. Two fitness measures that account for shape are then developed. Section \ref{refdisc} briefly gives some discussion and future work based on this preliminary study. Section \ref{conc} summarises the basic conclusions regarding these measures. \section{Motivation}\label{mot} In relation to time series, the fitness measure is intended to reflect how similar each member of the population is to the training data. The fitness measure allows a probabilistic selection of the population members for reproduction such that the fitter individuals are more likely to propagate into future generations. The field of data mining also uses the concept of similarity in relation to the mining of large temporal databases (see \cite{Tanselltimedm} for examples). One goal of data mining is to extract objects from a database which have similar or dissimilar properties from some defined object. In relation to time series this translates into finding sequences that exhibit similar behaviour for a large subset of their length. Additional problems are often encountered in temporal data mining due to scaling, translation and outliers that make the problem of measuring similarity more difficult. This problem has been addressed by Keogh and Pazzani \cite{KeoghTimeSeries} where they argue that similarity is a subjective measure not easily captured by a simple metric based on treating each point in a time series independently. \begin{figure} \center \epsfysize=205pt \epsfbox{avdiffeqn.eps} \caption{The Straight Line (Average) has a lower RMSE than the Predicted curve.} \label{rmsefail} \end{figure} This paper will argue that point-based metrics are not always the appropriate fitness measure when using evolutionary methods for developing models of time series behaviour because they ignore global information in terms of the form or structure of the complete time series. This lack of global information may bias the population evolution in such a way that intermediate structures that do capture the underlying patterns in the data have low fitness and therefore do not propagate. For example, Fig. \ref{rmsefail} shows a time series (labeled Actual) and two other time series; one which appears quite similar in shape (Predicted) and the other (Average) which is a straight line representing the average of the Actual series. The Average series has a lower root mean square error (RMSE) than the Predicted sequence, whereas it seems clear from observation that this is not the correct ordering of similarity between the sequences. Further examples showing the unintuitive measure of similarity based on RMSE are given in \cite{KeoghTimeSeries}. Consider the consequences of this mismatch when a population is evolving; a predicted sequence that has the correct shape but some outliers or scaling mismatch will not be selected even though it clearly has many of the features of the sequence being modeled. Given that the driving force behind evolutionary learning systems is selection presure based on the fitness measure, an inappropriate metric will bias the search away from useful partial solutions. \section{What makes a time series temporal?}\label{timetemp} By its very nature a time series has an explicit ordering. This ordering extends the possible relationships between elements that do not exist with other (non-temporal) tables of data. It is this ordering which seems to be overlooked when measures of fitness(similarity) are considered with evolutionary systems. This can lead to population members being given inappropriate comparative fitness measures which in turn influence the future population dynamics and, ultimately, the resulting solution. \subsection{Non-Temporal Fitness Measures} This section describes several time series fitness measures often used in assessing forecasting techniques \cite{makridakisforecasting} and with GP approaches to time series prediction. The measures defined all have the property that they equal zero when there is a perfect fit between the predicted and actual series. The main disadvantage with each measure is that no account is taken of the inherent temporal nature of the series. Essentially the temporal ordering is discarded and the series is treated as a table of unrelated data items. There are many standard measures of time series error, however they are all basic variations of the following. Let $P_{t}$ be the predicted value at time $t$. Let $A_{t}$ be the actual value at time $t$. Let $n$ be the number of data points in the time series. Let $\bar{A}$ be the mean of the actual values. \begin{equation}\label{EQNRMSE} RMSE(A,P) = \sqrt{\frac{\sum_{t=1}^n (P_{t}-A_{t})^2}{n}} \end{equation} \begin{equation}\label{EQNMAPE} MAPE(A,P) = \frac{\sum_{t=1}^n \mid(P_{t}-A_{t})\div A_{t}\mid}{n} \end{equation} \begin{equation}\label{EQNCOE} COE(A,P) = \frac{\sum_{t=1}^n (P_{t}-A_{t})^2}{\sum_{t=1}^n (A_{t}-\bar{A})^2} \end{equation} The root mean square error (RMSE) is commonly used as a fitness measure when evolving time series models. RMSE tends to be used because large errors between points in the predicted and actual series are accentuated (due to squaring). This tends to produce partial solutions that model the larger values in the time series. The mean absolute percentage error (MAPE) is a relative measure and is often used for comparison between different time series. MAPE gives more emphasis towards smaller values in the sequence since the absolute error between the predicted and actual values in the time series is divided by the actual value. The coefficient of equation (COE) is similar to the RMSE however it is scaled based on the deviation of each value about the mean. COE also emphasises large values in the time sequence due to the squared error term between the actual and predicted points in the series. \subsection{Temporal Fitness Measures} The previous measures treat the time series as a table of unrelated data items. They do not consider the importance of neighboring values in the sequence. To explicitly take advantage of the temporal nature of time series the error metric must include information that is represented by the 'shape' of the time sequence. Differencing, a type of filter often used for removing trends in a time series, is an appropriate starting point for incorporating temporal relationships into an error (fitness) metric. For a time series $\{x_{1},\ldots,x_{n}\}$ a new series $\{y_{1},\ldots,y_{n-1}\}$ is formed using the difference operator \cite{chatfieldtime}: \begin{equation}\label{diff1} y_{t} = x_{t+1}-x_{t} = \nabla x_{t+1} \end{equation} Equation \ref{diff1} represents the absolute change in consecutive time steps; in other words, it is the first derivative of the time series curve for consecutive time steps. The second difference is defined as: \begin{equation}\label{diff2} \nabla^2 x_{t+2} = \nabla x_{t+2}-\nabla x_{t+1}=x_{t+2}-2x_{t+1}+x_{t} \end{equation} The concept of differencing can now be extended to develop a similarity measure $F(A,P)$ between two time series $A$ and $P$. So that the measure is appropriate for use with a GP system two properties are required; firstly, $F(A,P)\geq 0$ and, secondly, $F(A,P)=0$ when the sequences $A$ and $P$ are identical. The fitness measure, $F(A,P)$, will be based on the non-temporal RMSE (Eqn. \ref{EQNRMSE}) and extended using k-th order differencing. Measures based on MAPE or COE could also be developed in a similar manner. Let the predicted and actual sequences be represented as $\{P_1,P_2,\ldots,P_n\}$ and $\{A_1,A_2,\ldots,A_n\}$. Let $P_{t}$ and $A_{t}$ be the predicted and actual values at time $t$. Define the zeroth order difference between sequences $P$ and $A$ at time $t$ as: \begin{equation}\label{zerok} \nabla^0(A,P)_{t}=A_{t}-P_{t} \end{equation} The first order difference between $P$ and $A$ between the points $t$ and $t+1$ can now be defined as: \begin{equation}\label{firstk} \nabla^1(A,P)_{t}=\nabla^0(A,P)_{t+1}-\nabla^0(A,P)_{t} \end{equation} Eqn. \ref{firstk} represents the difference between the slopes of the actual and predicted curves at $t$ and $t+1$, shown by expanding Eqn. \ref{firstk} as follows: \begin{equation}\label{onek} \nabla^1(A,P)_{t}=(A_{t+1}-P_{t+1})-(A_{t}-P_{t}) =(A_{t+1}-A_{t})-(P_{t+1}-P_{t}) \end{equation} In general, the $kth$ order difference between $A$ and $P$ at time $t$ may be stated as: \begin{equation}\label{kk} \nabla^k(A,P)_{t}=\nabla^{k-1}(A,P)_{t+k} - \nabla^{k-1}(A,P)_{t+k-1} \ \ \ \ \ 1 \leq k \leq n-1 \end{equation} The previous difference equations can now be extended to incorporate the RMSE format. The RMSE for $k=0$ is defined as: \begin{equation}\label{RMSE0} RMSE^0(A,P) = \sqrt{\frac{\sum_{t=1}^n \nabla^0(A,P)^2_t}{n}} \end{equation} The RMSE for the $kth$ difference is defined as: \begin{equation}\label{RMSEk} RMSE^k(A,P) = \sqrt{\frac{\sum_{t=1}^{n-k} \nabla^k(A,P)^2_t}{n-k}} \end{equation} A fitness function $F^w(A,P)$, which measures the similarity between the time series $\{A_1,A_2,\ldots,A_n \}$ and $\{P_1,P_2,\ldots,P_n \}$ using a differencing window of length $w$, can now be defined as: \begin{equation}\label{F1} F^w(A,P)=\sqrt{\sum_{k=0}^{w}\sum_{t=1}^{n-k}\frac{\nabla^k(A,P)^2_t}{n-k}} \ \ \ \ \ \ \ \ 0 \leq w \leq n-1 \end{equation} Note that $F^0(A,P)$ is equal to the original RMSE definition of Eqn. \ref{EQNRMSE}. Allowing $w$ to increase produces a measure that incorporates more information about the shape of each sequence and therefore gives a more unique and informative measure of similarity between $A$ and $P$. \begin{figure} \center \epsfysize=205pt \epsfbox{w020.eps} \caption{$F^w(A,P)$ for predicted and straight line (average) series} \label{w020} \end{figure} \subsection{Problems associated with $F^w(A,P)$} The original motivation for this work came about by recognising the inappropriate ordering of "fit" time series when using RMSE. This is clearly shown in Fig. \ref{rmsefail}. To explore whether $F^w(A,P)$ can offer a more appropriate similarity measure the fitness measure was calculated for the sequences shown in Fig. \ref{rmsefail} for $0 \leq w < 20$. The resulting error for increasing $w$ is shown in Fig. \ref{w020}. There are several concerns with these results. One is the computational cost when calculating $F^w(A,P)$. The second point is that $F^w(A,P)$ rapidly rises for increasing $w$. Finally, Fig. \ref{w020} shows that the straight line continues to have a lower overall error then the predicted curve. Hence, although the notion of shape has been explicitly captured by $F^w(A,P)$, the squaring of the difference for each $\nabla^k(A,P)$ is rapidly dominated by one or more errors caused by a poor value for the predicted series. \subsection{An alternative $F^w(A,P)$ based on slope direction} The goal of this work was to develop a fitness measure that allowed global information about shape to be incorporated. Simplifying the error measurement can be achieved by classifying each value of the $kth$ difference into one of three values; positive slope, negative slope and zero slope. The error for each difference is then the number of times that the sign of the difference does not match. This value is normalised by the number of points in the $kth$ sequence to give a value between zero and one. A perfect match with shape for the $kth$ difference will equal zero, while a complete mismatch with shape will equal one. Define this error measurement for the $kth$ difference as $\triangle^k(A,P)$. Fig. \ref{normalisedk} shows the normalised shape error for increasing $k$ for the predicted and average curves related to the actual curve of Fig. \ref{rmsefail}. Note that the average prediction (straight line) is now penalised due to its poor correlation with the slope of the actual curve. The predicted curve, that by observation appears to capture many of the shape aspects of the actual curve, has a lower error for most of the $k$ differences. The main objection to this approach is that it appears rather ad hoc compared with the original definition of error. For example, how should this error be combined with the original RMSE? Since the fitness measure should only go to zero when the two time series are identical one possible measure would be: \begin{equation}\label{shapermse} S^w(A,P)=F^0(A,P) + F^0(A,P)* \frac{1}{w}\sum_{k=1}^w \Delta^k(A,P) \ \ \ \ 0 < w \leq n-1 \end{equation} Using Eqn. \ref{shapermse} the error $S^{n-1}(A,P)$ for the predicted curve is $43.9$ versus the average (straight line) error of $67.5$. \begin{figure} \center \epsfysize=205pt \epsfbox{normalisedk.eps} \caption{Normalised error in slope for predicted and straight line (average) series} \label{normalisedk} \end{figure} \section{Discussion}\label{refdisc} This paper has presented two alternative approaches to incorporating shape knowledge into a fitness measure when evolving time series models. $F^w(A,P)$ has been shown to be too complex and produces fitness values that are too large when $w$ approaches the total length of the series. The simplified shape estimate, $S^w(A,P)$, appears to have some advantages in that it is less computationally demanding and produces values that are more amenable to use as a fitness measure. This preliminary work indicates several lines of possible research. Based on Fig. \ref{normalisedk} it is worth determining what properties of the time series are causing the predicted shape error to level out to approximately $0.30$ when $k > 90$. This may have some relationship to seasonal patterns in the data, or to some high order correlations in the data. At present, work is being undertaken to use $S^w(A,P)$ to evolve time series models for varying values of $w$. The results of this work are expected to confirm that using a shape metric as part of the fitness measure will allow more accurate time series models to be developed. \section{Conclusion}\label{conc} This paper presents two fitness measures for comparing time series predictions. Problems associated with the calculation of $F^w(A,P)$ for large $w$ led to a second metric that simplified the concept of shape to a positive, negative or zero slope measurement. This measure $S^w(A,P)$ has been shown to produce relative errors between sequences that align more with our intuitive notion of similarity. The obvious difficulty with this work is proving that $F^w(A,P)$ or $S^w(A,P)$ for $w > 0$ will in general help evolve better solutions then the original non-temporal error measurements. Each new time series will have different properties and be more suited to one of the temporal or non-temporal measures that have been presented. The most persuasive evidence of an improvement by using shape for fitness would be to evolve a comprehensively better model for a well-known problem than previously discovered. It is recommended that the fitness measures introduced here be applied by researchers in time series modelling and GP to ascertain whether this is the case. \bibliography{main} \end{document}