# 4.7. time series¶

## 4.7.1. high-level¶

### 4.7.1.1. basics¶

• usually assume points are equally spaced

• modeling - for understanding underlying process or predicting

• noise, seasonality (regular / predictable fluctuations), trend, cycle

• multiplicative models: time series = trend * seasonality * noise

• additive model: time series = trend + seasonality + noise

• stationarity - mean, variance, and autocorrelation structure do not change over time

• endogenous variable = y = dependent variable (the series being modeled)

• exogenous variable = x = independent / external variable
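
A minimal decomposition sketch under these definitions, assuming pandas and statsmodels are available; the synthetic monthly series below is made up purely for illustration:

```python
# sketch: additive vs. multiplicative decomposition (trend / seasonal / noise)
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2000-01", periods=120, freq="MS")  # 10 years, monthly
trend = np.linspace(10, 30, 120)
seasonality = 5 * np.sin(2 * np.pi * np.arange(120) / 12)
noise = np.random.default_rng(0).normal(scale=1.0, size=120)
series = pd.Series(trend + seasonality + noise, index=idx)

# additive: series = trend + seasonal + resid
add = seasonal_decompose(series, model="additive", period=12)
# multiplicative: series = trend * seasonal * resid (requires positive values)
mult = seasonal_decompose(series, model="multiplicative", period=12)

print(add.trend.dropna().head())
print(add.seasonal.head())
print(add.resid.dropna().head())
```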

### 4.7.1.3. high-level modelling¶

• common methods

• decomposition - separate a given time-series into these components (trend, seasonality, noise)

• ex. loess, exponential smoothing

• frequency-based methods - e.g. look at spectral plot

• (AR) autoregressive models - linear regression of the current value on one or more prior values of the series

• (MA) moving-average models - require fitting the noise terms

• (ARMA) box-jenkins approach

• moving averages (a pandas sketch follows this list)

• simple moving average - just average over a window

• cumulative moving average - running mean of all points so far, updated incrementally from the previous mean

• exponential moving average - weights more recent points exponentially more heavily

• prediction (forecasting) models

• autoregressive integrated moving average (arima)

• assumption: the series is stationary after differencing d times (the ARMA part is fit to the differenced series)
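
A minimal sketch of the moving averages and an ARIMA fit described above, assuming pandas and statsmodels; the window sizes and the (1, 1, 1) order are arbitrary illustrative choices:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

x = pd.Series(np.random.default_rng(0).normal(size=200).cumsum())  # toy random walk

sma = x.rolling(window=12).mean()          # simple moving average over a fixed window
cma = x.expanding().mean()                 # cumulative moving average (running mean so far)
ema = x.ewm(span=12, adjust=False).mean()  # exponential moving average, recent points weighted more

# forecasting with ARIMA(p, d, q)
fit = ARIMA(x, order=(1, 1, 1)).fit()
print(fit.summary())
print(fit.forecast(steps=5))               # 5-step-ahead forecast
```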

## 4.7.2. similarity measures¶

• lock-step measures (Euclidean distance, or any norm)

• can resample to make them same length

• feature-based measures (Fourier coefficients)

• by Parseval's theorem, euclidean distance over all Fourier coefficients is the same as over the time-series, but we usually filter out the high-frequency coefficients

• can also use wavelets

• model-based measures (auto-regressive)

• compare coefs of an AR (or ARMA) model

• elastic measures

• dynamic time warping (DTW) - optimally aligns the two series in the temporal domain to minimize the accumulated cost (a numpy sketch follows this list)

• can also enforce some local window around points

• Every index from the first sequence must be matched with one or more indices from the other sequence and vice versa

• The first index from the first sequence must be matched with the first index from the other sequence (but it does not have to be its only match)

• The last index from the first sequence must be matched with the last index from the other sequence (but it does not have to be its only match)

• The mapping of the indices from the first sequence to indices from the other sequence must be monotonically increasing, and vice versa, i.e. if j > i are indices from the first sequence, then there must not be two indices l > k in the other sequence, such that index i is matched with index l and index j is matched with index k , and vice versa

• edit distance on real sequences - EDR

• time-warped edit distance - TWED

• minimum jump cost - MJC
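
A minimal DTW sketch in plain numpy implementing the alignment constraints above; using the absolute difference as the local cost is an assumption, and a local window can be added by restricting |i - j| to some band:

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated cost of the optimal monotone alignment between a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # D[i, j] = best cost aligning a[:i] with b[:j]
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # every index is matched, endpoints map to endpoints, alignment is monotone
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

print(dtw_distance(np.array([0., 1., 2., 3.]),
                   np.array([0., 0., 1., 2., 2., 3.])))
```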

## 4.7.3. book1 (A course in Time Series Analysis) + book2 (Intro to Time Series and Forecasting)¶

### 4.7.3.1. ch 1¶

• when errors are dependent, very hard to distinguish noise from signal

• usually in time-series analysis, we begin by de-trending the data and analyzing the residuals (a sketch follows this list)

• ex. assume linear trend or quadratic trend and subtract that fit (or could include sin / cos for seasonal behavior)

• ex. look at the differences instead of the points (nth order difference removes nth order polynomial trend). However, taking differences can introduce dependencies in the data

• ex. remove trend using sliding window (maybe with exponential weighting)

• periodogram - plots the squared magnitudes of the Fourier (FFT) coefficients against frequency (but loses the phase information)
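
A minimal sketch of the de-trending approaches and the periodogram, assuming numpy and scipy; the synthetic trend-plus-seasonal series is made up for illustration:

```python
import numpy as np
from scipy.signal import periodogram

t = np.arange(200)
x = 0.05 * t + np.sin(2 * np.pi * t / 20) + np.random.default_rng(0).normal(size=200)

# ex. fit and subtract a linear trend
coefs = np.polyfit(t, x, deg=1)
residuals = x - np.polyval(coefs, t)

# ex. first-order differencing also removes a linear trend (but can add dependence)
diffs = np.diff(x)

# periodogram: squared magnitude of the Fourier coefficients vs. frequency
freqs, power = periodogram(residuals)
print(freqs[np.argmax(power)])  # should be near 1/20 for this toy series
```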

### 4.7.3.2. ch 2 - stationary time series¶

• in time series, we never get iid data, so we instead rely on weaker assumptions

• ex. the process has a constant mean (a type of stationarity)

• ex. the dependencies in the time-series are short-term

• autocorrelation plots: plot the correlation of the series with itself offset by different lags (see the sketch at the end of this section)

• formal definitions of stationarity for time series $$\{X_t\}$$

• strict stationarity - the distribution is the same across time

• second-order / weak stationarity - mean is constant for all t and, for any t and k, the covariance between $$X_t$$ and $$X_{t+k}$$ only depends on the lag difference k

• In other words, there exists a function $$c: \mathbb Z \to \mathbb R$$ such that for all t and k we have $$c(k) = \text{cov} (X_t, X_{t+k})$$

• strictly stationary and $$\mathbb E|X_t|^2 < \infty \implies$$ second-order stationary

• ergodic - stronger condition: sample averages of functions of the time series converge to their expectations, i.e. for any function $$g$$ and shifts $$\tau_1, \ldots, \tau_k$$:

• $$\frac 1 n \sum_{t=1}^n g(X_t, X_{t+\tau_1}, \ldots, X_{t+\tau_k}) \to \mathbb E [g(X_0, X_{\tau_1}, \ldots, X_{\tau_k} )]$$

• causal - can predict given only past values (for Gaussian processes no difference)
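
A minimal sketch of computing a sample autocorrelation function for the plot mentioned above (plain numpy; the max lag and toy series are arbitrary choices):

```python
import numpy as np

def sample_acf(x, max_lag=20):
    """Sample autocorrelation at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:len(x) - k]) / denom
                     for k in range(max_lag + 1)])

x = np.random.default_rng(0).normal(size=500).cumsum()  # non-stationary toy series
print(sample_acf(x, max_lag=5))  # decays very slowly for a random walk

# plotting equivalent: statsmodels.graphics.tsaplots.plot_acf(x)
```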

### 4.7.3.3. ch 3 - linear time series¶

note: can just assume all have 0 mean (otherwise add a constant)

• AR model $$AR(p)$$: $$X_t = \sum_{i=1}^p \phi_i X_{t-i} + \varepsilon_t$$

• $$\phi_1, \ldots, \phi_p$$ are parameters

• $$\varepsilon_t$$ is white noise

• stationary assumption places constraints on param values (e.g. processes in the $$AR(1)$$ model with $$|\phi_1| \ge 1$$ are not stationary)

• looks just like linear regression, but is more complex

• if we don’t account for issues, things can go wrong

• model will not be stationary

• model may be misspecified

• $$E(\epsilon_t|X_{t-p}) \neq 0$$

• this represents a set of difference equations, and as such, must have a solution

• ex. $$AR(1)$$ model - if $$|\phi| < 1$$, then the solution is in terms of past values of {$$\epsilon_t$$}, otherwise it is in terms of future values (see the simulation sketch at the end of this section)

• ex. simulating - if we know $$\phi$$ and $$\{\epsilon_t\}$$, we still need to use the backshift operator to solve for $$\{ X_t \}$$

• ex. $$AR(p)$$ model - if $$\sum_j |\phi_j| < 1$$ and $$\mathbb E |\epsilon_t| < \infty$$, then it will have a causal stationary solution

• backshift operator $$B^kX_t=X_{t-k}$$

• solving requires using the backshift operator, because we need to solve for what all the residuals are

• characteristic polynomial $$\phi(a) = 1 - \sum_{j=1}^p \phi_j a^j$$

• $$\phi(B) X_t = \epsilon_t$$

• $$X_t=\phi(B)^{-1} \epsilon_t$$

• can represent $$AR(p)$$ as a vector $$AR(1)$$ using the vector $$\bar X_t = (X_t, ..., X_{t-p+1})$$

• note: can reparametrize in terms of frequencies

• MA model $$MA(q)$$: $$X_t = \sum_{i=1}^q \theta_i \varepsilon_{t-i} + \varepsilon_t$$

• $$\theta_1 ... \theta_q$$ are params

• $$\varepsilon_t, \varepsilon_{t-1}, \ldots$$ are white noise error terms

• harder to fit, because the lagged error terms are not observed (which also means we can’t directly make predictions on a new time-series)

• $$E[\epsilon_t] = 0$$, $$Var[\epsilon_t] = 1$$

• much harder to estimate these parameters

• $$X_t = \theta (B) \epsilon_t$$ (assuming $$\theta_0=1$$)

• ARMA model: $$ARMA(p, q)$$: $$X_t = \sum_{i=1}^p \phi_i X_{t-i} + \sum_{i=1}^q \theta_i \varepsilon_{t-i} + \varepsilon_t$$

• $$\{X_t\}$$ is stationary

• $$\phi (B) X_t = \theta(B) \varepsilon_t$$

• $$\phi(B) = 1 - \sum_{j=1}^p \phi_j B^j$$

• $$\theta(B) = 1 + \sum_{j=1}^{q}\theta_j B^j$$

• causal if $$\exists \{ \psi_j \}$$ with $$\sum_j |\psi_j| < \infty$$ such that $$X_t = \sum_{j=0}^\infty \psi_j \varepsilon_{t-j}$$ for all t

• ARIMA model $$ARIMA(p, d, q)$$ - generalizes the ARMA model to non-stationary series (using differencing)
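
A minimal simulation sketch for the $$AR(1)$$ case, illustrating why $$|\phi| < 1$$ is needed for a causal stationary solution; the coefficients, series length, and burn-in are illustrative choices:

```python
import numpy as np

def simulate_ar1(phi, n, burn_in=500, seed=0):
    """Simulate X_t = phi * X_{t-1} + eps_t with standard normal white noise."""
    eps = np.random.default_rng(seed).normal(size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + eps[t]
    return x[burn_in:]  # drop burn-in so the series is close to its stationary distribution

x_stat = simulate_ar1(phi=0.7, n=1000)    # |phi| < 1: stationary, variance settles down
x_expl = simulate_ar1(phi=1.05, n=1000)   # |phi| > 1: no causal stationary solution, explodes
print(x_stat.var(), np.abs(x_expl).max())
```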

### 4.7.3.4. ch 4 + 8 - the autocovariance function + parameter estimation¶

• estimation

• pure autoregressive

• Yule-walker

• Burg estimation - minimizing sums of squares of forward and backward one-step prediction errors with respect to the coefficients

• when $$q > 0$$

• innovations algorithm

• hannan-rissanen algorithm

• autocovariance function: {$$\gamma(k): k \in \mathbb Z$$} where $$\gamma(k) = \text{Cov}(X_{t+k}, X_t) = \mathbb E (X_0 X_k)$$ (assuming mean 0)

• Yule-Walker equations (assuming AR(p) process): $$\mathbb E (X_t X_{t-k}) = \sum_{j=1}^p \phi_j \mathbb E (X_{t-j} X_{t-k}) + \underbrace{\mathbb E (\epsilon_tX_{t-k})}_{=0} = \sum_{j=1}^p \phi_j \mathbb E (X_{t-j} X_{t-k})$$

• ex. for an MA(q) process, the autocovariance becomes 0 at lags greater than q (the number of params)

• can rewrite the Yule-Walker equations

• $$\gamma(i) = \sum_{j=1}^p \phi_j \gamma(i -j)$$

• $$\underline\gamma_p = \Gamma_p \underline \phi_p$$

• $$(\Gamma_p)_{i, j} = \gamma(i - j)$$

• $$\hat{\Gamma}_p$$ is nonnegative definite (and nonsingular if at least one observation is nonzero)

• $$\underline \gamma_p = [\gamma(1), ..., \gamma(p)]$$

• $$\underline \phi_p = (\phi_1, ..., \phi_p)$$

• this minimizes the mse $$\mathbb E [X_{t+1} - \sum_{j=1}^p \phi_j X_{t+1-j}]^2$$

• use the sample estimates to solve: $$\hat{\underline \phi}_p = \hat \Gamma_p^{-1} \hat{\underline \gamma}_p$$ (see the sketch at the end of this section)

• the innovations algorithm

• set $$\hat X_1 = 0$$

• innovations = one-step prediction errors $$U_n = X_n - \hat X _n$$

• mle (ch 5.2)

• eq. 5.2.9: Gaussian likelihood for an ARMA process

• $$r_n = \mathbb E[(W_{n+1} - \hat W_{n+1})^2]$$
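
A minimal Yule-Walker estimation sketch in numpy/scipy, solving $$\hat{\underline \phi}_p = \hat \Gamma_p^{-1} \hat{\underline \gamma}_p$$ from sample autocovariances; the simulated AR(2) coefficients are illustrative choices:

```python
import numpy as np
from scipy.linalg import toeplitz

def sample_autocov(x, max_lag):
    """Sample autocovariances gamma(0..max_lag), assuming mean is subtracted."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([np.sum(x[k:] * x[:n - k]) / n for k in range(max_lag + 1)])

def yule_walker(x, p):
    gamma = sample_autocov(x, p)
    Gamma_p = toeplitz(gamma[:p])    # (Gamma_p)_{i,j} = gamma(i - j)
    gamma_p = gamma[1:p + 1]         # [gamma(1), ..., gamma(p)]
    return np.linalg.solve(Gamma_p, gamma_p)

# simulate an AR(2) series with known coefficients and recover them
rng = np.random.default_rng(0)
phi_true = np.array([0.6, -0.3])
x = np.zeros(5000)
eps = rng.normal(size=5000)
for t in range(2, 5000):
    x[t] = phi_true @ x[t - 2:t][::-1] + eps[t]

print(yule_walker(x, p=2))  # should be close to [0.6, -0.3]
```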

## 4.7.4. multivariate time-series ch 7¶

• vector-valued time-series has dependencies between variables across time

• modeling each component series separately as univariate fails to take into account possible dependencies between the series
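
A minimal sketch of a vector AR (VAR) fit with statsmodels, where the simulated cross-dependency matrix and lag order are illustrative choices:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
n = 500
A = np.array([[0.5, 0.2],   # lag-1 coefficients, including cross-dependencies
              [0.1, 0.4]])
data = np.zeros((n, 2))
for t in range(1, n):
    data[t] = A @ data[t - 1] + rng.normal(size=2)

df = pd.DataFrame(data, columns=["y1", "y2"])
results = VAR(df).fit(maxlags=1)
print(results.coefs)                              # estimated lag-1 matrix, close to A
print(results.forecast(df.values[-1:], steps=5))  # 5-step-ahead forecasts
```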