
In time series analysis, estimators play a crucial role in understanding the underlying patterns and behaviors of data that evolve over time. A key goal in time series analysis is to perform inference on unknown parameters (e.g. means, variances, autocorrelations and trends) based on the observed data. These parameters provide insight into the structure of the time series, making it possible to model it, understand its dynamics and make forecasts. Since the true values of these parameters are often unknown, **estimators** for these quantities are used, which provide approximations based on the available samples. Without estimators, it would be impossible to quantify or predict time-dependent phenomena. For instance, in financial markets, estimating volatility is essential for risk management, while in climate studies, estimating trends and seasonal patterns helps to understand long-term changes. The accuracy and reliability of these estimates directly affect the quality of any conclusions that can be drawn, whether forecasts of future values or tests of hypotheses about the behavior of a time series. Therefore, selecting the right estimator and understanding its properties is essential for time series analysis.

### 2.1 Basic Definitions

An **estimator** is a rule or mathematical formula that provides an approximation of an unknown parameter based on observed data. In the context of time series analysis, an estimator helps us infer characteristics of the underlying process generating the data, such as the mean, variance, or correlation, which are typically unknown. Estimators allow us to make educated guesses about these parameters using the information contained in a finite sample of observations. This is helpful, since generally the entire population or process cannot be observed over infinite time.

**Estimator Vs. Estimate**

**Def. 2.2 - Estimator**

An **estimator** (often denoted $\hat{\theta}$) is the method or function applied to data, whereas an **estimate** is the value obtained by applying this method or function to a specific dataset.

For instance, to estimate the mean of a time series, the simple average estimator $\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t$ can be used. In this case, the estimator is the formula itself, while the estimate is the numerical result obtained by applying this formula to a particular sample of data points.
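The estimator/estimate distinction can be made concrete with a minimal sketch in Python; the observations below are made up for illustration. The function is the estimator (a rule), and the returned number is the estimate (a value).

```python
def sample_mean(x):
    """The estimator: a rule mapping any sample to a number."""
    return sum(x) / len(x)

# The estimate: the value the rule produces for one concrete
# (made-up) sample.
observations = [2.0, 4.0, 6.0, 8.0]
estimate = sample_mean(observations)
print(estimate)  # → 5.0
```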

**Parameters and Population**

**Def. 2.3 - Parameter**

A **parameter** is a numerical characteristic that describes a feature of the process generating the time series.

Parameters can represent long-term properties like the average value (mean), volatility (variance), or the relationship between time points (autocorrelation). Since the entire *"population"* (the full realization of the time series, which would span an infinite or unobservable time period) is rarely fully accessible, an estimator can be used to approximate these parameters based on a sample of finite length $n$.

### 2.2 Properties of Estimators

Understanding the properties of estimators is essential for evaluating their effectiveness in approximating unknown parameters. An estimator’s performance is not solely based on the value it produces but also on its theoretical qualities, which determine how well it can be expected to behave in different scenarios. Key properties such as unbiasedness, consistency and efficiency help to judge whether an estimator is reliable, whether it converges to the true parameter as more data becomes available and how much uncertainty is associated with its estimates. These properties form the foundation for selecting the most appropriate estimator for time series models.

**Def 2.4 - Properties of Estimators**

**(Un)Biased -** An estimator is **unbiased** if $E[\hat{\theta}] = \theta$, meaning that the expected value of the estimator is equal to the true parameter, and **biased** if $E[\hat{\theta}] \neq \theta$.

**Consistency -** An estimator is **consistent** if it converges (in probability) to the true value with increasing sample size, i.e. $\hat{\theta}_n \to \theta$ as $n \to \infty$.

**Efficiency -** An **efficient** estimator has the smallest possible variance among all unbiased estimators.

It must be noted that when the sample size increases, the estimates produced by an estimator are (often) distributed normally with the mean and variance of the estimator. This is the result of both the **law of large numbers** and the **central limit theorem** (CLT). If this is the case, the estimator is **asymptotically normal**. In almost all cases, an estimator does not produce truly accurate estimates (i.e. often $\hat{\theta} \neq \theta$). It is therefore interesting and also necessary to quantify the error of an estimate produced by the estimator.

**Def 2.5 - Standard Error of $\hat{\theta}$**

The **standard error** (SE) of an estimator $\hat{\theta}$ is a measure of the precision of the estimator. It is mathematically defined as

$$SE(\hat{\theta}) = \sqrt{\mathrm{Var}(\hat{\theta})}$$

A **confidence interval** (CI) is a range of values, derived from data using the estimator $\hat{\theta}$, that is likely to contain the true parameter $\theta$ with a certain level of confidence. This can be useful since it provides more information than a single point estimate. It allows us to quantify uncertainty about the parameter that is being estimated and can help make informed decisions, showing how precise a certain estimate is and how confident you can be in it.

**Def 2.6 - Confidence Interval of $\hat{\theta}$**

If $\hat{\theta}$ is a consistent, asymptotically normal estimator, the 95% CI for $\theta$ is given by

$$\hat{\theta} \pm 1.96 \cdot SE(\hat{\theta})$$

This result is obtained by first computing the critical value (**z-score**) of the normal distribution for the **significance level** $\alpha = 0.05$. Remember, this value is the number of standard deviations away from the mean it is necessary to go to capture 95% of the data, starting from the center of the distribution and moving towards the tails in a symmetric fashion. In other words, for a standard normal distribution $Z \sim N(0, 1)$,

$$P(-1.96 \leq Z \leq 1.96) = 0.95$$

The standard error can also be useful for statistical testing. In the case of testing a hypothesis $H_0: \theta = \theta_0$, the hypothesis can be rejected (*"Reject $H_0$"*) at a significance level $\alpha$ if $\theta_0$ is not in the confidence interval. For the remainder of the syllabus, $H_0$ will be rejected (at $\alpha = 0.05$) if

$$\left| \frac{\hat{\theta} - \theta_0}{SE(\hat{\theta})} \right| > 1.96$$

**Review - P-value**

The **P-value** is the probability that the test statistic takes values more extreme than the computed one (under $H_0$). The default choice for the significance level is $\alpha = 0.05$. This level gives the type I error, i.e. the probability of rejecting $H_0$ when it holds.

- If $P \leq \alpha$, then the deviation from $H_0$ is said to be significant.

- The P-value should be considered on a continuous scale.

- The smaller the P-value, the more evidence in the data against the null hypothesis.
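The standard error, confidence interval and P-value described above can be sketched numerically for the sample mean. This is a hedged illustration: the data are invented, and the asymptotic normal approximation is assumed (with the unknown variance replaced by the sample variance).

```python
import math

def mean_ci_and_pvalue(x, theta0, z=1.96):
    """95% CI for the mean and a two-sided z-test of H0: mu = theta0,
    using the asymptotic normality of the sample mean."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    se = math.sqrt(var / n)                # standard error of the mean
    ci = (mean - z * se, mean + z * se)    # 95% confidence interval
    z_stat = (mean - theta0) / se          # test statistic
    # two-sided P-value from the standard normal CDF
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z_stat) / math.sqrt(2.0))))
    return mean, se, ci, p

x = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.2]   # made-up observations
mean, se, ci, p = mean_ci_and_pvalue(x, theta0=0.0)
# H0: mu = 0 is rejected at alpha = 0.05 exactly when 0 lies outside
# the CI, which agrees with p < 0.05.
print(ci[0] < 0.0 < ci[1], p < 0.05)  # → False True
```

Note how the two decision rules (CI coverage and P-value comparison) agree, as the text above states.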

### 2.3 Estimator Types ❌

In time series analysis, several types of estimators can be used to infer unknown parameters from data, as discussed previously. These estimators can be categorized based on their approach and the type of information they provide. This section briefly explores some of the most common estimators and their applications. Understanding estimators and their properties is often key in selecting the most appropriate one for a given model.

**Point Estimator**

A **point estimator** is a rule or formula that provides a single value as an estimate for an unknown population parameter. It gives the best guess or approximation based on the available sample data. Common examples of point estimators include:

- The *sample mean* estimator $\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t$ is a point estimator for the population mean.

- The *sample variance* estimator $s^2 = \frac{1}{n-1}\sum_{t=1}^{n} (x_t - \bar{x})^2$ is a point estimator for the population variance.

The advantage of point estimators is that they are straightforward and easy to compute. However, they provide no information about the uncertainty associated with the estimate. For instance, while the sample mean offers a single value as an estimate of the true mean, it does not convey how confident we can be about that estimate.

**Interval Estimators**

In the case where we want a quantifiable uncertainty in the estimate of a parameter, an **interval estimator** can be used. Unlike point estimators, interval estimators provide a sense of the precision of the estimate and account for sampling variability. This makes them essential for quantifying the uncertainty surrounding a parameter. The result of an interval estimator is a range of values within which the true parameter is likely to fall. The most common interval estimators are **confidence intervals**, which have been described in the previous section. The general expression for a confidence interval with significance level $\alpha$ is given by:

$$\hat{\theta} \pm z_{\alpha/2} \cdot SE(\hat{\theta})$$

In most cases, the interval is centered around the sample mean, such that $\hat{\theta} = \bar{x}$.

**Maximum Likelihood Estimator (MLE)**

The **maximum likelihood estimator** (abbr. MLE) is one of the most widely used estimation methods, especially in time series models such as ARIMA and GARCH. The MLE seeks to find the parameter values that maximize the likelihood function, which represents the probability of the observed data given a set of parameter values. Mathematically speaking, for a given model with parameters $\theta$ and observed data $x$, the expression for the MLE is given by:

$$\hat{\theta}_{MLE} = \arg\max_{\theta} L(\theta; x)$$

Here, $L(\theta; x)$ is the **likelihood function**. The MLE has many desirable properties, such as asymptotic efficiency (i.e. as the sample size grows larger, the estimator achieves the lowest possible variance) and consistency. Often the MLE is used in time series analysis to estimate the parameters of stochastic models such as AR, MA and ARMA processes.

**Method Of Moments (MoM)**

The **method of moments** estimator is an alternative to the MLE. It involves equating sample moments (i.e. sample mean, variance, etc.) to the theoretical moments of the population distribution in order to estimate the parameters. For example, if $\mu$ is the population mean and $\bar{x}$ is the sample mean, the method of moments would estimate $\mu$ by setting $\hat{\mu} = \bar{x}$. More generally, for a distribution with parameters $\theta_1, \dots, \theta_k$, the MoM estimator solves:

$$m_j = \mu_j(\theta_1, \dots, \theta_k), \quad j = 1, \dots, k$$

Here, $\mu_j(\cdot)$ are the moment equations derived from the distribution's properties and $m_j$ are the corresponding sample moments. While MoM is simpler and more intuitive in some cases, one downside is that it tends to be less efficient than the MLE.
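The two methods can be contrasted on a small sketch, assuming simulated Gamma-distributed data (a crude grid search stands in for a proper numerical optimizer, and all numbers are illustrative).

```python
import math
import random

# Simulated i.i.d. sample from a Gamma(shape=2.0, scale=1.5) distribution.
random.seed(0)
x = [random.gammavariate(2.0, 1.5) for _ in range(2000)]
n = len(x)
mean = sum(x) / n
var = sum((xi - mean) ** 2 for xi in x) / n

# Method of Moments: for a Gamma distribution, mean = k * theta and
# var = k * theta^2, so solving the two moment equations gives:
theta_mom = var / mean
k_mom = mean / theta_mom

# Maximum Likelihood: profile out the scale (theta = mean / k maximizes
# the likelihood for fixed k) and grid-search the shape k.
def log_lik(k):
    theta = mean / k
    return (sum((k - 1) * math.log(xi) - xi / theta for xi in x)
            - n * (math.lgamma(k) + k * math.log(theta)))

k_mle = max((k / 100.0 for k in range(50, 500)), key=log_lik)
print(k_mom, k_mle)  # both should land near the true shape 2.0
```

For the Gaussian case the two estimators of the mean coincide; the Gamma case is chosen precisely because they do not.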

**Least Squares Estimator**

The **least squares** estimator is primarily used in regression contexts, including time series models such as AR, ARMA and ARIMA models. These model the relationship between a dependent variable and its lagged values (commonly known as the predictors). The least squares estimator minimizes the sum of squared differences between observed and predicted values:

$$\hat{\beta} = \arg\min_{\beta} \sum_{t} \left( y_t - \hat{y}_t \right)^2$$

What each of the variables in this equation means will become more clear in the next chapter, which discusses regression models in more detail.
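As a simplified preview, the least squares coefficient of an AR(1) model without intercept has a closed form; the series below is simulated, and the coefficient value 0.6 is an arbitrary choice.

```python
import random

# Simulate an AR(1) process x_t = phi * x_{t-1} + eps_t with phi = 0.6.
random.seed(1)
phi_true = 0.6
x = [0.0]
for _ in range(1999):
    x.append(phi_true * x[-1] + random.gauss(0.0, 1.0))

# Least squares: minimize sum_t (x_t - phi * x_{t-1})^2 over phi.
# With a single lagged predictor and no intercept, the minimizer is
# the ratio of cross products below.
num = sum(x[t - 1] * x[t] for t in range(1, len(x)))
den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
phi_hat = num / den
print(phi_hat)  # close to the true coefficient 0.6
```

The same criterion generalizes to several predictors, where it is solved with linear algebra rather than a single ratio.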

**Bayesian Estimation**

It is impossible to explain estimators without at least mentioning **Bayesian estimation**. Bayesian estimation incorporates prior knowledge about the parameter being estimated, combining it with the observed data to form a **posterior distribution**. In Bayesian estimation, parameters are treated as random variables with prior distributions (i.e. the parameters are taken from a distribution). The estimation process involves updating the prior distribution based on the likelihood of the observed sample data, resulting in a posterior distribution, which reflects the updated beliefs about the parameter being estimated. This requires the introduction of **Bayes' Theorem**:

$$P(\theta \mid x) = \frac{P(x \mid \theta) \, P(\theta)}{P(x)}$$

This reads as follows: the probability of observing the parameters $\theta$, given the sample data $x$, is given by the product of the probability of observing the sample data given the parameters and the probability of the parameters being chosen (which is obtained from a prior distribution), divided by the probability of observing the sample data.
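A minimal illustration of this updating step uses the standard conjugate normal-normal case with known observation variance; all numbers below are made up.

```python
# Conjugate normal-normal update: prior mu ~ N(mu0, tau0^2) and
# observations x_i ~ N(mu, sigma^2) with sigma known. The posterior
# is again normal, so Bayes' theorem reduces to a closed-form update.
mu0, tau0 = 0.0, 1.0   # hypothetical prior mean / prior std. dev.
sigma = 2.0            # assumed known observation noise
x = [1.8, 2.4, 1.9, 2.6, 2.1]   # made-up sample

n = len(x)
xbar = sum(x) / n
post_prec = 1.0 / tau0 ** 2 + n / sigma ** 2   # precisions add up
post_var = 1.0 / post_prec
post_mean = post_var * (mu0 / tau0 ** 2 + n * xbar / sigma ** 2)
print(post_mean, post_var)  # posterior mean near 1.2, variance 1/2.25
```

The posterior mean sits between the prior mean and the sample mean, weighted by their precisions, which is exactly the "prior updated by data" idea in the text above.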

One estimator derived with Bayesian estimation is the **maximum a posteriori** (MAP) estimator. The MAP estimator is a point estimator in which the value of $\theta$ maximizes the posterior distribution. It can be seen as the Bayesian analog of the MLE. The MAP estimator can be mathematically expressed as:

$$\hat{\theta}_{MAP} = \arg\max_{\theta} P(\theta \mid x)$$

### 2.4 Common Estimators in TSA ❌

In the previous sections and chapters, some common estimators for time series analysis in particular have already been explained. The sample mean and variance estimators help in estimating the mean and variance, while estimators such as the autocorrelation function (abbrev. ACF) can be used to estimate the autocorrelation at a lag order $k$. In this brief section we explore some other common estimators used in time series analysis, such as the partial ACF, the periodogram estimator, the Hurst exponent and (co)variance matrices.

**Partial Autocorrelation Function (PACF)**

In time series analysis, the **partial autocorrelation function** (PACF) gives the partial autocorrelation of a stationary time series with its own lagged values, regressed on the values of the time series at all shorter lags. This is different from the regular autocorrelation function, which does not control for other lags. The PACF plays an important role in identifying the extent of the lag in an autoregressive (AR) model. It can thus be used to determine the appropriate number of lags $p$ in an $AR(p)$ model (or, by extension, even an $ARMA(p, q)$ model). Given a time series $x_t$, the partial autocorrelation of lag $k$, denoted $\phi_{kk}$, is the autocorrelation between $x_t$ and $x_{t+k}$ with the linear dependence of $x_{t+k}$ on $x_{t+1}$ through $x_{t+k-1}$ removed. Equivalently, it is the autocorrelation between $x_t$ and $x_{t+k}$ that is not accounted for by lags 1 through $k-1$, inclusive:

$$\phi_{kk} = \mathrm{corr}\left( x_{t+k} - \hat{x}_{t+k},\; x_t - \hat{x}_t \right)$$

where $\hat{x}_{t+k}$ and $\hat{x}_t$ are linear combinations of $x_{t+1}, \dots, x_{t+k-1}$ that minimize the mean squared error of $x_{t+k}$ and $x_t$ respectively. Over the course of this book, the PACF will repeatedly be mentioned as part of discussing various time series models.
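As an illustrative sketch, the sample PACF can be computed from the sample autocorrelations with the Durbin-Levinson recursion; the data below are simulated from an AR(1) process, for which the PACF should be near the coefficient at lag 1 and near zero at all higher lags.

```python
import random

def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((xi - m) ** 2 for xi in x) / n
    ck = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
    return ck / c0

def pacf(x, max_lag):
    """Sample PACF phi_kk via the Durbin-Levinson recursion."""
    rho = [acf(x, k) for k in range(max_lag + 1)]
    phi = [[0.0] * (max_lag + 1) for _ in range(max_lag + 1)]
    out = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi[1][1] = rho[1]
        else:
            num = rho[k] - sum(phi[k - 1][j] * rho[k - j] for j in range(1, k))
            den = 1.0 - sum(phi[k - 1][j] * rho[j] for j in range(1, k))
            phi[k][k] = num / den
            for j in range(1, k):
                phi[k][j] = phi[k - 1][j] - phi[k][k] * phi[k - 1][k - j]
        out.append(phi[k][k])
    return out

random.seed(2)
x = [0.0]
for _ in range(2999):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))
pvals = pacf(x, 3)
print(pvals)  # lag 1 near 0.7, lags 2 and 3 near 0
```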

**Periodogram Estimator**

The **periodogram** estimator is used in spectral analysis to represent the frequency domain characteristics of a time series. It shows how the variance (or **power**) of the series is distributed over different frequency components. This is particularly useful for identifying cycles or periodic behavior in the data. The periodogram is given by the following expression:

$$I(\omega) = \frac{1}{n} \left| \sum_{t=1}^{n} x_t \, e^{-i \omega t} \right|^2$$

Here, $\omega$ is the frequency and $x_1, \dots, x_n$ are the time series observations. The periodogram provides an estimate of the power spectral density (PSD) of the time series at different frequencies. Peaks in the periodogram indicate prominent cycles or periodicities in the data. It is an estimator that is widely used in fields such as signal processing, economics and climatology, where it is crucial for understanding the underlying processes.
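A direct (if inefficient) implementation of this expression, evaluated at the Fourier frequencies, is sketched below; the sinusoidal test series is synthetic.

```python
import cmath
import math

def periodogram(x):
    """Periodogram I(w_j) = |sum_t x_t e^{-i w_j t}|^2 / n evaluated at
    the Fourier frequencies w_j = 2*pi*j/n for j = 1, ..., n // 2."""
    n = len(x)
    freqs, power = [], []
    for j in range(1, n // 2 + 1):
        w = 2.0 * math.pi * j / n
        s = sum(x[t] * cmath.exp(-1j * w * t) for t in range(n))
        freqs.append(w)
        power.append(abs(s) ** 2 / n)
    return freqs, power

# A sinusoid completing exactly 10 cycles over the sample should put
# essentially all of its power at the 10th Fourier frequency.
n = 200
x = [math.sin(2.0 * math.pi * 10.0 * t / n) for t in range(n)]
freqs, power = periodogram(x)
peak = max(range(len(power)), key=power.__getitem__)
print(peak + 1)  # → 10
```

(The sum runs over $t = 0, \dots, n-1$ here; the index shift relative to the formula only changes the phase, not the magnitude. In practice the FFT is used instead of the explicit sum.)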

**Hurst Exponent**

In general, the **Hurst exponent** is a statistical method for estimating parameters of a time series without making assumptions about stationarity. It measures the long-term memory or persistence of a time series. This makes it possible to determine whether a series exhibits **trend-reinforcing** (persistent) or **mean-reverting** (anti-persistent) behavior, or whether it is simply a random walk. The Hurst exponent $H$ is estimated based on the relationship between the rescaled range of a time series and the time interval over which the range is measured. It can be estimated using the following expression:

$$E\left[ \frac{R(n)}{S(n)} \right] = C \, n^{H} \quad \text{as } n \to \infty$$

where $R(n)$ is the range of the first $n$ cumulative deviations from the mean, $S(n)$ is the standard deviation of the first $n$ observations, $n$ is the number of observations that are taken into account and $C$ is a constant. Typically, the value of the Hurst exponent falls between 0 and 1:

- $H < 0.5$ suggests **anti-persistent** behavior. This means that movements in the values of the time series tend to switch directions. Values closer to 0 indicate a time series with a strong tendency to reverse: if the series has been trending upwards, it is likely to go downwards soon and vice versa. This behavior is typical of a **mean-reverting process** (see example 1.3).

- $H = 0.5$ suggests a random walk, commonly known as Brownian motion. This means there is no correlation between the movements in the time series. Each step or value in the series is independent of the previous ones, with no predictable trend or mean-reverting tendencies.

- $H > 0.5$ suggests **persistent** behavior. This means that movements in the values of the time series tend to continue in the same direction. Values closer to 1 indicate a strong memory or trend-following nature, meaning that if the series is in an upward trend, it is likely to continue upward (and vice versa). This persistence is often observed in trending data.
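The rescaled-range procedure can be sketched as follows, assuming non-overlapping windows and a simple log-log regression for the slope; note that small-sample R/S estimates of $H$ are known to be biased slightly upward.

```python
import math
import random

def rescaled_range(x):
    """R/S statistic of one window: range of cumulative deviations
    from the mean divided by the standard deviation."""
    n = len(x)
    m = sum(x) / n
    cum, dev = 0.0, []
    for xi in x:
        cum += xi - m
        dev.append(cum)
    r = max(dev) - min(dev)
    s = math.sqrt(sum((xi - m) ** 2 for xi in x) / n)
    return r / s

def hurst(x, sizes=(50, 100, 200, 400)):
    """Estimate H as the slope of log(mean R/S) against log(n)."""
    pts = []
    for n in sizes:
        chunks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
        rs = sum(rescaled_range(c) for c in chunks) / len(chunks)
        pts.append((math.log(n), math.log(rs)))
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    return (sum((a - mx) * (b - my) for a, b in pts)
            / sum((a - mx) ** 2 for a, _ in pts))

random.seed(3)
white_noise = [random.gauss(0.0, 1.0) for _ in range(4000)]
h = hurst(white_noise)
print(round(h, 2))  # roughly 0.5: uncorrelated noise has no memory
```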

**Variance & Covariance Matrices**

### 2.5 Advanced Topics ❌

**Bias-Variance Tradeoff**

**Regularization & Shrinkage Estimators**

**Bootstrap Methods**

**Goodness-of-Fit Tests**

**Model Misspecification**

**Error Metrics**

### 2.6 Special Cases in Time Series ❌

**Estimation in Non-Stationary Time Series**

**Multivariate Time Series Estimators**

**Non-parametric Estimation**


**TODO**

- Add examples of PACF for AR, MA and ARMA models.