Stationarity in time series data – What, Why and How

Before discussing stationarity, let’s understand white noise.

What is white noise?

A white noise series is a sequence of random numbers and cannot be predicted.

More formally, a series is white noise if the variables are independent and identically distributed, with a mean of zero and constant variance.

Each value has zero correlation with other values in the series.

If the values are drawn from a Gaussian distribution, the series is called Gaussian white noise.

Let’s generate a white noise series by drawing values from a normal distribution, and then plot it:
[Code and plot: generating and plotting a white noise series]
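Here is a minimal sketch of this step using NumPy. The original code is not shown, so the seed, the sample size and the variable name are illustrative choices. The code draws 500 independent values from a standard normal distribution. By the definition above, that is Gaussian white noise:

```python
import numpy as np

# Gaussian white noise: 500 independent draws from N(0, 1).
rng = np.random.default_rng(seed=42)
white_noise = rng.normal(loc=0.0, scale=1.0, size=500)

print(f"sample mean: {white_noise.mean():.3f}")   # should be close to 0
print(f"sample std:  {white_noise.std():.3f}")    # should be close to 1

# To plot it (requires matplotlib):
#   import matplotlib.pyplot as plt
#   plt.plot(white_noise); plt.title("Gaussian white noise"); plt.show()
```

The values are independent draws, so the sample mean should be close to zero and the sample standard deviation close to 1.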

A white noise series doesn’t show a pattern.

Let’s take another data set and see if it is white noise.

Checking if the data is white noise:

Things to check:

  1. Whether the mean is zero.
  2. Whether the autocorrelation is zero.
  3. A simple plot will give us a clue even before we run the two tests above.
[Code and plot: loading and plotting Electric_Production.csv]

Now we can see that it is not a white noise series. To confirm this, let’s look at the autocorrelation in the series.
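The three checks listed above can be sketched with NumPy alone. This is an illustrative helper, not the author's code. The acf function computes the usual sample autocorrelation. The bounds ±1.96/sqrt(n) are the approximate 95% limits typically drawn on ACF plots, so for white noise roughly 95% of the spikes should fall inside them. The column layout of Electric_Production.csv isn't shown here, so a hypothetical trending series stands in for it:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags (the estimator used in ACF plots)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:n - k]) / denom for k in range(nlags + 1)])

def white_noise_diagnostics(x, nlags=20):
    """Summarise the white noise checks for a 1-D series x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = acf(x, nlags)[1:]          # drop lag 0, which is always 1
    bound = 1.96 / np.sqrt(n)      # approximate 95% bounds for white noise
    return {
        "mean": x.mean(),
        # a mean within about 2 standard errors of 0 is consistent with zero mean
        "mean_std_error": x.std(ddof=1) / np.sqrt(n),
        "lag1_acf": r[0],
        # about 0.05 is expected for white noise
        "frac_acf_outside_bounds": float(np.mean(np.abs(r) > bound)),
    }

rng = np.random.default_rng(0)
t = np.arange(300)
noise = rng.normal(size=300)
trending = 50 + 0.1 * t + rng.normal(size=300)   # hypothetical stand-in with a trend

print("white noise:", white_noise_diagnostics(noise))
print("trending:   ", white_noise_diagnostics(trending))
```

The trending series fails every check. Its mean is far from zero, and nearly all of its autocorrelations sit well outside the bounds.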

After we model and forecast the data, if the residuals are not white noise, we can conclude that the model did not capture the entire pattern.
Hence, white noise is an important concept to understand in time series analysis.
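The residual check can be made concrete with simulated data. This is an assumption for illustration, since the post does not fit a model at this point. The series follows y[t] = 0.8·y[t-1] + noise. A model that always forecasts the mean leaves strongly autocorrelated residuals. An AR(1) model fitted by ordinary least squares leaves residuals that look like white noise:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:n - k]) / denom for k in range(nlags + 1)])

rng = np.random.default_rng(1)
n = 500
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + e[t]   # simulated AR(1) series (hypothetical example data)

# Model 1: always forecast the sample mean, ignoring any dependence on the past.
resid_mean_model = y - y.mean()

# Model 2: AR(1) fitted by least squares, y[t] ~ c + phi * y[t-1].
X = np.column_stack([np.ones(n - 1), y[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
resid_ar_model = y[1:] - X @ coef

print("estimated c, phi:", coef)
print("lag-1 ACF of residuals, mean model: ", acf(resid_mean_model, 1)[1])
print("lag-1 ACF of residuals, AR(1) model:", acf(resid_ar_model, 1)[1])
```

The mean model's residuals still carry the pattern, so they are not white noise. The AR(1) residuals have a lag-1 autocorrelation near zero.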

What is stationarity?

Informally, a series is stationary if it looks similar whenever we observe it. Look at the white noise plot above: at any point in time, the plot looks similar.

In general, the time plot of a stationary series looks roughly horizontal, with constant variance.

A stationary time series is one whose properties do not depend on the time at which the series is observed.

otexts.com

This means that the statistical properties of the process that generates a time series are constant over time. This statistical consistency makes distributions predictable, enabling forecasting, and is an assumption of many time series forecasting models.

Analyzing Alpha
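One informal way to see "properties that do not depend on time" is to split a series into consecutive chunks and compare their means and variances. This NumPy sketch is an illustration, not a formal test, and the trending series is hypothetical:

```python
import numpy as np

def segment_stats(x, n_segments=4):
    """Mean and variance of each consecutive, roughly equal-length chunk of x."""
    chunks = np.array_split(np.asarray(x, dtype=float), n_segments)
    return [(c.mean(), c.var()) for c in chunks]

rng = np.random.default_rng(7)
n = 400
white_noise = rng.normal(size=n)
trend_series = 0.05 * np.arange(n) + rng.normal(size=n)  # hypothetical trending series

for name, series in [("white noise", white_noise), ("with trend", trend_series)]:
    print(name)
    for i, (m, v) in enumerate(segment_stats(series)):
        print(f"  segment {i}: mean={m:6.2f}  var={v:5.2f}")
```

For white noise, the chunk means and variances stay roughly the same. For the trending series, the mean climbs from chunk to chunk, so its properties depend on when we look.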

Data with trend or seasonality is not stationary, because the trend and seasonality affect the value of the series at different times.

A white noise series is stationary, as the white noise plot above shows.

So a stationary time series will have no predictable patterns in the long term.

Simply by looking at the time plot of the data, we can usually make a reasonable guess about whether a series is stationary.

Question:

Which of the plots below are stationary?

[Figure: example time series, panels (a) to (i), from https://otexts.com/fpp2/stationarity.html]
  1. b looks stationary, as the fluctuations seem random.
  2. i looks non-stationary, as the trend is pretty obvious.
  3. a has a changing level and a trend, so it is non-stationary.
  4. c, e and f are similar to a, so they are non-stationary.
  5. g has a cyclic pattern, but the cycles are not periodic. They seem to depend on factors other than time, which makes it difficult to forecast where the peaks and troughs will fall. So g is stationary.

Of these, only b and g are stationary.

Some cases can be confusing — a time series with cyclic behavior (but with no trend or seasonality) is stationary. This is because the cycles are not of a fixed length, so before we observe the series we cannot be sure where the peaks and troughs of the cycles will be.

otexts.com

Testing data for stationarity:

In the section above, we looked at the time plots of the data and guessed whether the data is stationary.

Another way to understand the stationarity of the data is to look at the ACF plot of the time series.

  1. For stationary data, the ACF drops to zero relatively quickly.
  2. For non-stationary data, the ACF decreases slowly, because each value is highly correlated with its lagged values.

[Figure: ACF plot of non-stationary data]
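The difference in ACF decay can be reproduced with simulated data. This sketch does not use the electric production series. A stationary AR(1) process with coefficient 0.6 stands in for stationary data, and a random walk with drift stands in for non-stationary data. The acf function is the same sample autocorrelation shown in an ACF plot:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:n - k]) / denom for k in range(nlags + 1)])

rng = np.random.default_rng(3)
n = 500
e = rng.normal(size=n)

# Stationary example: AR(1) with |phi| < 1 (hypothetical data).
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.6 * ar1[t - 1] + e[t]

# Non-stationary example: random walk with drift (hypothetical data).
rw_drift = np.cumsum(0.5 + e)

acf_ar1 = acf(ar1, 20)
acf_rw = acf(rw_drift, 20)
for k in (1, 5, 10, 20):
    print(f"lag {k:2d}: stationary={acf_ar1[k]:5.2f}  non-stationary={acf_rw[k]:5.2f}")
```

The stationary series' autocorrelation is close to zero by lag 10. The random walk with drift is still strongly correlated at lag 20.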

Why is stationarity important?

A stationary series has statistical properties that do not change over time, and these properties can be modelled in order to forecast a likely outcome.

Any stationary data can be broken down into two parts:

  1. Signal – the part of the time series we can potentially predict
  2. White noise – the part of the data set that’s unpredictable

If we can show that the errors in our model are white noise, we have built a great model!

Stationarity is a common assumption for many time series models.

Methods to convert a non-stationary series to a stationary series:

Differencing:

Computing the difference between consecutive observations is called differencing.
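In symbols, using the notation of otexts.com/fpp2, the first-differenced series is:

```latex
y'_t = y_t - y_{t-1}
```

A series with T observations therefore gives T - 1 differences, because the first observation has no predecessor.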

Differencing helps stabilize the mean of a time series by removing changes in level of a time series, and therefore eliminating or reducing trend and seasonality.

[Code and output: our data, differenced using the built-in pandas function, with the differenced column added]
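The pandas method for this is Series.diff(). Here is a minimal sketch of the step. It assumes the data sits in a DataFrame column named value with a monthly date index. These names and the synthetic data are hypothetical, since the original code is not shown:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series standing in for the electric production data:
# a linear trend plus noise.
rng = np.random.default_rng(5)
idx = pd.date_range("2000-01-01", periods=120, freq="MS")
df = pd.DataFrame(
    {"value": 80 + 0.3 * np.arange(120) + rng.normal(scale=2, size=120)},
    index=idx,
)

# Series.diff() computes value[t] - value[t-1]; the first row has no predecessor, so it is NaN.
df["diff"] = df["value"].diff()
print(df.head())

# The trend shows up as a roughly constant average change (about 0.3 per step here).
print("mean of differenced series:", df["diff"].mean())
```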

The differenced data is obtained by subtracting the previous value from the current value. In other words, we are calculating the change in electric production from one observation to the next.

Intuitively, if the blue plot shows the amount of electricity produced yearly, the orange plot shows the change in electricity production over the years. The change seems to fluctuate around a constant level.

ACF Plot of the differenced data:

[Figure: ACF plot of differenced data]

Compare it with the ACF plot of the actual data:

[Figure: ACF plot of actual data]

Compared with the ACF plot of the actual data, the autocorrelation of the differenced data decreases much more sharply.

However, seasonality is still there.

The seasonal pattern is that the correlation is high at every third lag.

If a trend remains after first differencing, we may need second-order differencing, that is, differencing the differenced series once more. In practice, it is almost never necessary to go beyond second-order differencing.
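Second-order differencing applies the same operation twice: y''_t = y'_t - y'_{t-1} = y_t - 2y_{t-1} + y_{t-2}. This sketch uses a hypothetical series with a quadratic trend. The first difference still grows linearly, while the second difference is constant:

```python
import numpy as np
import pandas as pd

t = np.arange(10)
y = pd.Series(3 + 2 * t + 0.5 * t**2)   # hypothetical series with a quadratic trend

first = y.diff()          # y[t] - y[t-1]: still grows linearly
second = y.diff().diff()  # y[t] - 2*y[t-1] + y[t-2]: constant for a quadratic trend

print(first.tolist())
print(second.tolist())

# NumPy's np.diff with n=2 gives the same values without the leading NaNs.
assert np.allclose(second.dropna(), np.diff(y.to_numpy(), n=2))
```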

The seasonal component is still there.

Seasonal Differencing:

A seasonal difference is the difference between an observation and the previous observation from the same season. These are also called “lag-m” differences, as we subtract the observation after a lag of m periods.
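In the same notation, where m is the number of seasons (for example, m = 12 for monthly data with a yearly pattern), the seasonally differenced series is:

```latex
y'_t = y_t - y_{t-m}
```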

[Code and plot: seasonally differenced data with m = 3, and a time plot of the resulting stationary-looking series]
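In pandas, the same diff method takes a periods argument, so a lag-m difference is y.diff(periods=m). The sketch below uses a hypothetical noise-free series built from a linear trend plus a pattern that repeats every m = 3 observations:

```python
import numpy as np
import pandas as pd

m = 3                                    # seasonal period used in the post
n = 60
t = np.arange(n)
season = np.tile([5.0, -2.0, -3.0], n // m)   # hypothetical pattern repeating every m steps
y = pd.Series(10 + 0.2 * t + season)          # trend + seasonality, no noise

seasonal_diff = y.diff(periods=m)   # y[t] - y[t-m]; the first m values are NaN
print(seasonal_diff.head(8).tolist())
```

The repeating pattern cancels exactly, and the linear trend turns into a constant 0.2 × 3 = 0.6.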

This plot looks much more like a stationary series, doesn’t it?

Let’s look at the autocorrelation plot:

[Figure: ACF plot of seasonally differenced data]

Remember that the autocorrelation of a stationary series drops off steeply?

What do you see in the ACF plot above?

Note: There are statistical tests to check whether differencing is required, and a number of unit root tests are available. One popular choice is the KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test. In this test, the null hypothesis is that the data are stationary, and we look for evidence that the null hypothesis is false. Consequently, small p-values (e.g., less than 0.05) suggest that differencing is required.
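The KPSS statistic itself is short enough to sketch in NumPy. This simplified illustration covers only the level-stationarity version and computes no p-value. It compares the statistic with the asymptotic critical values from the KPSS paper. The bandwidth rule int(12·(n/100)^(1/4)) is one common choice, not the only one. In practice, a library implementation such as kpss in statsmodels.tsa.stattools is the more reliable option. The test series are simulated (hypothetical):

```python
import numpy as np

# Asymptotic critical values for the KPSS level-stationarity test.
KPSS_LEVEL_CRITICAL = {0.10: 0.347, 0.05: 0.463, 0.025: 0.574, 0.01: 0.739}

def kpss_level_stat(y, nlags=None):
    """KPSS statistic for H0: y is level-stationary (a sketch, not a full test)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if nlags is None:
        nlags = int(12 * (n / 100) ** 0.25)   # rule-of-thumb bandwidth
    e = y - y.mean()                          # residuals from a constant-mean model
    s = np.cumsum(e)                          # partial sums S_t
    # Long-run variance estimate with Bartlett weights.
    lrv = np.sum(e ** 2) / n
    for lag in range(1, nlags + 1):
        weight = 1 - lag / (nlags + 1)
        lrv += 2 * weight * np.sum(e[lag:] * e[:-lag]) / n
    return np.sum(s ** 2) / (n ** 2 * lrv)

rng = np.random.default_rng(11)
noise = rng.normal(size=500)
trending = np.cumsum(0.5 + noise)   # hypothetical random walk with drift

for name, series in [("white noise", noise), ("trending", trending)]:
    stat = kpss_level_stat(series)
    if stat > KPSS_LEVEL_CRITICAL[0.05]:
        verdict = "reject stationarity: differencing suggested"
    else:
        verdict = "no evidence against stationarity"
    print(f"{name}: KPSS = {stat:.3f} ({verdict} at the 5% level)")
```

A large statistic corresponds to a small p-value. So the trending series, whose statistic is well above the 1% critical value, is flagged as needing differencing.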
