In the previous blog post, we discussed various types of time series data and covered the concepts of stationarity and shocks. In this blog post, we will discuss how we can estimate the predictability of time series data. People say that the future is unpredictable. But that’s grossly reductive! What they actually mean is: I’m blindly assuming that my time series data is non-stationary, so I cannot accurately predict what’s going to happen in the future. Predicting future values can open a lot of doors in the Internet of Things (IoT) ecosystem. Before we can forecast future values, it’s important to determine whether the time series data exhibits any properties that can be modeled. If not, we are just dealing with chaos and no model will be good enough. But a lot of data in the real world exhibits patterns, so we just need to look at it the right way. Let’s see how we can check if the given time series data has any underlying trends, shall we?
What are autoregressive models?
Autoregressive models are used extensively for forecasting. An autoregressive model is a model where the current output depends linearly on its own previous values plus a random (stochastic) term. It’s important to note that the relationship with the previous values is linear. The current value in the time series is obtained by regressing on previous values from the same series. The previous values that are used to predict the current output are called predictors.
How many previous values should we consider? That depends on the problem at hand. If you take more values, the model becomes increasingly complex. The number of preceding values that are used to predict is called the order of autoregression. If you consider just one previous value, it becomes a first-order autoregression, written as AR(1). If we generalize it, a kth-order autoregression, written as AR(k), is a linear regression model where the current output of the series at any given time t is a linear function of the values of the series at times t−1, t−2, …, t−k.
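To make the idea of a first-order autoregression concrete, here is a minimal sketch that simulates an AR(1) series and then recovers its coefficient by regressing each value on the one before it. The function names (`simulate_ar1`, `fit_ar1`), the coefficient 0.7, and the Gaussian noise are illustrative assumptions, not something taken from a particular library.

```python
import random

def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Generate n values of x_t = phi * x_{t-1} + e_t with Gaussian noise e_t."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x

def fit_ar1(x):
    """Estimate phi by least squares: regress x_t on x_{t-1} (no intercept)."""
    prev, curr = x[:-1], x[1:]
    num = sum(p * c for p, c in zip(prev, curr))
    den = sum(p * p for p in prev)
    return num / den

# Assumed example: the true coefficient is 0.7, so the estimate should land near it.
series = simulate_ar1(phi=0.7, n=5000)
phi_hat = fit_ar1(series)
print("estimated phi:", round(phi_hat, 3))
```

A higher-order model works the same way, except that the regression uses k lagged values as predictors instead of one.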
Can we quantify the predictability factor?
If the process is stationary, then we can do a lot with it. If the process is non-stationary, it is much harder to model directly. But if we don’t know whether it’s stationary or not, then it’s a big problem because it becomes a wild goose chase. This is where the unit root becomes really relevant.
A unit root is an important feature of time series processes that can impact inferences and outputs. Every autoregressive process has something called a characteristic equation. This equation is constructed from the coefficients on the lagged values of the process. The reason this equation is important is that its roots are indicative of stationarity.
If a process has a unit root, i.e. if one of the roots of the characteristic equation is equal to 1, then the process is non-stationary. It is not trend stationary either! A characteristic equation can have multiple roots. If all the other roots of the characteristic equation are inside the unit circle, then there is a way to convert the process into a stationary one: take the first difference of the process, which means subtracting each value from the one that follows it. The resulting stationary process can be modeled nicely.
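The root check and the differencing step described above can be sketched for a second-order process. This assumes the convention used in this post, where the characteristic equation of x_t = phi1·x_{t−1} + phi2·x_{t−2} + e_t is written as m² − phi1·m − phi2 = 0 and stationarity requires every root to lie inside the unit circle (some textbooks use the reciprocal form, where the roots must lie outside). The helper names and the coefficients 1.5 and −0.5 are hypothetical, chosen so that one root is exactly 1.

```python
import cmath

def ar2_characteristic_roots(phi1, phi2):
    """Roots of m**2 - phi1*m - phi2 = 0 for x_t = phi1*x_{t-1} + phi2*x_{t-2} + e_t."""
    disc = cmath.sqrt(phi1 * phi1 + 4.0 * phi2)
    return (phi1 + disc) / 2.0, (phi1 - disc) / 2.0

def first_difference(x):
    """Return the series of changes x_t - x_{t-1}."""
    return [b - a for a, b in zip(x, x[1:])]

# Assumed example: m**2 - 1.5*m + 0.5 = (m - 1)(m - 0.5), so one root is a unit root.
roots = ar2_characteristic_roots(1.5, -0.5)
moduli = sorted(abs(r) for r in roots)
print("root moduli:", moduli)

# Differencing this process leaves an AR(1) with coefficient 0.5,
# whose single root (0.5) is inside the unit circle.
print(first_difference([1, 3, 6, 10]))
```

In this example the remaining root is 0.5, so the first difference of the series behaves like a stationary AR(1) process.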
Unit root process vs trend stationary process
From the previous section, we can see that the unit root processes are not stationary. We learnt in the previous blog post that trend stationary processes are not stationary either. So does that mean unit root processes and trend stationary processes are similar? Absolutely not!
Even though they share some properties, they are different in important ways. It is possible for time series data to have no unit root, be non-stationary, and yet be trend-stationary. The difference is in the way these processes react to shocks. Trend-stationary processes are mean-reverting around their trend, which means that after a shock the time series converges back towards the trend line. The trend itself is not affected by the shock. In unit-root processes, shocks have a permanent impact on the level of the series, which means there is no fixed path that the series returns to over time.
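The difference in how the two kinds of process react to a shock can be illustrated with a small simulation. This is a sketch under assumed settings: the trend-stationary series is a linear trend plus a stationary AR(1) deviation with coefficient 0.5, the unit-root series is a random walk with drift, and a one-time shock of size 5 is injected at step 10. Each process is simulated twice with the same random noise, once with the shock and once without, so the gap between the two runs isolates the effect of the shock.

```python
import random

def simulate(kind, n, shock_at=None, shock=0.0, rho=0.5, slope=0.1, seed=1):
    """Simulate a trend-stationary or unit-root series, optionally with one shock."""
    rng = random.Random(seed)
    y, u, level = [], 0.0, 0.0
    for t in range(n):
        e = rng.gauss(0.0, 1.0)
        if t == shock_at:
            e += shock
        if kind == "trend_stationary":
            u = rho * u + e              # stationary deviation around the trend
            y.append(slope * t + u)
        else:                            # "unit_root": random walk with drift
            level = level + slope + e
            y.append(level)
    return y

def shock_effect(kind, n=60, shock_at=10, shock=5.0):
    """Gap between a shocked run and an unshocked run that share the same noise."""
    base = simulate(kind, n)
    hit = simulate(kind, n, shock_at=shock_at, shock=shock)
    return [h - b for h, b in zip(hit, base)]

ts_effect = shock_effect("trend_stationary")
ur_effect = shock_effect("unit_root")
print("trend-stationary: at shock", round(ts_effect[10], 3), "at end", round(ts_effect[-1], 3))
print("unit root:        at shock", round(ur_effect[10], 3), "at end", round(ur_effect[-1], 3))
```

In the trend-stationary case the gap shrinks by a factor of 0.5 at every step, so the series drifts back to its trend. In the unit-root case the gap never closes.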
How to check for unit roots?
We use unit root tests to check for the presence of unit roots. Checking for unit roots gives a lot of information about the time series data. A unit root test tests whether a time series variable is non-stationary by checking if there is a unit root in the characteristic equation. The null hypothesis is defined as the presence of a unit root. This means that we are going in with the assumption that the unit root is present. Depending on the test used, the alternative hypothesis is stationarity, trend stationarity, or an explosive root (a root greater than 1).
One of the most popular tests is the Dickey-Fuller test. There is another version called the augmented Dickey-Fuller test that’s used more frequently. The output of this test is a test statistic, which is usually a negative number. The more negative the value, the stronger the evidence that the unit root is not present. In other words, a sufficiently negative value rejects the hypothesis that there is a unit root at some level of confidence.
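To show where the test statistic comes from, here is a minimal sketch of the basic (non-augmented) Dickey-Fuller regression written by hand. It regresses the change in the series on the previous value plus a constant, and reports the t-statistic on the coefficient of the previous value. The function name `dickey_fuller_stat`, the simulated series, and the seed are assumptions made for illustration. The threshold of −2.86 is the commonly tabulated approximate 5% critical value for large samples when the regression includes a constant and no trend.

```python
import math
import random

def dickey_fuller_stat(y):
    """t-statistic on gamma in the regression dy_t = alpha + gamma * y_{t-1} + e_t."""
    x = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    n = len(x)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    alpha = md - gamma * mx
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of gamma
    return gamma / se

# Assumed example data: both series are built from the same noise.
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]

stationary = [0.0]
random_walk = [0.0]
for e in noise[1:]:
    stationary.append(0.5 * stationary[-1] + e)  # AR(1), no unit root
    random_walk.append(random_walk[-1] + e)      # unit root process

CRITICAL_5PCT = -2.86  # approximate large-sample value (constant, no trend)

stats = {
    "stationary": dickey_fuller_stat(stationary),
    "random walk": dickey_fuller_stat(random_walk),
}
for name, stat in stats.items():
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(name, round(stat, 2), verdict)
```

The augmented version of the test adds lagged differences of the series to the same regression. In practice it is usually better to rely on an established implementation, such as `adfuller` in the statsmodels library, which also reports p-values and critical values.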