Time series data refers to a sequence of measurements made over time. The frequency of these measurements is usually fixed, say once every second or once every hour. We encounter time series data in a variety of real-world scenarios. Some examples include stock market data, sensor data, and speech data. People like to build forecasting models for time series data: based on past data, they want to predict what’s going to happen in the future. This is very relevant to modeling data in the world of Internet of Things (IoT). One of the most important questions is whether we can predict anything in the first place. How do we determine that? How do we check if there are underlying patterns in the time series data?
Types of time series data
Let us start by discussing a sample problem. Consider that we are collecting data ‘y’ as a time series. Here, ‘y’ can be a temperature measurement or a water flow measurement taken once every hour. When we look at time series data, we always look for patterns. In the real world, the data is assumed to be generated by stochastic processes and we try to model those processes. It’s important to understand the different types of time series so that we can model them correctly. Here are the most common ones:
Stationary: A stationary process is a stochastic process whose statistical properties don’t change when shifted in time. What properties are we talking about? It’s mostly the joint probability distribution. What does that mean? It means that the distribution is characterized by some parameters and those parameters don’t change over time. Hence parameters like the mean and variance do not change over time and do not follow any trends. For example, white noise is a stationary process, and so is a sine wave whose phase is chosen uniformly at random. A sine wave with a fixed, known phase is deterministic, and its mean changes from one time instant to the next.
Brownian: Brownian time series data refers to the type of series where each change is uncorrelated with the changes that came before it. This is also known as a random walk. There is an equal chance of the next measurement being higher or lower than the current measurement. As you can imagine, it’s difficult to make any kind of useful predictions on Brownian time series, because the best guess for a future value is simply the current value.
Cyclostationary: A cyclostationary process refers to a signal whose statistical properties vary cyclically with time. For example, the temperature variations in a particular city can be modeled with an annual cycle. The temperature variations on Aug 21 can be different from those on Nov 15. But the temperature variations on Aug 21 this year are similar to the temperature variations on Aug 21 last year. So we can model the temperature variations as a cyclostationary process. A cyclostationary process can be viewed as multiple stationary processes that are interleaved in some way.
Trend stationary: We observe a lot of trend stationary processes in the real world. If a signal is hovering around a trend, it is called a trend stationary process. What exactly is a “trend”? It can be anything that varies smoothly over time. For example, the mean can vary smoothly over time (growing or decaying). Even if there are outliers in the data, the signal always comes back to the trend. Hence it is also called a mean-reverting process. Technically, trend-stationary processes are not stationary. As we discussed earlier, the mean of a stationary process doesn’t change over time. But we can easily convert a trend stationary process into a stationary process by estimating the underlying trend and subtracting it from the data.
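To make these categories a bit more concrete, here is a minimal sketch that simulates three of them with NumPy. The specific choices are assumptions made for illustration only: Gaussian white noise stands in for a stationary process, the random walk is the cumulative sum of that noise, and the trend is a straight line with a hypothetical slope of 0.01 that is removed with a least-squares fit. Real data may need a different trend model.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 2000
t = np.arange(n)
noise = rng.normal(0.0, 1.0, size=n)

stationary = noise                   # white noise: constant mean and variance
random_walk = np.cumsum(noise)       # each step adds an independent increment
trend_stationary = 0.01 * t + noise  # noise hovering around a linear trend


def half_means(y):
    """Return the mean of the first half and of the second half of y."""
    mid = len(y) // 2
    return y[:mid].mean(), y[mid:].mean()


# Estimate the trend with a straight-line fit and subtract it.
slope, intercept = np.polyfit(t, trend_stationary, deg=1)
detrended = trend_stationary - (slope * t + intercept)

for name, series in [
    ("stationary", stationary),
    ("random walk", random_walk),
    ("trend stationary", trend_stationary),
    ("detrended", detrended),
]:
    first, second = half_means(series)
    print(f"{name:>16}: first half mean = {first:8.3f}, second half mean = {second:8.3f}")
```

Comparing the two half means is only a rough sanity check, not a formal stationarity test. It is enough to show that the trend stationary series has a drifting mean while the detrended version does not.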
What are shocks?
One of the most defining characteristics of time series data is how it reacts to shocks. What exactly are shocks? Let’s say we have a linear model, meaning that it linearly transforms explanatory variables into observations ‘y’. This model is called the null model. Now we want to check if there are features in the data that are not fully explained by the null model. Sudden or unexpected movements in these features cause a disruption. So we can model them using the following equation:
y = f(x) + s
Here, f(x) is the null model and ‘s’ represents shock effects. So basically, a shock is an unexpected event that takes place at a particular point of time in the series. Its defining qualities are location and magnitude. Time series data is heavily influenced by such events. The effect of a shock is not confined to the point in time where it occurs, so it is important to understand both the immediate impact and the subsequent effects. The subsequent effects are determined by the null model, which governs how the shock propagates through the future values of the time series. The study of shocks is crucial because it determines the predictability of time series data. Stationary processes react very differently to shocks as compared to non-stationary processes.
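The difference in how shocks propagate can be illustrated with a small sketch. It assumes a simple first-order autoregressive recursion, y[t] = phi * y[t-1] + s[t], as a stand-in for the null model. This recursion, the function name impulse_response, and its parameters are hypothetical choices for illustration and are not taken from the discussion above. With |phi| < 1 the process is stationary, and with phi = 1 it becomes a random walk.

```python
import numpy as np


def impulse_response(phi, n=20, location=5, magnitude=1.0):
    """Response of y[t] = phi * y[t-1] + s[t] to one shock.

    The shock series s is zero everywhere except at index `location`,
    where it equals `magnitude`.
    """
    s = np.zeros(n)
    s[location] = magnitude
    y = np.zeros(n)
    for t in range(n):
        prev = y[t - 1] if t > 0 else 0.0
        y[t] = phi * prev + s[t]
    return y


stationary_resp = impulse_response(phi=0.5)   # shock decays geometrically
random_walk_resp = impulse_response(phi=1.0)  # shock never decays

print("stationary :", np.round(stationary_resp[4:10], 4))
print("random walk:", np.round(random_walk_resp[4:10], 4))
```

In the stationary case the effect of the shock halves at every step and quickly fades away, while in the random walk case the series is shifted permanently by the full magnitude of the shock.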