Ergodicity is one of the most important concepts in statistics, and it has a lot of real-world applications. In this case, it’s applicable to the staggering number of internet-connected devices in the world of the Internet of Things (IoT). Experiments conducted by research labs, businesses, and marketing agencies often rely on statistics to compile their results. These can be about a set of customers, voters, viewers, or any other segment. Ever wondered why the results are often inaccurate? One of the main reasons is the underlying assumption about ergodicity. What exactly is it?

# Tag Archives: Statistics

# Measuring The Memory Of Time Series Data

Time series data has memory. It remembers what happened in the past and avenges any wrongdoings! Can you believe it? Okay, the avenging part may not be true, but it definitely remembers the past. The “memory” refers to how strongly the past can influence the future in a given time series variable. If it has a strong memory, then analyzing the past is really useful because it can tell us what’s likely to happen in the future. If you need a quick refresher, you can check out my blog post where I talked about memory in time series data. We have a high-level understanding of how we can classify time series data into short memory and long memory, but how do we actually measure the memory?
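One common way to put a number on this kind of memory is the sample autocorrelation, which measures how strongly values that are a given number of steps apart move together. Here is a minimal sketch of that idea; it is not necessarily the measure used in the full post, and the helper `simulate_ar1` along with its parameters is a hypothetical toy generator added purely for illustration.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Sum of squared deviations from the mean (lag-0 term).
    total = sum((x - mean) ** 2 for x in series)
    if total == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Sum of products of deviations that are `lag` steps apart.
    shifted = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return shifted / total


def simulate_ar1(phi, n, seed=0):
    """Toy series x[t] = phi * x[t-1] + noise (hypothetical helper)."""
    rng = random.Random(seed)
    values = [0.0]
    for _ in range(n - 1):
        values.append(phi * values[-1] + rng.gauss(0.0, 1.0))
    return values


if __name__ == "__main__":
    dependent = simulate_ar1(phi=0.9, n=5000)   # each value leans on the last
    noise = simulate_ar1(phi=0.0, n=5000)       # pure noise, no dependence
    for lag in (1, 5, 10):
        print(lag,
              round(autocorrelation(dependent, lag), 3),
              round(autocorrelation(noise, lag), 3))
```

If the autocorrelation stays well above zero as the lag grows, the past is still influencing the present at that distance. For the pure-noise series it should hover around zero at every lag beyond zero.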

# What Is Long Memory In Time Series Analysis

We encounter time series data very frequently in the real world. Some common examples include real-time sensors, surveillance video, the stock market, astrophysics, speech recognition, and so on. In order to study time series data, we try to extract various characteristics that tend to define it. One of the most important things to think about is the dependence between various points in the time series data. Is there any dependence between the values in the time series data? If so, how far apart in time can they be and still affect each other? Understanding these aspects will open up new doors in terms of how we analyze the data. This is where the concept of long memory comes into the picture. Let’s dig a little deeper and understand it, shall we?

# Estimating The Predictability Of Time Series Data – Part II

In the previous blog post, we discussed various types of time series data. We understood the concepts of stationarity and shocks. In this blog post, we will continue to discuss how we can estimate the predictability of time series data. People say that the future is unpredictable. But that’s grossly reductive! What they actually mean to say is: “I’m blindly assuming that my time series data is non-stationary, so I cannot accurately predict what’s going to happen in the future.” Predicting future values can open a lot of doors in the Internet of Things (IoT) ecosystem. Before we can forecast future values, it’s important to determine if the time series data exhibits any properties that can be modeled. If not, we are just dealing with chaos and no model will be good enough. But a lot of data in the real world exhibits patterns, so we just need to look at it the right way. Let’s see how we can check if the given time series data has any underlying trends, shall we?

# Estimating The Predictability Of Time Series Data – Part I

Time series data refers to a sequence of measurements made over time. The frequency of these measurements is usually fixed, say once every second or once every hour. We encounter time series data in a variety of scenarios in the real world. Some examples include stock market data, sensor data, speech data, and so on. People like to build forecasting models for time series data. This is very relevant to modeling data in the world of the Internet of Things (IoT). Based on the past data, they want to predict what’s going to happen in the future. One of the most important questions is whether or not we can predict something in the first place. How do we determine that? How do we check if there are underlying patterns in the time series data?

# What Is Bayesian Information Criterion?

Let’s say you have a bunch of datapoints and you want to come up with a nice model for them. We want this model to fit all the points in the best possible way. If we do this, then we will be able to use a mathematical formula to extract information about unknown points. At the same time, we should make sure that we don’t overfit our model to these datapoints. If we overfit our model, then it will tune itself too much to our datapoints and perform poorly on unknown data. So how do we pick the best model? Where do we draw the line?
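To make the trade-off concrete, here is a rough sketch of how the Bayesian Information Criterion scores two candidate models on the same datapoints. It assumes Gaussian noise and least-squares fits, in which case the criterion reduces (up to constants shared by both models) to `n * ln(RSS / n) + k * ln(n)`, where `k` is the number of fitted parameters. The function names and the two candidate models (a constant and a straight line) are illustrative choices, not something taken from the full post.

```python
import math


def gaussian_bic(residuals, num_params):
    """BIC of a least-squares fit under Gaussian noise, constants dropped.

    The noise variance is also estimated, but it adds the same penalty to
    every model, so it is left out of `num_params` here.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    if rss == 0:
        raise ValueError("a perfect fit has an unbounded log-likelihood")
    return n * math.log(rss / n) + num_params * math.log(n)


def constant_residuals(ys):
    """Residuals of the model y = c, where c is the sample mean."""
    mean = sum(ys) / len(ys)
    return [y - mean for y in ys]


def line_residuals(xs, ys):
    """Residuals of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def compare_models(xs, ys):
    """Return the BIC of each candidate; the lower score is preferred."""
    return {
        "constant": gaussian_bic(constant_residuals(ys), num_params=1),
        "line": gaussian_bic(line_residuals(xs, ys), num_params=2),
    }
```

The first term rewards a tight fit, while the second term charges a price for every extra parameter. A more flexible model only wins if it reduces the error by enough to pay for its added complexity.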

# What’s The Importance Of Hyperparameters In Machine Learning?

Machine learning is becoming increasingly relevant in all walks of science and technology. In fact, it’s an integral part of many fields like computer vision, natural language processing, robotics, e-commerce, spam filtering, and so on. The list of potential applications is pretty huge! People working on machine learning tend to build models based on training data, in the hope that those models will perform well on unseen data. As we all know, every model has some parameters associated with it. We want our machine learning models to estimate these parameters from the training data. But as it turns out, there are a few parameters that cannot be estimated using this procedure. These parameters tend to have a significant impact on the performance of your model. Now why is that? Where do these parameters come from? How do we deal with this?
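A small sketch may help separate the two kinds of parameters. In the toy ridge regression below (chosen here purely for illustration), the slope `w` is an ordinary parameter estimated from the training data, while the penalty strength `lam` is a hyperparameter: the training objective cannot pick it, so we choose it by checking error on held-out validation data. All names are hypothetical.

```python
def fit_ridge_slope(xs, ys, lam):
    """Estimate w in y ~ w * x by minimizing sum((y - w*x)^2) + lam * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxx + lam == 0:
        raise ValueError("need non-zero inputs or a positive penalty")
    # Closed-form minimizer of the penalized squared error.
    return sxy / (sxx + lam)


def mean_squared_error(xs, ys, w):
    """Average squared prediction error of the model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_lambda(train, validation, candidates):
    """Pick the candidate penalty with the lowest validation error."""
    train_x, train_y = train
    val_x, val_y = validation

    def validation_error(lam):
        # The parameter w is learned from training data for each lam...
        w = fit_ridge_slope(train_x, train_y, lam)
        # ...but lam itself is judged only on data the fit never saw.
        return mean_squared_error(val_x, val_y, w)

    return min(candidates, key=validation_error)
```

Calling `select_lambda((train_x, train_y), (val_x, val_y), [0.0, 0.1, 1.0, 10.0])` returns whichever candidate generalizes best on the validation split. Minimizing the training error alone would always favor the smallest penalty, which is why this value has to be set from outside the training procedure.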