Ergodicity is one of the most important concepts in statistics. Just as importantly, it has a lot of real-world applications. For example, it applies to the staggering number of internet-connected devices in the Internet of Things (IoT). Research labs, businesses, and marketing agencies often rely on statistics to compile the results of their experiments, whether those concern customers, voters, viewers, or any other segment. Ever wondered why the results are often inaccurate? One of the main reasons is an underlying assumption about ergodicity. What exactly is it?
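Roughly speaking, a process is ergodic when the average over many parallel copies (the ensemble average) matches the average of a single copy followed over a long time (the time average). A classic counterexample, which may or may not be the one the post uses, is a repeated bet where your wealth is multiplied by 1.5 on heads and by 0.6 on tails. These multipliers are assumptions chosen for illustration. The expected wealth grows every round, yet almost every individual player loses money over time:

```python
import math
import random

UP, DOWN = 1.5, 0.6  # illustrative multipliers for heads / tails (assumed values)

# Ensemble view: expected growth factor per round, averaged over many players.
ensemble_factor = 0.5 * UP + 0.5 * DOWN  # 1.05, so expected wealth grows 5% per round

# Time view: typical per-round growth of ONE player over many rounds
# (the geometric mean of the multipliers).
time_factor = math.sqrt(UP * DOWN)  # about 0.949, so a single player's wealth shrinks


def simulate_player(rounds: int, rng: random.Random) -> float:
    """Follow one player's wealth, starting from 1.0, for `rounds` coin flips."""
    wealth = 1.0
    for _ in range(rounds):
        wealth *= UP if rng.random() < 0.5 else DOWN
    return wealth


rng = random.Random(42)  # fixed seed so the run is reproducible
final = [simulate_player(1000, rng) for _ in range(500)]
share_losing = sum(w < 1.0 for w in final) / len(final)

print(f"ensemble factor per round: {ensemble_factor:.3f}")
print(f"time-average factor per round: {time_factor:.3f}")
print(f"share of players who lost money after 1000 rounds: {share_losing:.1%}")
```

The gap between 1.05 and roughly 0.949 is the point: an average computed across many people does not describe what happens to any one person over time when the process is not ergodic.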

# Tag Archives: Statistics

# Measuring The Memory Of Time Series Data

Time series data has memory. It remembers what happened in the past and avenges any wrongdoings! Can you believe it? Okay, the avenging part may not be true, but it definitely remembers the past. The “memory” refers to how strongly the past can influence the future in a given time series variable. If a series has a strong memory, analyzing its past is really useful because it can tell us what’s likely to happen in the future. If you need a quick refresher, you can check out my blog post where I talked about memory in time series data. We have a high-level understanding of how to classify time series data into short memory and long memory, but how do we actually measure the memory?
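One standard way to quantify memory, not necessarily the approach the post takes, is the sample autocorrelation function (ACF): the correlation between the series and a lagged copy of itself, computed for each lag. A minimal sketch in plain Python; the function name `sample_acf` is our own:

```python
def sample_acf(x: list[float], max_lag: int) -> list[float]:
    """Sample autocorrelation of `x` for lags 0..max_lag.

    Uses the usual estimator: the lag-k autocovariance divided by the
    lag-0 autocovariance (both normalized by n, so the n cancels).
    """
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# A series that flips sign every step has strong (negative) lag-1 memory.
print(sample_acf([1, -1, 1, -1, 1, -1], 2))
```

For the alternating series, the lag-1 value is -5/6 and the lag-2 value is 2/3. The more slowly these values shrink toward zero as the lag grows, the longer the memory. For real data you would typically use a library routine such as `statsmodels.tsa.stattools.acf`.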

# What Is Long Memory In Time Series Analysis

We encounter time series data very frequently in the real world. Some common examples include real-time sensor readings, surveillance video, stock market prices, astrophysics, speech recognition, and so on. To study time series data, we try to extract the characteristics that define it. One of the most important things to think about is the dependence between different points in the series. Is there any dependence between the values? If so, how far apart in time can they be and still affect each other? Understanding these aspects opens up new ways to analyze the data. This is where the concept of long memory comes into the picture. Let’s dig a little deeper and understand it, shall we?
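To make the short-versus-long distinction concrete, a common textbook contrast (offered as an illustration, not as the post’s own example) compares an AR(1) process, whose autocorrelation decays geometrically as phi^k, with a fractionally integrated ARFIMA(0, d, 0) process, whose autocorrelation decays hyperbolically like k^(2d - 1) for 0 < d < 0.5. The parameter values phi = 0.5 and d = 0.3 below are arbitrary choices:

```python
import math


def ar1_acf(phi: float, k: int) -> float:
    """Autocorrelation at lag k of a stationary AR(1) process (|phi| < 1)."""
    return phi ** k


def arfima_acf(d: float, k: int) -> float:
    """Autocorrelation at lag k of ARFIMA(0, d, 0) for 0 < d < 0.5.

    rho(k) = Gamma(k + d) * Gamma(1 - d) / (Gamma(k - d + 1) * Gamma(d)),
    evaluated with log-gamma to avoid overflow at large lags.
    """
    log_rho = (math.lgamma(k + d) + math.lgamma(1 - d)
               - math.lgamma(k - d + 1) - math.lgamma(d))
    return math.exp(log_rho)


PHI, D = 0.5, 0.3  # illustrative parameter choices
for lag in (1, 5, 20, 100):
    print(f"lag {lag:>3}: AR(1) {ar1_acf(PHI, lag):.2e}   "
          f"ARFIMA {arfima_acf(D, lag):.2e}")
```

At lag 100 the AR(1) correlation is effectively zero, while the ARFIMA correlation is still a few percent. Summed over all lags, the first stays finite and the second diverges, which is one common formal definition of long memory.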

# Estimating The Predictability Of Time Series Data – Part II

In the previous blog post, we discussed various types of time series data and covered the concepts of stationarity and shocks. In this blog post, we will continue to discuss how we can estimate the predictability of time series data. People say that the future is unpredictable. But that’s grossly reductive! What they actually mean is: “I’m blindly assuming that my time series data is non-stationary, so I cannot accurately predict what’s going to happen in the future.” Predicting future values can open a lot of doors in the Internet of Things (IoT) ecosystem. Before we can forecast future values, it’s important to determine whether the time series data exhibits any properties that can be modeled. If it doesn’t, we are just dealing with chaos and no model will be good enough. But a lot of real-world data exhibits patterns, so we just need to look at it the right way. Let’s see how we can check whether a given time series has any underlying trends, shall we?
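The post’s own tests aren’t shown in this excerpt. As a crude first look at whether a series has a trend, one option is to fit a straight line against time by ordinary least squares and inspect the slope; a formal stationarity test, such as the augmented Dickey-Fuller test (`statsmodels.tsa.stattools.adfuller`), would be a natural next step. The helper names and the readings below are hypothetical:

```python
def linear_trend(y: list[float]) -> tuple[float, float]:
    """Fit y_t = a + b * t by ordinary least squares; return (a, b)."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2  # mean of t = 0, 1, ..., n - 1
    y_mean = sum(y) / n
    cov = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return y_mean - slope * t_mean, slope


def detrend(y: list[float]) -> list[float]:
    """Subtract the fitted straight line, leaving the residual fluctuations."""
    a, b = linear_trend(y)
    return [v - (a + b * t) for t, v in enumerate(y)]


# Hypothetical readings: an upward drift plus a small alternating wiggle.
readings = [10 + 0.5 * t + (0.2 if t % 2 else -0.2) for t in range(20)]
intercept, slope = linear_trend(readings)
print(f"slope per step: {slope:.3f}")
```

A slope clearly different from zero suggests a deterministic trend that you would remove (for example with `detrend`) before modeling what remains. A straight-line fit will not catch every kind of non-stationarity, though; a random walk, for instance, can wander without any fixed slope.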

# Estimating The Predictability Of Time Series Data – Part I

Time series data refers to a sequence of measurements made over time. The frequency of these measurements is usually fixed, say once every second or once every hour. We encounter time series data in a variety of real-world scenarios, such as stock market data, sensor data, and speech data. People like to build forecasting models for time series data: based on past data, they want to predict what’s going to happen in the future. This is especially relevant when modeling data in the world of Internet of Things (IoT). One of the most important questions is whether we can predict anything in the first place. How do we determine that? How do we check if there are underlying patterns in the time series data?
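One way to think about whether a series can be predicted is to ask how it responds to a shock. A minimal sketch under an assumed AR(1) model, x_t = phi * x_(t-1) + e_t, tracing a single shock with no further noise: when |phi| < 1 the series is stationary and the shock fades, while phi = 1 gives a random walk in which it never fades. The function name and parameter values are illustrative:

```python
def shock_response(phi: float, steps: int, shock: float = 1.0) -> list[float]:
    """Trace x_t = phi * x_(t-1) after a single shock at t = 0, with no further noise."""
    path = [shock]
    for _ in range(steps):
        path.append(phi * path[-1])
    return path


stationary = shock_response(phi=0.7, steps=10)   # |phi| < 1: the shock dies out
random_walk = shock_response(phi=1.0, steps=10)  # phi = 1: the shock persists forever

for t in (0, 5, 10):
    print(f"t={t:>2}: stationary {stationary[t]:.3f}   random walk {random_walk[t]:.3f}")
```

In the stationary case the series keeps returning toward its mean, which is a pattern a model can exploit. In the random-walk case every shock is permanent, so the best forecast is simply the last observed value.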

# What Is Bayesian Information Criterion?

Let’s say you have a bunch of datapoints and you want to come up with a nice model for them. We want this model to fit all the points as well as possible. If we do this, we can use the resulting formula to make predictions about unknown points. At the same time, we should make sure that we don’t overfit our model to these datapoints. If we overfit our model, it will tune itself too closely to our datapoints and perform poorly on unknown data. So how do we pick the best model? Where do we draw the line?
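The Bayesian Information Criterion is commonly written as BIC = k ln(n) - 2 ln(L), where n is the number of datapoints, k is the number of estimated parameters, and L is the maximized likelihood; lower is better. Assuming Gaussian errors and dropping terms that are identical for every model, this becomes n ln(RSS/n) + k ln(n), where RSS is the residual sum of squares. A minimal sketch comparing a constant model with a straight line on hypothetical data:

```python
import math


def bic(rss: float, n: int, k: int) -> float:
    """BIC for a least-squares fit with Gaussian errors (model-independent constants dropped)."""
    return n * math.log(rss / n) + k * math.log(n)


def rss_constant(y: list[float]) -> float:
    """Residual sum of squares for the model y = c (the best c is the mean)."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)


def rss_line(x: list[float], y: list[float]) -> float:
    """Residual sum of squares for the least-squares line y = a + b * x."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    b = (sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
         / sum((xi - xm) ** 2 for xi in x))
    a = ym - b * xm
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: a straight line plus a small deterministic wiggle.
x = [float(i) for i in range(20)]
y = [3 + 2 * xi + 0.5 * math.sin(xi) for xi in x]

n = len(x)
scores = {
    "constant (k=1)": bic(rss_constant(y), n, k=1),
    "line (k=2)": bic(rss_line(x, y), n, k=2),
}
for name, score in scores.items():
    print(f"{name}: BIC = {score:.1f}")
print("preferred:", min(scores, key=scores.get))
```

The k ln(n) term is the guard against overfitting: an extra parameter is only worth adding if it improves the fit enough to pay its penalty, and that penalty grows with the number of datapoints.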

# What’s The Importance Of Hyperparameters In Machine Learning?

Machine learning is becoming increasingly relevant in all walks of science and technology. In fact, it’s an integral part of many fields like computer vision, natural language processing, robotics, e-commerce, spam filtering, and so on. The list of potential applications is pretty huge! People working on machine learning tend to build models based on training data, in the hope that those models will perform well on unseen data. As we all know, every model has some parameters associated with it. We want our machine learning models to estimate these parameters from the training data. But as it turns out, there are a few parameters that cannot be estimated this way. These parameters tend to have a significant impact on the performance of your model. Now why is that? Where do these parameters come from? How do we deal with them?
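A small illustration, assuming a one-parameter model y = w * x trained by gradient descent on hypothetical data: the weight w is learned from the data, but the learning rate is not. It has to be chosen from outside the training procedure, and that choice decides whether training converges at all:

```python
def train(lr: float, steps: int = 50) -> float:
    """Fit y = w * x by gradient descent on mean squared error; return the final loss.

    `lr` (learning rate) and `steps` are hyperparameters: gradient descent
    never updates them, it only updates the model parameter w.
    """
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # hypothetical data generated with w = 2
    n = len(xs)
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n


for lr in (0.001, 0.01, 0.2):
    print(f"learning rate {lr}: final loss {train(lr):.3g}")
```

Too small a learning rate barely moves w in 50 steps, a moderate one converges, and too large a one overshoots further on every step. In practice, such values are typically picked by trying candidates on held-out validation data, for example with a grid or random search.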

# What Is A Markov Chain?

If you have studied probability theory, then you have probably heard Markov’s name. When we study probability and statistics, we tend to deal with independent trials. This means that if we conduct an experiment many times, we assume that the outcome of one trial doesn’t influence the outcome of the next. For example, let’s say you are tossing a fair coin. Each time you toss it, heads and tails are equally likely. If the first toss comes up heads, that tells us nothing about the next toss. But what if we are dealing with a situation where this assumption is not true? If we are predicting something like the weather, we cannot assume that today’s weather is unaffected by what happened yesterday. If we go ahead with the independence assumption here, we are bound to get wrong results. How do we formulate this kind of model?
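A Markov chain handles this by letting tomorrow depend on today, and only on today, through a table of transition probabilities. A minimal sketch with a two-state weather model; the probabilities are made up for illustration:

```python
STATES = ("sunny", "rainy")

# TRANSITION[i][j] = P(tomorrow is STATES[j] | today is STATES[i]); rows sum to 1.
# These numbers are illustrative assumptions, not measured data.
TRANSITION = [
    [0.9, 0.1],  # sunny today -> sunny / rainy tomorrow
    [0.5, 0.5],  # rainy today -> sunny / rainy tomorrow
]


def step(dist: list[float]) -> list[float]:
    """Advance a probability distribution over STATES by one day."""
    return [
        sum(dist[i] * TRANSITION[i][j] for i in range(len(STATES)))
        for j in range(len(STATES))
    ]


def forecast(dist: list[float], days: int) -> list[float]:
    """Distribution over STATES after `days` steps of the chain."""
    for _ in range(days):
        dist = step(dist)
    return dist


today = [0.0, 1.0]  # it is raining today
print("tomorrow:", dict(zip(STATES, forecast(today, 1))))
print("in 30 days:", dict(zip(STATES, forecast(today, 30))))
```

Whatever the weather is today, the forecast eventually settles at about 5/6 sunny and 1/6 rainy. That limit is the chain’s stationary distribution, and it follows from the transition table alone.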

# What Is Maximum Likelihood Estimation?

Let’s say you are trying to estimate the height of a group of people somewhere. If the group is small enough, you can just measure all of them and be done with it. But in real life, the groups are pretty large and you cannot measure each and every person. So we end up building a model that describes how heights are distributed in the group. For example, if you are surveying a group of professional basketball players, your model might be centered around 6’7″ with a standard deviation of a couple of inches. But how do we get this model in the first place? How do we know if this model is accurate enough to fit the entire group?
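If we assume heights follow a normal distribution, maximum likelihood picks the mean and variance that make the observed sample most probable. For the normal distribution this has a closed form: the sample mean, and the average squared deviation from it (dividing by n, not n - 1). A sketch on hypothetical heights in inches; the helper names are our own:

```python
import math


def normal_log_likelihood(data: list[float], mu: float, var: float) -> float:
    """Log-likelihood of `data` under a normal distribution N(mu, var)."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in data) / (2 * var))


def normal_mle(data: list[float]) -> tuple[float, float]:
    """Closed-form maximum likelihood estimates (mu, variance) for a normal model."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # MLE divides by n, not n - 1
    return mu, var


# Hypothetical sample of player heights in inches (6'7" = 79 inches).
heights = [77.0, 79.5, 81.0, 78.0, 80.5, 76.5, 79.0, 82.0]
mu_hat, var_hat = normal_mle(heights)
print(f"mean = {mu_hat:.2f} in, standard deviation = {math.sqrt(var_hat):.2f} in")

# Sanity check: nudging either estimate away from the MLE lowers the likelihood.
best = normal_log_likelihood(heights, mu_hat, var_hat)
assert best > normal_log_likelihood(heights, mu_hat + 0.5, var_hat)
assert best > normal_log_likelihood(heights, mu_hat, var_hat * 1.2)
```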

# What Are Confidence Intervals?

The confidence interval is a concept in statistics that is used extensively in many diverse areas like physics, chemistry, computer vision, machine learning, genetics, etc. This concept is so fundamental that almost every quantitative science ends up using it. Let’s say you have collected some data and you want to understand its behavior. For example, you might say that the data is centered around some value or that it is spread out with a certain amount of variance. This is very common in fields where you have to estimate the underlying parameters that govern the data distribution. When you estimate a statistical parameter from some data, you can’t be certain about its true value. If you have a lot of high-quality data, you can be more confident that your estimate is close to the true value. But if you don’t have much data, or if it’s of poor quality, you can’t place much confidence in your estimate. So how do we deal with these situations? Can we measure this uncertainty?
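As a hedged sketch of the idea, here is an approximate 95% confidence interval for a mean using the normal approximation, mean ± z * s / sqrt(n) with z ≈ 1.96. For small samples a t-distribution quantile would be more appropriate; Python’s standard library doesn’t provide one, which is why this sketch falls back to `statistics.NormalDist`. The function name and measurements are hypothetical:

```python
import math
import statistics


def mean_confidence_interval(data: list[float], level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for the mean via the normal approximation.

    Reasonable for large samples; small samples call for a t quantile instead.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(data)
    std_err = statistics.stdev(data) / math.sqrt(n)  # sample std dev (divides by n - 1)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err


# Hypothetical measurements.
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 4.9, 5.2, 5.0]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")

# Repeating the values is only a stand-in for collecting four times as much data.
low4, high4 = mean_confidence_interval(sample * 4)
print(f"with 4x the data: ({low4:.3f}, {high4:.3f})")
```

The width shrinks roughly like 1/sqrt(n), matching the intuition that more data gives more confidence in the estimate.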