If you have studied probability theory, you have almost certainly come across Markov's name. When we study probability and statistics, we tend to deal with independent trials. This means that if you conduct an experiment many times, we assume the outcome of one trial doesn't influence the outcome of the next. For example, let's say you are tossing a fair coin. Each toss comes up heads or tails with equal probability, and if the first toss comes up heads, that tells us nothing about the next toss. But what if we are dealing with a situation where this assumption doesn't hold? If we are trying to estimate the weather, we cannot assume that today's weather is unaffected by what happened yesterday. If we go ahead with the independence assumption here, we are bound to get wrong results. How do we formulate this kind of model?
Why do we need it?
Let's continue with the example from the previous paragraph. When we conduct a sequence of independent probabilistic experiments, the possible outcomes of each experiment are the same, and they occur with the same probabilities. Moreover, knowing the outcome of the previous experiment does not change our prediction for the next one. This is the basic premise: each experiment has a fixed set of possible outcomes, which in our case is either heads or tails. The distribution of outcomes for a single experiment is enough to construct a tree for a sequence of 'n' experiments, and we can answer essentially any probability question about the sequence using that tree.
The independence assumption and the associated line of thinking work well in an ideal world. In the real world, things interact with each other, and the outcome of one experiment influences the outcome of the next. We cannot just assume they are independent and move on with our lives; that's going to give us wrong results. Much of modern probability theory deals with processes where knowledge of previous outcomes influences predictions for future experiments. In principle, when we observe a sequence of such experiments, all of the past outcomes could influence our predictions for the next one. This is where the concept of "Markov Chains" comes into the picture.
What exactly is it?
Before jumping into Markov Chains, let's look at stochastic processes. A stochastic process is basically a mathematical model that evolves over time in a probabilistic way. This means that we are not bound to the same parameters as time goes by. The advantage of such a model is that it allows us to adapt and react to the current situation. The alternative would be like having to stick with 1950s technology to build bridges, even though we have modern technology that's much more efficient! A Markov Chain is a special type of stochastic process in which the outcome of an experiment depends only on the outcome of the previous experiment. It was formulated by A. A. Markov, an influential Russian mathematician who made significant contributions to probability theory.
When we talk about Markov Chains, we invariably talk in terms of the "state" of a given system. Now what on earth is that? Well, the state of the system refers to the set of parameters that quantify the current condition of the system. In our earlier coin-tossing example, "heads" is one state and "tails" is another, so the system has two states overall. Heads and tails together form something called the "state space" of our system: the set of all states a system can be in. So in a Markov Chain, the current state of the system depends only on the previous state, and not on any of the states before that.
How do we formulate it?
Now that we understand what Markov Chains are, we need to know how to formulate them. We will not be discussing any mathematical details; we will just see how to specify the model so that we can apply it to the real world. Let's say we are conducting an experiment that can have 'n' states. When you start the process, it starts in one of these states and keeps jumping from one state to another until you stop. Pretty straightforward! Let's denote each state by Si, where i can take values between 1 and n. When the system is in state Si, it can jump to another state Sj with a certain probability. This is called the transition probability.
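To make this jumping process concrete, here is a minimal simulation sketch in Python. The three states and their transition probabilities are made up purely for illustration (the text doesn't specify any):

```python
import random

# Hypothetical state space with 3 states.
states = ["S1", "S2", "S3"]

# transition[Si][Sj] = probability of jumping from state Si to state Sj.
# Each row of probabilities must sum to 1.
transition = {
    "S1": {"S1": 0.2, "S2": 0.5, "S3": 0.3},
    "S2": {"S1": 0.4, "S2": 0.4, "S3": 0.2},
    "S3": {"S1": 0.1, "S2": 0.6, "S3": 0.3},
}

def next_state(current):
    """Pick the next state according to the current state's transition probabilities."""
    targets = list(transition[current].keys())
    weights = list(transition[current].values())
    return random.choices(targets, weights=weights)[0]

# Run the chain for 10 jumps, starting from S1.
state = "S1"
path = [state]
for _ in range(10):
    state = next_state(state)
    path.append(state)

print(path)  # a random walk through the state space
```

Note that `next_state` only ever looks at the current state, never at the earlier part of `path`; that is exactly the Markov property described above.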
The interesting thing to note is that we can collect all of these jumps into a nice matrix. For each state, we specify the probability of jumping to every other state (including itself). This results in a square matrix where the entry in row i and column j is the probability of jumping from state i to state j. This is called the transition matrix. Whenever we start the process, we usually know all the values of this matrix.
Real life example
Markov Chains are extensively used in computer vision, natural language processing, genetics, artificial intelligence, economics, finance, stock markets, etc. In fact, Markov Chains are so fundamental that they occur pretty much everywhere in modern science and technology. To demonstrate the model, let's look at a very simple example. Let's say your job requires you to travel between three cities: San Francisco, Los Angeles, and Boston. Ideally, you would like to minimize the traveling time. Keeping that in mind, let's say we have the following probabilities in place, where P(X -> Y) indicates the probability of moving from city X to city Y:
P(SF -> SF) = 0.10
P(SF -> LA) = 0.70
P(SF -> Boston) = 0.20
P(LA -> SF) = 0.75
P(LA -> LA) = 0.15
P(LA -> Boston) = 0.10
P(Boston -> SF) = 0.60
P(Boston -> LA) = 0.35
P(Boston -> Boston) = 0.05
If you want to represent this with a transition matrix, it would look like this:
          SF     LA     Boston
SF        0.10   0.70   0.20
LA        0.75   0.15   0.10
Boston    0.60   0.35   0.05
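As a quick sanity check, here is how the transition matrix above might be written down in code; a minimal sketch in plain Python, with rows and columns both ordered as SF, LA, Boston:

```python
# Transition matrix from the table above. P[i][j] is the probability
# of moving from city i to city j, with order [SF, LA, Boston].
P = [
    [0.10, 0.70, 0.20],  # from SF
    [0.75, 0.15, 0.10],  # from LA
    [0.60, 0.35, 0.05],  # from Boston
]

# Each row is a probability distribution over the next city,
# so every row must sum to 1.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-9
```

The row-sum check is a useful habit: a matrix whose rows don't sum to 1 cannot be a valid transition matrix.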
Okay, so we have a bunch of probabilities and we have a matrix. It's not all that interesting, right? Well, that's because we haven't seen the problem yet. Let's say you start your journey on Monday in San Francisco and you have to plan something for Thursday, but the plan depends on where you will be. What's the probability that you will be in Los Angeles on Thursday? This matrix will help us figure it out. Without a Markov Chain to model this, you would have no idea what your travel looks like; with it, you can say with a good amount of certainty how likely you are to be in each city on a particular day. If we denote the transition matrix by P and the probability distribution over cities on day i by the row vector X(i), then X(i+1) = X(i)P. In our case, Thursday is 3 days after Monday, so by substituting recursively we get X(i+3) = X(i)P^3. If you do all the matrix calculations, you will get the following probabilities for Thursday:
P(SF) = 0.31
P(LA) = 0.53
P(Boston) = 0.16
As we can see, there is a higher chance of being in LA than in either of the other cities. If we know these probabilities for each day, we can plan everything accordingly, as opposed to taking blind guesses.
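The Thursday calculation can be carried out in a few lines of plain Python; a minimal sketch, applying the transition matrix three times to the Monday distribution:

```python
# Transition matrix from the example; rows and columns ordered [SF, LA, Boston].
P = [
    [0.10, 0.70, 0.20],  # from SF
    [0.75, 0.15, 0.10],  # from LA
    [0.60, 0.35, 0.05],  # from Boston
]

def step(x, P):
    """One day of travel: multiply the row vector x by the matrix P."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]

# Monday: we start in San Francisco with certainty.
x = [1.0, 0.0, 0.0]

# Thursday is 3 days after Monday, so apply the matrix three times,
# which is the same as multiplying by P^3.
for _ in range(3):
    x = step(x, P)

print([round(p, 2) for p in x])  # probabilities for [SF, LA, Boston] on Thursday
```

The exact values come out to roughly 0.309, 0.530, and 0.161, matching the rounded figures quoted above.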