# Deep Learning For Sequential Data – Part II: Constraints Of Traditional Approaches

In the previous blog post, we discussed the nature of sequential data and why it needs its own robust modeling techniques. Traditionally, people have used Hidden Markov Models (HMMs) to analyze sequential data, so this blog post centers on HMMs. They have been applied to many tasks, such as speech recognition, gesture recognition, and part-of-speech tagging. However, HMMs place a lot of restrictions on how we can model our data. They are often a better fit for sequential data than classical machine learning techniques, but they don't cover all the needs of modern data analysis. The reason lies in the constraints used to build HMMs. What are those constraints?

Current state depends only on the previous state

This assumption is built into the very definition of an HMM. If you look at the transition probabilities, P(current state | previous state), you will see that the current state depends only on the previous state. Everything that happened before the previous state is ignored, which is rarely true in the real world. In the previous blog post, we saw how such an assumption can lead to incorrect conclusions. Purists will point out that this is technically a first-order HMM and that we can design a higher-order HMM that looks further back. But even then, the next state depends on a fixed number of previous states. This is still very restrictive, because we don't know how far back we should look. We also don't know whether that number stays the same across the entire dataset as we move along the sequence during training.
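
To make the first-order assumption concrete, here is a minimal Python sketch using NumPy. The two-state weather chain, the matrix `A`, the initial distribution `pi`, and the helper names `sequence_prob` and `next_state_dist` are hypothetical and were chosen only for illustration. The sketch shows that the probability of a state sequence is a product of one-step transitions. It also shows that two very different histories ending in the same state get exactly the same next-state distribution.

```python
import numpy as np

# Hypothetical two-state weather chain: 0 = "sunny", 1 = "rainy".
# A[i, j] = P(next state = j | current state = i); each row sums to 1.
A = np.array([[0.8, 0.2],
              [0.4, 0.6]])
pi = np.array([0.5, 0.5])  # assumed initial state distribution

def sequence_prob(states, A, pi):
    """P(s_1, ..., s_T) under a first-order Markov chain."""
    p = pi[states[0]]
    for prev, cur in zip(states[:-1], states[1:]):
        p *= A[prev, cur]  # only the immediately preceding state is used
    return p

def next_state_dist(history, A):
    """Next-state distribution: everything except history[-1] is ignored."""
    return A[history[-1]]

h1 = [1, 1, 1, 1, 0]  # a long rainy spell, then one sunny day
h2 = [0, 0, 0, 0, 0]  # sunny all along
print(next_state_dist(h1, A))
print(next_state_dist(h2, A))  # same row of A as for h1
```

A k-th order chain can look further back, but only by conditioning on the last k states. For N states, its transition table therefore needs N^k rows, and k is still a fixed number chosen up front.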

HMMs are generative

This is not exactly a constraint, but HMMs are generative models by nature: they model the joint distribution of the hidden states and the observed outputs. When the parameters are fit by plain maximum likelihood, as is typical, this also amounts to assuming a uniform prior over the transition probabilities. HMMs also have a discriminative counterpart. Instead of modeling the joint distribution, it directly models the conditional distribution of the hidden states given the observations. The Maximum Entropy Markov Model (MEMM) is a good example of this approach. In the real world, many tasks call for a strong combination of generative and discriminative models.
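
The difference between modeling the joint distribution and modeling the conditional distribution can be sketched in a few lines of Python. This is an illustrative sketch under several assumptions. The parameters `pi`, `A`, `B` and the MEMM weight tables `W0`, `W` are made up; the MEMM weights are random here but would normally be learned. The HMM posterior is computed by brute-force enumeration rather than the forward algorithm. On the HMM side, we score P(states, observations) and must divide by P(observations) to get a conditional. On the MEMM side, P(states | observations) is written directly as a product of per-step softmax classifiers. The MEMM uses one-hot indicator features, so its log-linear weights reduce to a table.

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all numbers are made up.
pi = np.array([0.6, 0.4])            # pi[i]   = P(s_1 = i)
A = np.array([[0.7, 0.3],            # A[i, j] = P(s_t = j | s_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],            # B[i, k] = P(o_t = k | s_t = i)
              [0.2, 0.8]])

def hmm_joint(states, obs):
    """Generative HMM score: P(states, obs)."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

def hmm_posterior(states, obs):
    """P(states | obs), obtained from the joint by dividing by P(obs).
    P(obs) is found by brute-force enumeration (fine only for tiny T)."""
    evidence = sum(hmm_joint(s, obs)
                   for s in itertools.product(range(len(pi)), repeat=len(obs)))
    return hmm_joint(states, obs) / evidence

# Discriminative MEMM-style model: each step is a log-linear (softmax)
# classifier over the next state, given the previous state and the current
# observation. With one-hot indicator features, the weights form a table
# W[prev_state, obs, next_state]; W0[obs, state] handles the first step.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2, 2))
W0 = rng.normal(size=(2, 2))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def memm_conditional(states, obs):
    """P(states | obs), modeled directly as a product of local conditionals."""
    p = softmax(W0[obs[0]])[states[0]]
    for t in range(1, len(obs)):
        p *= softmax(W[states[t - 1], obs[t]])[states[t]]
    return p
```

For a fixed observation sequence, summing either function over all state sequences gives 1. The difference is that the HMM reaches the conditional only through its joint model, while the MEMM never models how the observations are generated.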

Transition probabilities are constant

HMMs assume that the state transition probabilities do not depend on time. In other words, the transition matrix is not allowed to evolve. For example, suppose we start with the assumption that the probability that you will go camping tomorrow, given that today is Friday, is 0.82. A few years down the line, your priorities may change and you might want to do something else on a Saturday. An HMM won't let its state transition matrix evolve to reflect this. That is a big restriction, because real-world data tends to change over time, and our model should be able to adapt to new patterns.
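
The camping example can illustrate the difference between a time-homogeneous transition and a time-varying one. In this minimal Python sketch, only the 0.82 figure comes from the text. The weekly time index, the hypothetical `inhomogeneous_row` function, and its `weekly_decay` rate are assumptions made up for illustration. The homogeneous version returns the same Friday row of the transition matrix no matter when it is queried, while the time-varying version lets that row drift.

```python
import numpy as np

P_CAMP = 0.82  # P(camping on Saturday | today is Friday), from the example

def homogeneous_row(t):
    # Standard (time-homogeneous) HMM: the Friday row of the transition
    # matrix is the same at every time step, so t is simply ignored.
    return np.array([P_CAMP, 1 - P_CAMP])  # [camping, something else]

def inhomogeneous_row(t, weekly_decay=0.005):
    # Hypothetical time-varying alternative: the camping probability drifts
    # down a little every week t. The decay rate is made up for illustration.
    p = P_CAMP * (1 - weekly_decay) ** t
    return np.array([p, 1 - p])

for week in (0, 52, 260):  # now, after one year, after five years
    print(week, homogeneous_row(week)[0], round(inhomogeneous_row(week)[0], 3))
```

A real time-varying model would have to learn how the transition probabilities change, rather than follow a hand-picked schedule. The sketch only shows where the time index enters.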

Outputs are independent of previous outputs

HMMs assume that the current output is independent of the previous outputs once the current hidden state is known. This is a big restriction because it rarely holds in the real world: in most systems, the current output influences future outputs. Let's say we are dealing with an unknown source of data, and all we can do is observe the data that comes out of it. How can we assume that the current output is not being influenced by the previous outputs? This is actually one of the biggest weaknesses of HMMs.
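
The HMM output assumption is a conditional independence: given the current hidden state, the current output does not depend on earlier outputs. The minimal Python sketch below checks this by brute force on a length-2 sequence. The parameters `pi`, `A`, `B` and the helper names are hypothetical. The sketch computes P(o2 | s2, o1) by marginalizing the joint and confirms that it always equals the emission probability B[s2, o2], whatever o1 was.

```python
import itertools
import numpy as np

# Hypothetical HMM parameters, made up for illustration.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])   # A[prev_state, next_state]
B = np.array([[0.7, 0.3], [0.1, 0.9]])   # B[state, symbol]

def joint(s1, s2, o1, o2):
    """P(s1, s2, o1, o2) for a length-2 HMM sequence."""
    return pi[s1] * B[s1, o1] * A[s1, s2] * B[s2, o2]

def p_o2_given_s2_o1(o2, s2, o1):
    """P(o2 | s2, o1), computed by marginalizing the joint over s1."""
    num = sum(joint(s1, s2, o1, o2) for s1 in range(2))
    den = sum(joint(s1, s2, o1, x) for s1 in range(2) for x in range(2))
    return num / den

def p_o2_given_o1(o2, o1):
    """P(o2 | o1) when the hidden states are NOT observed."""
    num = sum(joint(s1, s2, o1, o2) for s1 in range(2) for s2 in range(2))
    den = sum(joint(s1, s2, o1, x)
              for s1 in range(2) for s2 in range(2) for x in range(2))
    return num / den

# Given s2, the earlier output o1 makes no difference:
for s2, o1, o2 in itertools.product(range(2), repeat=3):
    assert np.isclose(p_o2_given_s2_o1(o2, s2, o1), B[s2, o2])
```

With these numbers, `p_o2_given_o1(1, 0)` is 0.4125 and `p_o2_given_o1(1, 1)` is 0.675, so the outputs are still correlated through the hidden chain. The restriction is that past outputs can influence the present only through the single current state, never directly.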
