This is a continuation of my previous blog post. In that post, we discussed why we need conditional random fields in the first place. Graphical models are widely used in machine learning to solve many different problems, but Conditional Random Fields (CRFs) address a critical limitation of these models. A popular example of a graphical model is the Hidden Markov Model (HMM). HMMs have gained a lot of popularity in recent years due to their robustness and accuracy. They are used in computer vision, speech recognition, and other time-series data analysis. Yet CRFs outperform HMMs on many different tasks. How is that possible? What are these CRFs, and how are they formulated?

**How does CRF solve the problem faced by graphical models?**

A solution to this problem is to model the conditional distribution directly, which is all that is needed for classification. CRFs are essentially a way of combining the advantages of classification and graphical modeling: they combine the ability to compactly model multivariate data with the ability to leverage a large number of input features for prediction. The advantage of a conditional model is that dependencies involving only the input variables play no role in it, so an accurate conditional model can have a much simpler structure than a joint model. For the machine learning geeks out there, the difference between generative models and CRFs is analogous to the difference between the naive Bayes and logistic regression classifiers. In fact, the multinomial logistic regression model can be seen as the simplest kind of CRF, one in which there is only one output variable.
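To make the last point concrete, here is a minimal sketch of multinomial logistic regression written as a conditional model: it scores each label against the input features and normalizes, which is exactly a CRF with a single output variable and no sequence structure. The labels, features, and weight values below are made up purely for illustration.

```python
import math

def logistic_regression_posterior(features, weights, labels):
    """P(y | x) for multinomial logistic regression: the simplest CRF,
    with a single output variable and no sequence structure."""
    # Unnormalized log-score for each label: dot(weights[y], features)
    scores = {y: sum(w * f for w, f in zip(weights[y], features)) for y in labels}
    z = sum(math.exp(s) for s in scores.values())  # partition function
    return {y: math.exp(scores[y]) / z for y in labels}

# Hypothetical weights and one feature vector, for illustration only.
weights = {"spam": [2.0, -1.0], "ham": [-1.0, 1.5]}
posterior = logistic_regression_posterior([1.0, 0.5], weights, ["spam", "ham"])
```

Note that we never model the distribution of the features themselves; we only model the labels given the features, which is the conditional idea the paragraph above describes.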

**What are CRFs?**

Conditional Random Fields are a probabilistic framework for labeling and segmenting structured data, such as sequences, trees and lattices. This is especially useful in modeling time-series data, where the temporal dependency can manifest itself in various different forms. The underlying idea is to define a conditional probability distribution over label sequences given a particular observation sequence, rather than a joint distribution over both label and observation sequences. The primary advantage of CRFs is the relaxation of the independence assumption. The independence assumption says that the variables do not depend on or affect each other in any way. This is rarely true in practice, and assuming it can lead to serious inaccuracies.
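The "conditional distribution over label sequences" idea can be sketched directly for a tiny linear-chain CRF. The toy transition and emission weights below are assumed values chosen only to illustrate the mechanics; a real model would learn them, and a real implementation would use dynamic programming rather than brute-force enumeration.

```python
import math
from itertools import product

def crf_conditional(obs, labels, transition, emission):
    """P(y | x) for a tiny linear-chain CRF, computed by brute force:
    score every label sequence, exponentiate, normalize by Z."""
    def score(seq):
        s = sum(emission[y][x] for y, x in zip(seq, obs))          # label-observation weights
        s += sum(transition[a][b] for a, b in zip(seq, seq[1:]))   # label-label weights
        return s
    exp_scores = {seq: math.exp(score(seq))
                  for seq in product(labels, repeat=len(obs))}
    z = sum(exp_scores.values())  # partition function, depends on obs
    return {seq: v / z for seq, v in exp_scores.items()}

# Toy weights (illustrative only): two labels, two observation symbols.
transition = {"A": {"A": 1.0, "B": -0.5}, "B": {"A": -0.5, "B": 1.0}}
emission = {"A": {"x1": 2.0, "x2": 0.0}, "B": {"x1": 0.0, "x2": 2.0}}
dist = crf_conditional(["x1", "x2"], ["A", "B"], transition, emission)
best = max(dist, key=dist.get)  # most probable label sequence given the observations
```

Notice that the observations appear only as conditioning inputs: nothing in the model tries to explain how `x1` or `x2` were generated, which is the contrast with a joint model.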

**HMM vs CRF**

An HMM is a generative model: it produces the output directly by modeling the transition matrix learned from the training data. Since an HMM learns the transition probabilities on its own, the only way to improve the results is to provide more (and more varied) datapoints; there is no direct control over the output labels. A CRF, on the other hand, is a discriminative model that outputs a confidence measure. This is useful in most cases, because we want to know how sure the model is about the label at a given point. The confidence measure can be thresholded to suit various applications, and a nice side effect is that the number of false alarms tends to be lower than with an HMM.
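The thresholding step mentioned above is simple to sketch. Assuming a decoder has already produced a confidence value per position (the pairs and the 0.8 cutoff below are hypothetical and application-dependent), one can suppress low-confidence labels instead of emitting a possible false alarm:

```python
def threshold_labels(predictions, threshold=0.8):
    """Keep a label only when its confidence clears the threshold;
    otherwise flag the position as uncertain rather than risk a false alarm.
    The 0.8 cutoff is an arbitrary, application-dependent choice."""
    return [label if conf >= threshold else "UNCERTAIN"
            for label, conf in predictions]

# Hypothetical per-position (label, confidence) pairs from some CRF decoder.
preds = [("walk", 0.95), ("run", 0.60), ("walk", 0.88)]
filtered = threshold_labels(preds)
```

Raising the threshold trades recall for precision, which is exactly the kind of application-specific tuning an HMM's direct output does not expose.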

The primary advantage of CRFs over HMMs is their conditional nature, which results in the relaxation of the independence assumptions required by HMMs. Additionally, CRFs avoid the label bias problem, a weakness exhibited by Markov models based on directed graphical models. A CRF can be considered a generalization of an HMM; equivalently, an HMM is a particular case of a CRF in which constant probabilities are used to model the state transitions. CRFs outperform HMMs on a number of real-world sequence labeling tasks.

The theory of conditional random fields is too deep to be covered in just two blog posts; these posts are only meant to introduce you to CRFs. They are very useful when you are dealing with temporal data. There are many libraries out there, like HCRF, CRFall, CRF++, etc., that have CRF functionality nicely defined and implemented. You can check them out and see how they work for your project.

————————————————————————————————-

Great post and blog! Can you explain what the difference is between a Markov random field (MRF) and a CRF? Assume a typical image processing sort of problem where the image pixels are modelled as random variables. I can completely see why one would use a CRF, as typically one is interested in a particular labelling given the image. For example, in segmentation one would want the labels using intensity as features, and we are interested in modelling P(Y|I), where Y are the labels and I are the intensities at each pixel.

From what I have seen, MRFs are considered generative models, so they model the joint distribution P(Y, I). However, I am confused about how this would be achieved graphically in an MRF. Do we have MRF sites for both the output variables and the input variables, with the idea being to define potential functions that connect them with each other? If so, we need to model the dependencies between the output and input variables, which I guess would be complex… Can a naive Bayes model be translated to an MRF?

Thanks!