Clustering – Perpetual Enigma

Let’s say you get a whole bunch of data samples and you want to do some analysis about the underlying structure of those samples. You know that they can be categorized into certain groups, but you are not exactly sure what those categories are. For example, you get the data associated with shopping behavior of consumers. You want to understand what products are more popular, what kind of consumers buy these products, what time of the year do consumers buy more, etc. In order to divide this data into subgroups, we need to know what those groups should be in the first place. In our case, we don’t! This becomes increasingly difficult as you get more samples, often ranging in hundreds of thousands. So how do we analyze this data? How do we make the machine automatically learn the underlying structure and categorize accordingly? Continue reading “What Is K-Means Clustering?” →