What is Machine Learning?

Machine learning is a very hot buzzword these days that mostly crops up when you read about Artificial Intelligence and Data Mining. At the first glance it may sound complicated and scary! You don’t have to look farther than the top Internet search hits you receive which will most likely direct you to bunch of articles full of hardcore math equations. But, what is this monster all about?

Machine learning is a sub-field of artificial intelligence in computer science and basically works around the idea of giving a computer tons of data, lock the door, leave it alone and have it identify topics in large data sets. The poor computer should also be able to make predictions on the new data. Sounds like fun right?

But how does a computer (a machine) learn from data?  It doesn’t have a real brain. Maybe not but what it doesn’t have in grey matter it makes up for in hard, fast work. One of the main tricks behind the machine learning is the iterative process. So, a model is built and updated through an iterative process, trying to improve predefined criteria. Therefore, when models are exposed to new data they can adapt independently, because they have “learned” from the past!


Cool, you might not be aware but you are using machine learning in your day-to-day activities! Some examples are: the friends suggested to you on social networks like Facebook or LinkedIn, the movie recommendations on Netflix, spam detection on your email account, unlocking your iPhone with a fingerprint, license plate recognition at tolls, mobile check deposits, and so on. All of these are examples of machine learning. Although there has been a rapid progress in machine learning, some of its possible applications still sound like the stuff of science fiction, such as an intelligent humanoid robot ready to talk about anything and have a beer with you!! Now if he could just buy the beer…

At Provalis Research we use supervised and unsupervised machine learning in our QDA Miner and WordStat qualitative analysis and text mining software. The automated text categorization module in WordStat allows one to apply either Naive Bayes or K-Nearest Neighbors learning algorithms on an existing textual database in order to develop a categorization model (or classifier). WordStat also provides features to test the performance of the learning algorithm and to optimize the various metrics.

In the next post, we will discuss some more cool stuff about machine learning.