Text Analytics: Build or Buy?

Build or buy? This is a question we run into often. Why should I buy your text mining product, or a similar one from a competitor, when I can build it myself? Is there an easy answer? Maybe not, but we think that once you look at the different factors involved in choosing a text analytics solution, one answer rises above the others for most companies. Still, it isn’t an open-and-shut case. Whichever path you choose, there are clear pros and cons to consider.

Issues to Consider

Let’s take a look at some of the issues.

First, we need to recognize that we make this type of decision all the time, albeit often on a very basic level. Do we buy our own groceries and cook dinner, or buy pre-made meals? Do we do our own housecleaning or hire a service? Do we build our own CRM or buy one from a well-known provider?

All of these decisions involve the same set of factors: cost, skills, time, quality, complexity, security and added value. In the case of text analysis, you also have to think about file formats, data volume, training, ongoing development and maintenance.

Build It?

Every company these days has lots of data it would like to analyze, and probably lots of data it needs to analyze just to stay competitive. But not many companies have the money and skills required to build a robust text analysis system. It doesn’t come cheap. According to the U.S. Bureau of Labor Statistics, the average salary of a data scientist is $100k, and you will need more than one. It will also take time to build a system from scratch using open-source tools and original programming. Then you will have to test it to make sure it works and that it is what you need. It will need to be flexible enough to work across different business units and be easily understood by different users. Will it need to be multilingual? What about security? Using open-source software carries known security risks, as well as legal ones. Also, if the developer leaves the company, will the knowledge be lost, or will someone else be able to carry on updating and improving the tool as required? Will you be able to put in place the training required to use your in-house tool? These are just a few of the questions to consider if you build it yourself. A recent Forrester blog suggests that while more companies will adopt this option, many will fail because they cannot sustain it over the long term.

Buy It?

Buying removes many of these questions. You can shop for the capabilities and price point that fit your specific needs. This can be an expensive, customized, automated system, a pay-per-use offering or a more affordable DIY software solution. In all cases, whatever you buy has likely been tested by hundreds or thousands of other customers. The vendor should have a support function in place and a training program complete with in-person and online training, manuals and online help. The vendor should also be dedicated to ongoing development, producing new versions with updated features at regular intervals so your solution stays as close to state-of-the-art as possible. Pre-built software will generally give you a more reliable, more cost-effective and faster solution than building your own.

So, Should You Build It or Buy It? A Perspective From a Text Mining Expert and Open Source User

Here is a perspective from one of our customers who also uses open-source tools. His comment is aimed primarily at researchers, but it applies equally to large and small organizations. Dr. Derrick Cogburn is Professor of Information Technology & Analytics at the Kogod School of Business, American University. Derrick has published multiple articles and conference papers on text mining. He is the President and CEO Managing Member of Praxis Analytic LLC, which offers consulting services in data and text analytics.

“While I consider myself a power user of the entire Provalis ProSuite, and absolutely love the software, it does have some limitations for me. I am a Mac user, and while I run Boot Camp on my Mac and Provalis runs perfectly and fast, I do frequently need a cross-platform solution. Since I am also an R developer, in addition to Provalis, I use and teach text analytics in R using the RStudio integrated development environment (IDE). R is a powerful data analytics language, and using the RStudio IDE helps to facilitate development of specific project scripts using both base R and a range of relevant packages. These scripts help to make a project replicable, they can run on a distributed high-performance computing environment, and they are easily sharable. However, the learning curve and time commitment required for building these customized solutions is very high. I suspect the vast majority of researchers interested in text mining, but not willing to invest the time into learning to program in R, will benefit tremendously from the excellent collection of text mining and data visualization tools contained within the Provalis ProSuite.”

For additional points of view on this topic, you can read this blog on the evolution of customer insights in companies, as well as this other one on whether you should build, buy, partner or acquire when it comes to developing your data analytics capabilities.