The advent of social media, online surveys, feedback forms and other electronic records means that we can now quickly collect massive amounts of customer experience data. The challenge for businesses is that the responses, comments and other feedback are usually in the form of unstructured text (emails, texts, tweets, Facebook comments, reviews and so on). A 2013 study by Krikorian estimated that worldwide there were more than 500 million tweets per day, while, according to Facebook, 1.18 billion people were active on Facebook every day in September 2016. That is a significant number of people either reading or writing unstructured text on a single platform every day. The ‘firehose’ is gushing out larger volumes of unstructured text every year.

Researchers, academics and business people ask: ‘How can we understand what is being said in all this data?’ They care about the hidden insights in this unstructured text because it contains valuable information that, if properly analysed, can improve their products and business. From unstructured text you can learn more about your customers, find new ones, obtain greater insight into your competitors and markets and develop new strategies for future growth.

Codifying text, the traditional method used to understand unstructured data, is still valid, but it is increasingly strained by the size and complexity of the unstructured text that is now available. In a survey with open-ended questions, codifying involves: reading all the responses to understand the range of topics; creating a codebook; reading each response; manually applying codes to each response; then performing analysis on those codes. Attempting to codify comments or analyse these responses using a manual approach can lead to lost insights and a class of ‘factory workers’ slaving through millions of messages. This is where text analytics tools excel.

There is no doubt that the key to success lies in the ability to better understand and act upon customers’ needs. Leading companies like Ford Motor Company, General Electric and Bank of America build competitive strategies based on insights from ‘voice of the customer’ (VoC) data. Companies like JetBlue, Kimberly-Clark and Goodyear use text analytics tools to understand their unstructured data holdings.

These tools help organisations understand customer preferences, perceptions and needs, offering the ability to quickly identify and extract topics, opinions and sentiment, then classify responses according to pre-established themes. Text analytics software can connect unstructured data and structured data (e.g. ratings), or link VoC data and metrics to other business data and metrics.
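As a rough illustration of classifying responses against pre-established themes, the sketch below matches each comment against a small keyword list per theme. The theme names, keywords and comments are invented for this example, and simple keyword matching is only an assumption about how such classification might work; commercial tools use far richer methods.

```python
import re

# Hypothetical themes and keywords, invented for illustration only.
THEMES = {
    "cleanliness": {"dirty", "clean", "stain", "dust"},
    "staff": {"staff", "receptionist", "rude", "friendly"},
    "price": {"price", "expensive", "cheap", "value"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(comment, themes=THEMES):
    """Return the sorted names of themes whose keywords appear in the comment."""
    words = set(tokenize(comment))
    return sorted(name for name, keywords in themes.items() if words & keywords)


# Invented sample comments.
comments = [
    "The bathroom was dirty but the staff were friendly.",
    "Far too expensive for what you get.",
    "Lovely view from the balcony.",
]

for comment in comments:
    print(classify(comment), "-", comment)
```

A comment can fall under several themes or none at all; unmatched comments are a useful prompt to revisit the theme list.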

Various text analytics tools are available, with different programs taking different approaches. IBM has a text analytics module for SPSS called Text Analytics for Surveys. There are the Provalis Research products: QDA Miner and WordStat for Stata (awarded KMWorld’s 2015 Trend-Setting Product for its combined numerical and text analytics capabilities). SAS has SAS Text Analytics consisting of several components (SAS Text Miner, SAS Enterprise Content Categorisation, SAS Sentiment Analysis and SAS Ontology Management).

Text analytics tools, although useful for performing qualitative coding, do not just perform word counts or generate histograms of word frequency. These tools can be used to understand how customers are linking words (e.g. does the word ‘dirty’ appear next to or near the word ‘bathroom’) in customer comments and to identify whether there is a negation of a seemingly positive comment. The tools also allow linkage back to specific locations, time of comment, and other structured data (age, demographic, gender, etc.) captured in responses.
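To make word proximity and negation concrete, here is a minimal sketch in Python. The window sizes, the negation word list and the function names are assumptions chosen for this example; they are not taken from any of the products mentioned above.

```python
import re

# Small, illustrative list of negation words (an assumption, not exhaustive).
NEGATIONS = {"not", "no", "never", "hardly", "wasn't", "isn't", "weren't"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def near(tokens, a, b, window=3):
    """True if word `a` occurs within `window` tokens of word `b`."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def negated(tokens, word, window=2):
    """True if a negation word appears in the `window` tokens before `word`."""
    for i, t in enumerate(tokens):
        if t == word and NEGATIONS & set(tokens[max(0, i - window):i]):
            return True
    return False


print(near(tokenize("The bathroom was really dirty."), "dirty", "bathroom"))
print(negated(tokenize("The room was not clean at all."), "clean"))
print(negated(tokenize("The room was clean."), "clean"))
```

A fixed window is a blunt instrument: it will miss negations that sit further away and may flag unrelated ones, so results like these are best treated as candidates for review.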

The 2005 hotel feedback study ‘Let Me Count the Words’ by Pullman, McGuire and Cleveland quantified the qualitative information in open-ended questions. By linking customer satisfaction Likert-type scale responses with the emotion and insights in the open-ended questions, the study demonstrates the value of the open-ended question when understanding customer feedback.
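The general idea of linking a satisfaction rating to the words in an open-ended response can be sketched as follows. This is not the method used in the study; the records, the 1 to 5 rating scale and the function name are invented for illustration.

```python
import re
from statistics import mean

# Invented records: (satisfaction rating on a 1-5 scale, open-ended comment).
responses = [
    (1, "Dirty bathroom and noisy air conditioner."),
    (2, "The bathroom was dirty."),
    (5, "Friendly staff and a spotless room."),
    (4, "Great location, friendly staff."),
    (3, "Average stay, nothing special."),
]


def words_in(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def mean_rating_by_term(records, term):
    """Mean rating of comments that mention `term` and of those that do not."""
    with_term = [rating for rating, text in records if term in words_in(text)]
    without = [rating for rating, text in records if term not in words_in(text)]
    return (
        mean(with_term) if with_term else None,
        mean(without) if without else None,
    )


for term in ("dirty", "friendly"):
    print(term, mean_rating_by_term(responses, term))
```

With real data, a gap between the two means would only be a starting point; sample sizes and other variables would need to be considered before drawing conclusions.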

However, the tools are only part of the solution. Practitioners also need a methodical approach to analysing unstructured data. The steps should include:
1. Develop a hypothesis / objective for the analysis.
2. Identify data and gather it into a suitable dataset.
3. Perform an initial analysis such as a bubble chart showing word frequency, clustering and word proximity.
4. Consider establishing and applying a dictionary, thesaurus, stemming or lemmatisation to capture synonyms and similar words. These can then be re-used for the same topic.
5. Consider phrases and compound nouns by reviewing word correlation and recognising the difference between words that are merely strongly correlated and genuine phrases. For example, ‘air conditioner’ needs to be recognised as a compound noun, while ‘dirty bathroom’ needs to stand on its own as two notable correlated words needing investigation.
6. Link these insights to quantitative variables such as gender, location, age, satisfaction rating, etc.
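Steps 3 to 5 above can be sketched with the Python standard library alone. The stopword list, the synonym dictionary and the sample comments are all invented for this example, and counting adjacent word pairs is only one simple way of surfacing candidate phrases.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, deliberately tiny).
STOPWORDS = {"the", "a", "and", "was", "were", "is", "in", "of", "to", "but"}

# Hypothetical synonym dictionary (step 4): maps variants to one canonical term.
SYNONYMS = {"filthy": "dirty", "grubby": "dirty", "bathrooms": "bathroom"}


def tokens(text):
    """Lowercase, split into words, drop stopwords, then normalise synonyms."""
    words = re.findall(r"[a-z']+", text.lower())
    return [SYNONYMS.get(w, w) for w in words if w not in STOPWORDS]


def word_frequency(comments):
    """Step 3: count how often each normalised word appears."""
    return Counter(w for c in comments for w in tokens(c))


def bigram_frequency(comments):
    """Step 5: count adjacent word pairs, after stopwords have been removed."""
    counts = Counter()
    for c in comments:
        t = tokens(c)
        counts.update(zip(t, t[1:]))
    return counts


# Invented sample comments.
comments = [
    "The air conditioner was noisy and the bathroom was filthy.",
    "Dirty bathroom, broken air conditioner.",
    "The bathrooms were grubby.",
]

print(word_frequency(comments).most_common(4))
print(bigram_frequency(comments).most_common(3))
```

Because stopwords are removed first, ‘bathroom was filthy’ is counted as the pair (‘bathroom’, ‘dirty’). The counts only surface candidate pairs; deciding that ‘air conditioner’ is a compound noun while ‘dirty bathroom’ is a correlation worth investigating still takes human judgement.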

Companies are using text analytics tools to enhance customer experience, customer retention, risk management, quality control, cost savings and efficiency. Organisations use text analytics software to analyse small and extremely large volumes of unstructured text, sourced in almost any format. They need these tools because data has become too large and complex for individuals (or even teams) to manually examine, analyse and interpret. They want fast and reliable insights from their unstructured text.

Note: This article first appeared in the Summer edition of the Australian Market and Social Research Society publication Research News


Krikorian, R., 2013. New Tweets per second record, and how! [Online] [Accessed 7 November 2016].

Facebook Inc., 2016. Company Information Statistics – 4 November 2016. [Online] [Accessed 7 November 2016].

McKellar, H., 2015. KM World Trend-Setting Products of 2015. KM World, 1 September, 24(8).

Pullman, M., McGuire, K. & Cleveland, C., 2005. Let Me Count the Words: Quantifying Open-Ended Interactions with Guests. Cornell Hotel and Restaurant Administration Quarterly, 46(3), pp. 323-343.
