Content analysis and text-mining tool for Stata
Stata is a complete, integrated statistical software package created by StataCorp LP (www.stata.com). It provides a wide range of tools for statistical analysis, data management, and graphics. Recent versions of Stata have added many new features, including a long string data type (strL) that can store documents of up to 2 billion characters alongside numerical and categorical data. One can thus create a statistical database of journal abstracts, news transcripts, patents, incident reports, customer feedback, interviews, and so on.
WordStat for Stata was created to allow Stata 13 and Stata 16 users running Windows to apply text analytics techniques to any string variable stored in a Stata data file. WordStat combines natural language processing, content analysis, and statistical techniques to quickly extract topics, patterns, and relationships from large amounts of text. It can process millions of words in seconds and compare the extracted themes across any other numerical, categorical, or date variable in the Stata file.
What is it used for?
WordStat can be used by anyone who needs to quickly extract and analyze information stored in Stata text variables. It may be used for:
• Direct import of text and quantitative data from social media, online survey platforms, and reference management tools
• Content analysis of open-ended responses, interview or focus group transcripts
• Business intelligence and competitive analysis of websites
• Information extraction and knowledge discovery from incident reports and customer complaints
• Content analysis of news coverage or scientific literature (scientometrics or bibliometrics studies)
• Automatic tagging and classification of documents
• Fraud detection, authorship attribution, patent analysis
• Taxonomy development and validation
• And more (for examples of studies using WordStat, see the Studies page).