Content analysis and text mining software for fast and precise processing of large amounts of unstructured information

With WordStat, data analysts can quickly extract valuable text analytics results from large collections of documents such as customer feedback, emails, open-ended responses, interview transcripts, incident reports, patents, legal documents, blogs, websites, and more. Here is a list of WordStat's content analysis and text mining features:

Import from many sources

WordStat allows you to directly import content in multiple languages from many sources:

  • Import documents: Word, PDF, HTML, PowerPoint, RTF, TXT, XPS, ePUB, ODT, WordPerfect.
  • Import data files: Excel, CSV, TSV, Access
  • Import from statistical software: Stata, SPSS
  • Import from social media: Facebook, Twitter, Reddit, YouTube, RSS
  • Import from emails: Outlook, Gmail, MBox
  • Import from web surveys: Qualtrics, SurveyMonkey, SurveyGizmo, QuestionPro, Voxco, triple-s
  • Import from reference management tools: EndNote, Mendeley, Zotero, RIS
  • Import graphics: BMP, WMF, JPG, GIF, PNG. Information associated with those images, such as geographic location, title, description, authors, and comments, is automatically extracted and transformed into variables.
  • Import from XML databases
  • Import from databases through an ODBC connection
  • Import projects from qualitative software: NVivo, Atlas.ti, QDPX files
  • Import and analyze multi-language documents, including right-to-left languages
  • Monitor a specific folder and automatically import any documents and images stored in it, or monitor changes to the original source file or online services.

Import documents from many sources

Organize your data

Several features allow you to easily organize your data in ways that make your analysis process straightforward:

  • Quickly group, label, sort, add, and delete documents, or find duplicates.
  • Assign variables to your documents manually or automatically using the Document Conversion Wizard, e.g., date, author, or demographic data such as age, gender, or location.
  • Easily reorder, add, delete, edit, and recode variables.
  • Filter cases based on variable values.

Quickly extract meaning using Explorer Mode

Quickly and easily extract meaning from large amounts of text data using Explorer mode, designed specifically for users with little text mining experience.

Identify the most frequent words and phrases, and extract the most salient topics in your documents with the topic modeling tool. At any time, you can switch to Expert mode, which gives you access to all of WordStat’s features.

Document Conversion Wizard

Explore document content using Text Mining

In a few seconds, explore the content of large amounts of unstructured data and extract insightful information:

  • Extract the most frequent words, phrases, and expressions.
  • Quickly extract themes using clustering or 2D and 3D multidimensional scaling on either words or phrases.
  • Easily identify all keywords that co-occur with a target keyword by using the Proximity Plot.
  • Explore relationships among words or concepts with the Link Analysis feature.
  • Fine-tune the analysis by choosing the keyword co-occurrence criterion (within a case, a sentence, a paragraph, a window of n words, or a user-defined segment) and the clustering method (first- or second-order proximity, choice of similarity measure).
  • Explore the similarity between concepts or documents using hierarchical clustering, multidimensional scaling, link analysis, and proximity plot.
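The "window of n words" co-occurrence criterion can be illustrated with a short Python sketch. It counts which words appear within n positions of a target keyword, the kind of count a proximity plot is built from. This is a minimal illustration under stated assumptions, not WordStat's implementation: the function name `cooccurrences`, the symmetric window, and the whitespace tokenization are hypothetical choices.

```python
from collections import Counter


def cooccurrences(tokens, target, window):
    """Count words within `window` positions of each occurrence of `target`.

    Hypothetical helper: uses a symmetric window and ignores sentence
    or paragraph boundaries.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # Skip the target itself (and other occurrences of it).
            if j != i and tokens[j] != target:
                counts[tokens[j]] += 1
    return counts


text = "the delivery was late and the delivery driver was rude"
print(cooccurrences(text.split(), "delivery", window=2).most_common(3))
```

Replacing the word window with sentence, paragraph, or case boundaries only changes how the `lo`/`hi` range is chosen; the counting logic stays the same.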

WordStat: cluster analysis on text data
WordStat: proximity plot

Use Topic Modeling to extract the most salient topics

Get a quick overview of the most salient topics in very large text collections using state-of-the-art automatic topic extraction. It combines natural language processing with statistical analysis (non-negative matrix factorization (NNMF) or factor analysis) and is applied not only to words but also to phrases and related words (including misspellings).

In hierarchical cluster analysis, a word can appear in only one cluster. Topic modeling, by contrast, may associate a word with more than one topic, which better represents the polysemous nature of some words and the variety of contexts in which words are used.
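To make the NNMF idea concrete, here is a minimal pure-Python sketch of non-negative matrix factorization using the standard Lee and Seung multiplicative update rules. A document-term count matrix V is approximated as W × H, where each row of H holds one topic's weights over the vocabulary. It is an illustration only: WordStat's actual algorithm, initialization, and handling of phrases and related words are not described here, and the helper names, the tiny vocabulary, and the counts are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into non-negative W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def reconstruction_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


# Invented counts: rows are documents, columns follow `vocab`.
vocab = ["battery", "charge", "screen", "card", "refund", "fee"]
v = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 1, 0, 2, 1, 1],
    [0, 2, 0, 1, 2, 1],
]
w, h = nmf(v, k=2)
for t, row in enumerate(h):
    top = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)[:3]
    print(f"topic {t}:", [vocab[j] for j in top])
```

Because "charge" occurs in both the battery documents and the billing documents, it may receive non-zero weight in both rows of `h`, which is the multi-topic behavior described above; a hard clustering would have to place it in a single group.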

WordStat: topic modeling

Explore connections

Explore connections among words or concepts using a network graph. Detect underlying patterns and structures of co-occurrences using three layout types: multidimensional scaling, a force-based graph, and a circular layout.

Graphs are interactive and may be used to explore relationships and to retrieve text segments associated with specific connections.

WordStat: link analysis

Relate text with structured data

Explore relationships between unstructured text and structured data:

  • Identify temporal trends, differences between subgroups, or assess relationships with ratings or other kinds of categorical or numerical data with statistical and graphical tools (deviation table, correspondence analysis, heatmaps, bubble charts, etc.).
  • Assess the relationship between word occurrence and nominal or ordinal variables using different association measures: Chi-square, Likelihood ratio, Tau-a, Tau-b, Tau-c, symmetric Somers’ D, asymmetric Somers’ Dxy and Dyx, Gamma, Pearson’s R, and Spearman’s Rho.
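As an illustration of two of the association measures listed above, the following sketch computes Pearson's chi-square statistic and Goodman and Kruskal's gamma from a crosstab of keyword occurrence against an ordinal rating. The counts are invented for the example, the function names are hypothetical, and the code shows only the textbook formulas: it does not compute p-values and is not WordStat's implementation.

```python
def chi_square(table):
    """Pearson's chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table whose rows and columns are ordered."""
    concordant = discordant = 0
    r, c = len(table), len(table[0])
    for i in range(r):
        for j in range(c):
            # Compare each cell with every cell in a later row.
            for i2 in range(i + 1, r):
                for j2 in range(c):
                    if j2 > j:
                        concordant += table[i][j] * table[i2][j2]
                    elif j2 < j:
                        discordant += table[i][j] * table[i2][j2]
    return (concordant - discordant) / (concordant + discordant)


# Rows: keyword absent / present; columns: low / high rating (invented counts).
table = [[10, 20],
         [30, 40]]
print(round(chi_square(table), 3))   # 0.794
print(goodman_kruskal_gamma(table))  # -0.2
```

With one degree of freedom, a chi-square of about 0.79 is well below the 5% critical value of 3.84, so these invented counts show no significant association; the negative gamma only indicates that the keyword is somewhat more common alongside low ratings.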

WordStat: crosstab to analyze the relationship between structured and unstructured data
WordStat: correspondence analysis

Categorize your text data using dictionaries

Automate full-text analysis using existing dictionaries, or create your own categorization model of words and phrases.

Dictionary entries can use Boolean operators (AND, OR, NOT), proximity rules (NEAR, AFTER, BEFORE), and regular expressions to quickly extract specific information from text data.

Dictionary-moderated lemmatization and stemming are available in several languages, and an automatic word substitution option lets you replace several words with a single target keyword. User-defined stop-word lists, available in several languages, keep frequent but nonessential words such as he, she, and it out of the analysis.
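A categorization dictionary with word substitution, stop words, and a NEAR rule can be sketched in a few lines of Python. All category names, word lists, and the rule format below are hypothetical, and WordStat's own dictionary format and rule semantics may differ. In particular, this sketch measures NEAR distances after stop-word removal, which is a simplifying assumption.

```python
import re

# Hypothetical dictionary contents, for illustration only.
STOP_WORDS = {"he", "she", "it", "the", "a", "was", "is"}
SUBSTITUTIONS = {"costly": "expensive", "pricey": "expensive"}
CATEGORIES = {
    "PRICE": {"expensive", "cheap", "price"},
    "STAFF": {"staff", "waiter", "rude"},
}
# Hypothetical NEAR rule: "slow" within 3 words of "service".
NEAR_RULES = {"SLOW_SERVICE": ("slow", "service", 3)}


def tokenize(text):
    """Lowercase, split with a regular expression, substitute, drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    words = [SUBSTITUTIONS.get(w, w) for w in words]
    return [w for w in words if w not in STOP_WORDS]


def categorize(text):
    """Return hit counts per category for one document."""
    tokens = tokenize(text)
    hits = {}
    for cat, words in CATEGORIES.items():
        n = sum(1 for t in tokens if t in words)
        if n:
            hits[cat] = n
    for cat, (a, b, dist) in NEAR_RULES.items():
        pos_a = [i for i, t in enumerate(tokens) if t == a]
        pos_b = [i for i, t in enumerate(tokens) if t == b]
        n = sum(1 for i in pos_a for j in pos_b if abs(i - j) <= dist)
        if n:
            hits[cat] = n
    return hits


print(categorize("The food was pricey and the service was slow; the waiter was rude."))
# {'PRICE': 1, 'STAFF': 2, 'SLOW_SERVICE': 1}
```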

WordStat: categorization dictionary

Get unique assistance for dictionary building

Get truly unique computer assistance for taxonomy building, with tools for extracting common phrases and technical terms and for quickly identifying misspellings and related words (synonyms, antonyms, holonyms, meronyms, hypernyms, hyponyms) in your text collection.

WordStat: unique assistance for categorization dictionary building

Automatically classify your text data using machine learning

Develop and optimize automatic document classification models using Naïve Bayes and K-Nearest Neighbours. Users can choose among several validation methods: leave-one-out, n-fold cross-validation, and split-sample. An experimentation module makes it easy to compare predictive models and fine-tune classification models.

Classification models may be saved to disk and applied later in QDA Miner, in a standalone document classification utility, in a command-line program, or through a programming library.
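The following sketch shows a multinomial Naïve Bayes classifier with Laplace smoothing, evaluated with n-fold cross-validation, in plain Python. It is a textbook version for illustration, not WordStat's classifier: the tiny training set is invented, the function names are hypothetical, folds are assigned by interleaving (document i goes to fold i mod n) rather than at random, and K-Nearest Neighbours is not shown. Setting `n_folds` to the number of documents gives leave-one-out validation.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect class frequencies and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in doc.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best


def cross_validate(docs, labels, n_folds):
    """Accuracy under interleaved n-fold cross-validation."""
    correct = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(len(docs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(docs)) if i % n_folds != fold]
        model = train_nb([docs[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)


docs = [
    "great product works well", "terrible product broke fast",
    "works great love it", "broke after a week terrible",
    "love it great value", "awful quality broke",
]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]
print(predict_nb(train_nb(docs, labels), "great value love"))
print(cross_validate(docs, labels, n_folds=3))
```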

WordStat: Automatically classify your text data using machine learning

Return to the source document in one click

Verify or dig deeper into your analysis by going back to the text from almost any feature, chart, or graph using Keyword Retrieval or Keyword-in-Context to retrieve sentences, paragraphs, or whole documents. This is particularly helpful when building taxonomies or for word-sense disambiguation.

The retrieved text segments can be sorted by keyword or any independent variable. You can attach QDA Miner codes to retrieved segments or export them to disk in tabular format (Excel, CSV, etc.) or as text reports (MS Word, RTF, etc.).
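A keyword-in-context (KWIC) listing can be sketched as follows: each occurrence of a keyword is shown with a few words of left and right context, and the rows can then be sorted. The function name, the three-word context width, and sorting by right context are illustrative assumptions rather than WordStat behavior.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


text = ("The staff were friendly. Later the staff ignored us, "
        "and the manager blamed the staff for the delay.")
# Sort rows by their right-hand context, then print them aligned on the keyword.
for left, kw, right in sorted(kwic(text, "staff"), key=lambda r: r[2]):
    print(f"{left:>22} [{kw}] {right}")
```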

WordStat: Keywords in Context (KWIC)

Perform qualitative coding

Combine WordStat with QDA Miner, a state-of-the-art qualitative coding tool, for more precise exploration of your data or a more in-depth analysis of specific documents or extracted text segments when needed.

Perform qualitative coding from WordStat

Transform unstructured text into interactive maps (GIS mapping)

Relate unstructured text data to geographic information and create interactive plots of data points, thematic maps, and heatmaps, with a geocoding web service that transforms location names, postal codes, and IP addresses into latitudes and longitudes.

WordStat: GIS Viewer

Automatically extract names and misspellings

Automatically extract named entities (names, technical terms, product and company names) and add them to the categorization dictionary with an easy drag-and-drop operation.

Misspellings and unknown words are automatically extracted and matched with existing entries in the user dictionary and may be quickly added to the dictionary.
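One common way to match a misspelling to existing dictionary entries is edit distance. The sketch below uses Levenshtein distance in plain Python. Whether WordStat uses this particular measure is not stated here, so treat the function names and the `max_dist` threshold as hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def suggest(word, dictionary, max_dist=2):
    """Dictionary entries within max_dist edits, closest first."""
    scored = sorted((levenshtein(word, entry), entry) for entry in dictionary)
    return [entry for dist, entry in scored if dist <= max_dist]


print(suggest("servise", ["service", "server", "survey"]))  # ['service']
```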

WordStat: Extract named entities and misspellings

Export results

Export text analysis results to common file formats such as Excel, ASCII, HTML, XML, and MS Word; to popular statistical analysis tools such as SPSS and Stata; and save graphs as PNG, BMP, or JPEG images.

WordStat: proximity plot

Transform text using Python scripts

Use Python scripts, with the full range of open-source Python libraries, to preprocess or transform text documents for analysis in WordStat.
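As an example of the kind of preprocessing a Python script might perform, the function below normalizes Unicode, strips HTML tags and entities, removes URLs, masks email addresses, and collapses whitespace, using only the standard library. How WordStat passes documents to a script and receives the result is not described here, so the function simply takes and returns a string; its name and the `EMAIL` placeholder token are hypothetical.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def preprocess(text):
    """Clean one document before text analysis (illustrative rules only)."""
    text = unicodedata.normalize("NFKC", text)     # unify Unicode forms
    text = html.unescape(TAG_RE.sub(" ", text))    # drop tags, decode &amp; etc.
    text = URL_RE.sub(" ", text)                   # remove web links
    text = EMAIL_RE.sub("EMAIL", text)             # mask email addresses
    return re.sub(r"\s+", " ", text).strip()       # collapse whitespace


sample = "<p>Contact me at jane.doe@example.com &amp; see https://example.com/faq</p>"
print(preprocess(sample))  # Contact me at EMAIL & see
```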