“For those who have ever needed to find themes or relationships in verbatim responses, focus group transcripts, or other text sources, WordStat is very attractive indeed.”

— Marketing Research

LIST OF FEATURES

TEXT PROCESSING CAPABILITIES

  1. Content analysis on collections of ANSI or RTF documents and short alphanumeric variables.
  2. Stemming in 18 languages.
  3. Dictionary moderated lemmatization and stemming (English, French, Italian, German and Spanish; contact us for other languages).
  4. Ability to call external text pre-processing EXE or DLL.
  5. Optional exclusion of pronouns, conjunctions, etc., through user-defined exclusion lists (stop lists).
  6. Categorization of words or phrases using existing or user-defined dictionaries.
  7. Word categorization based on Boolean (AND, OR, NOT) and proximity rules (NEAR, AFTER, BEFORE).
  8. Word and phrase substitution and scoring using wildcards and weighting.
  9. Frequency analysis on keywords, phrases, derived categories or concepts, or user-defined codes entered manually within a text.
  10. Interactive development and easy maintenance of hierarchical dictionaries, taxonomies, or categorization schema.
  11. Drag-and-drop editor for easy assignment of words and phrases to categories.
  12. Ability to restrict the analysis to specific portions of a text or to exclude comments and annotations.
  13. Ability to perform an analysis on a random sample of cases.
  14. Integrated spell-checking with support for more than 20 languages such as English, French, Spanish, etc.
  15. Integrated thesaurus to assist the creation of taxonomies and comprehensive categorization schemes (English, French, Spanish, Italian, Portuguese and German).
  16. Powerful case filtering on any numeric or alphanumeric field and on code occurrence (with AND, OR, and NOT Boolean operators).
  17. Prints presentation-quality tables.
  18. Imports ANSI and Unicode text files, MS Word, RTF, HTML, and PDF.
  19. Exports any table to Excel, SPSS, Stata, ASCII, Tab separated or comma separated value files, or HTML files.
  20. Flexible keyword highlighting (the text editor can display all categories using different colors).
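
The stop-list exclusion and wildcard-based dictionary categorization listed above can be illustrated with a minimal Python sketch. It is not WordStat's implementation or file format: the stop list, the dictionary entries, and the function names are hypothetical, and wildcard matching is approximated with Python's standard `fnmatch` module.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical stop list and categorization dictionary (illustration only).
STOP_WORDS = {"the", "and", "it", "is", "a", "but", "was", "very", "i"}
DICTIONARY = {
    "POSITIVE": ["good", "excellen*", "lov*"],
    "NEGATIVE": ["bad", "terribl*", "hat*"],
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def categorize(text):
    """Count category hits, skipping stop words and honoring '*' wildcards."""
    counts = Counter()
    for word in tokenize(text):
        if word in STOP_WORDS:
            continue  # excluded by the stop list
        for category, patterns in DICTIONARY.items():
            if any(fnmatchcase(word, pattern) for pattern in patterns):
                counts[category] += 1
    return counts

counts = categorize(
    "The service was excellent and I loved it, but the food was terrible."
)
print(dict(counts))
```

In this sketch a word may count toward several categories if its patterns overlap; a real dictionary would need a rule for resolving such conflicts.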

Keyword highlighting

UNIVARIATE KEYWORD FREQUENCY ANALYSIS

  1. Univariate word frequency analysis (word or category count and record occurrence).
  2. Word x word co-occurrence matrix.
  3. Word x case data matrix.
  4. Integrated multidimensional scaling with 2D and 3D maps.
  5. Proximity plot.

Word frequency table Co-occurrence matrix Multidimensional scaling (2D Map)
Multidimensional scaling (3D Map) Proximity plot Pie chart

FEATURE EXTRACTION

  1. Topic modeling tool automatically extracts topics by applying factor analysis on word x segment matrices.
  2. Vocabulary finder extracts technical terms, product and company names as well as common misspellings.
  3. Pattern based named-entity extraction.
  4. Phrase finder makes it easy to identify recurring phrases and expressions.

Vocabulary and phrase finder

NORM CREATION AND COMPARISON

  1. Ability to create norm files based on frequency analysis of words or content categories.
  2. Comparison of obtained frequencies to previously saved norm files.

KEYWORD RETRIEVAL FUNCTION

  1. A powerful keyword retrieval function allows identification of text units (documents, paragraphs, or sentences) containing one keyword or a combination of keywords, with optional filtering of cases.
  2. Ability to attach QDA Miner codes to retrieved segments.
  3. Retrieved segments may be exported to disk in tabular format (Excel or delimited text files) or as text reports (Rich Text Format).

Keyword retrieval function

KEYWORD CO-OCCURRENCE ANALYSIS

  1. Integrated clustering and dendrogram display of keyword co-occurrence.
  2. First- and second-order proximity analysis.
  3. Proximity plot to easily identify all keywords that co-occur with a target keyword.
  4. 2D and 3D multidimensional scaling on either joint frequency or co-occurrence of words or categories.
  5. Flexible keyword co-occurrence criteria (within a case, a sentence, a paragraph, a window of n words, a user-defined segment) as well as clustering methods (first- and second-order proximity, choice of similarity measures).
  6. Easy text retrieval from dendrogram or proximity plots.
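
To make the "window of n words" co-occurrence criterion concrete, here is a small Python sketch. It is an illustration under stated assumptions, not WordStat's algorithm: the function name is hypothetical, and a window of size `window` is taken to mean that two tokens are at most `window - 1` positions apart.

```python
from collections import Counter

def cooccurrence(tokens, window=3):
    """Count unordered word pairs whose positions differ by less than `window`."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at the next (window - 1) tokens only.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs

pairs = cooccurrence("tax cut tax reform budget".split(), window=3)
for pair, count in sorted(pairs.items()):
    print(pair, count)
```

The resulting pair counts are the kind of input that a clustering or multidimensional scaling step would then turn into a similarity matrix.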

ANALYSIS OF CASE OR DOCUMENT SIMILARITY

  1. Hierarchical clustering, multidimensional scaling and proximity plot may be used to explore the similarity between documents or cases.

MULTIPLE RESPONSES AND COMPARISONS

  1. Can perform univariate frequency analysis and crosstabulation on information stored in several alphanumeric fields (memo or string variables).
  2. Comparison of keyword occurrence between different fields.
  3. Computes inter-rater agreement measures (percent agreement, Cohen’s Kappa, Scott’s Pi, Krippendorff’s R and r-bar, free marginal) based on codes manually entered in different variables.
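
Two of the agreement measures named above, percent agreement and Cohen's Kappa, can be computed from their textbook definitions as in the following Python sketch. The rater data and function names are made up for illustration, and the code says nothing about how WordStat computes these values internally.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which the two raters assigned the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's Kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when expected agreement p_e equals 1.
    """
    n = len(a)
    p_o = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from each rater's own marginal code frequencies.
    p_e = sum(freq_a[c] * freq_b[c] for c in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes entered by two raters for the same eight responses.
rater_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
rater_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

print(percent_agreement(rater_1, rater_2))  # 0.75
print(cohens_kappa(rater_1, rater_2))       # 0.5
```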

BIVARIATE COMPARISONS BETWEEN SUBGROUPS

  1. Bivariate comparison between any textual field and any nominal or ordinal variable (such as the sex of the respondent, specific subgroups, years of publication, etc.).
  2. Choice between 11 different association measures to assess the relationship between word occurrence and nominal or ordinal variables (Chi-square, Likelihood ratio, Tau-a, Tau-b, Tau-c, symmetric Somers’ D, asymmetric Somers’ Dxy and Dyx, Gamma, Pearson’s R, Spearman’s Rho).
  3. Computation of statistics on either absolute or relative frequencies.
  4. Ability to sort matrix in alphabetic order of words, by word frequency or word occurrence, on the obtained statistics or on its probability.
  5. Visually compare items between subgroups using bar charts and line charts.
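
As an example of the first association measure in the list above, the following Python sketch computes the Pearson chi-square statistic (without continuity correction) for a keyword-by-subgroup contingency table. The counts are invented and the function name is hypothetical; it shows the standard formula rather than WordStat's own computation.

```python
def chi_square(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if keyword occurrence were independent of subgroup.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts. Rows: keyword present / absent; columns: group A / group B.
table = [[30, 10],
         [20, 40]]
print(round(chi_square(table), 3))  # 16.667
```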

Bivariate Comparison  Bivariate comparison using bar chart (3D) Bivariate comparison using line chart

  6. Correspondence analysis (statistics, 2D & 3D joint plots). This feature is accessible from the crosstab page and allows one to see graphically the relationship between nominal variables and codes resulting from a content analysis.
  7. Heatmap plot (with dual-clustering of keywords and variables).

Correspondence Analysis Correspondence Analysis in 3D Heatmap

AUTOMATED TEXT CLASSIFICATION

  1. Machine learning algorithms (Naive Bayes and K-Nearest Neighbors) for document classification.
  2. Flexible feature selection for automatic selection of best subsets of attributes.
  3. Numerous validation methods (leave-one-out, n-fold cross-validation, split sample).
  4. Experimentation module allows easy comparison of predictive models and fine-tuning of classification models.
  5. Classification models may be saved to disk and applied later using a standalone document classification utility program, a command line program, or a programming library. Note: The command line program and the programming library are part of the WordStat Software Developer’s Kit (SDK), which is sold separately.
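
For readers unfamiliar with Naive Bayes document classification, here is a minimal multinomial Naive Bayes sketch in Python with add-one (Laplace) smoothing. The training documents, labels, and function names are hypothetical, and this is a generic textbook version, not the WordStat classifier or its SDK API.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Build class and per-class word counts from (tokens, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training set of tokenized documents.
training = [
    ("refund broken late".split(), "complaint"),
    ("broken damaged refund".split(), "complaint"),
    ("great fast thanks".split(), "praise"),
    ("thanks great service".split(), "praise"),
]
model = train(training)
print(predict(model, "broken refund".split()))
print(predict(model, "great thanks".split()))
```

A real workflow would add feature selection and one of the validation schemes listed above before trusting such a model.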

Automatic Document Classification - feature selection Automatic Document Classification - Test page Automatic Document Classification - History Automatic Document Classification - Text Classification

KEYWORD-IN-CONTEXT (KWIC)

  1. Ability to display a KWIC table to examine the textual context of a word, word pattern, or category.
  2. Ability to sort the table on any independent (numeric) variables.
  3. Ability to jump from a KWIC keyword to the textual variable in order to view or edit the original text.
  4. KWIC list can be saved in data files for further processing.
  5. Customizable KWIC display (paragraph, sentence, or user-defined segment).
  6. Concordance report (displays all hits as a list of paragraphs, sentences, or user-defined segments).
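
The idea behind a KWIC table can be sketched in a few lines of Python: each hit is shown with a fixed number of words on either side. This is only an illustration; the function name and the `width` parameter are assumptions and do not reflect WordStat's display options.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every keyword hit."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

rows = kwic(
    "The tax cut passed. Critics said the tax plan favors the rich.",
    "tax",
    width=2,
)
for left, hit, right in rows:
    print(f"{left:>12} | {hit} | {right}")
```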

Keyword-in-Context (KWIC)

FULL INTEGRATION WITH STATISTICAL SOFTWARE

  1. Alphanumeric variables can be stored in the same file as all other numeric variables.
  2. Variable selection, statistical analysis and content analysis are performed within the same application program.
  3. Matrix outputs are automatically added to existing statistical outputs.
  4. New variables representing occurrence of words, keywords or concepts can be added to the existing data file or exported to a new data file in order to be submitted to further statistical analysis (such as cluster analysis on words or cases, principal coordinate analysis, correspondence analysis, multiple regression, etc.).
  5. Data can be imported from and exported to different file formats including dBase, Paradox, Excel, Quattro Pro, Lotus 1-2-3, SPSS for DOS, SPSS for Windows, comma or tab separated text files, etc.
  6. Ability to perform numeric and alphanumeric transformations or to apply filters on records of the data file to restrict the analysis to specific subgroups.

UTILITY PROGRAMS

  1. Dictionary building assistant to find related words (synonyms, antonyms, holonyms, meronyms, hypernyms, hyponyms) in a WordNet-based thesaurus (English only; 100,000 synonyms, 120,000 root words).

Dictionary building assistant

  1. WS Document Classifier, a small standalone application to apply previously saved categorization and classification models to external documents.
  2. Document Conversion Wizard, a utility program to easily import documents. Various file formats may be imported directly, such as plain text (ANSI, Unicode), HTML, RTF, MS Word, WordPerfect, and Adobe PDF.
  3. Optional removal of leading and trailing spaces and hard returns.
  4. Extraction of numeric, alphanumeric and date variables from structured documents.
  5. Extraction options may be saved on disk and later retrieved.
  6. Documents may be stored as plain ANSI text or as RTF documents.
