WordStat 7 New Features
Provalis Research, a leading provider of text-analytics software, is pleased to announce the release of WordStat 7. This new version helps users draw valuable, actionable insights from text data more quickly, relate unstructured and structured information, and create and validate accurate text-categorization dictionaries with greater assistance. WordStat 7 delivers numerous improvements, such as:
1. TOPIC EXTRACTION TOOL
A new topic modelling tool based on factor analysis has been implemented to quickly extract topics from large collections of documents. Obtained topics may be renamed, merged, or deleted. A side panel also allows one to compare the frequency of specific topics across other variables using bar charts or line charts.
2. LINK ANALYSIS FEATURE
A new Link Analysis feature allows one to display co-occurrence data using force-based graphs, multi-dimensional scaling or circular graphs. Graphs are interactive and may be used to explore connections and to retrieve text segments associated with specific connections.
3. NAMED ENTITY EXTRACTION
A new pattern-based named entity extraction feature has been added. Extracted names may be added to the categorization dictionary using drag-and-drop operations.
4. IMPROVED DENDROGRAM PAGE
When clustering keywords or content categories, a new panel on the right of the dendrogram displays the frequency distribution of the selected cluster across up to two independent variables as well as a link chart.
5. MORE INTELLIGENT HANDLING OF MISSPELLINGS
Misspellings and unknown words are now automatically matched with existing entries in the user dictionary and may be quickly added to that dictionary. The redesigned interface also identifies potential replacements as well as possible misspellings of words that are part of phrases currently in the categorization dictionary.
6. IMPROVED KEYWORD-IN-CONTEXT FEATURE
The KWIC (Keyword-in-Context) page now includes a tree view of the keyword contextual data sorted in descending order of frequency. The tree view may be used to easily filter and navigate through long concordance lists.
7. IMPROVED DRAG-AND-DROP EDITING
One can now drag suggested words (Frequencies page) and overlapping phrases (Phrase Finder page) directly from the right-most panels to the dictionary panel (left-most panel).
9. MORE POWERFUL PROXIMITY RULES
The Rule Editor now supports up to four conditions, and each of those conditions can use a different distance setting in terms of units (document, paragraph, sentence, etc.) and physical distance (number of words).
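To illustrate the idea of combining conditions that each have their own unit and word distance, here is a minimal Python sketch. It is not WordStat's implementation: the `Condition` class, the `rule_matches` function, the naive sentence and paragraph splitting, and the assumption that all conditions must hold together are hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Condition:
    """One proximity condition (hypothetical structure for illustration)."""
    first: str      # first word to look for
    second: str     # second word to look for
    unit: str       # "document", "paragraph", or "sentence"
    max_words: int  # maximum distance between the two words, in words


def split_units(text, unit):
    """Split text into the segments within which a condition is evaluated."""
    if unit == "document":
        return [text]
    if unit == "paragraph":
        # Assumption: paragraphs are separated by blank lines.
        return [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    if unit == "sentence":
        # Assumption: a sentence ends with ., ! or ? followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    raise ValueError(f"unknown unit: {unit}")


def condition_holds(text, cond):
    """True if both words occur in the same unit within max_words of each other."""
    first, second = cond.first.lower(), cond.second.lower()
    for segment in split_units(text, cond.unit):
        tokens = re.findall(r"\w+", segment.lower())
        pos_first = [i for i, t in enumerate(tokens) if t == first]
        pos_second = [i for i, t in enumerate(tokens) if t == second]
        if any(abs(a - b) <= cond.max_words for a in pos_first for b in pos_second):
            return True
    return False


def rule_matches(text, conditions):
    """Assumption: a rule matches only when every one of its 1-4 conditions holds."""
    if not 1 <= len(conditions) <= 4:
        raise ValueError("a rule takes between one and four conditions")
    return all(condition_holds(text, c) for c in conditions)
```

For example, `rule_matches(review, [Condition("not", "good", "sentence", 2), Condition("slow", "good", "paragraph", 10)])` would require "not" close to "good" within one sentence and "slow" within ten words of "good" in the same paragraph.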
10. STEMMING IN 18 LANGUAGES
Fast stemming has been implemented for 18 languages (English, French, Spanish, Basque, Catalan, Czech, Italian, German, Danish, Dutch, Finnish, Hungarian, Norwegian, Portuguese, Romanian, Russian, and Swedish).
11. VIEW AND EDIT THE AUTOMATIC REPLACEMENT LIST
One can now review the automatic word replacement list, edit entries, as well as import and export this list to disk, allowing one to share the list of replacements with other users or to move it to another computer.
12. LOG OF CHANGES IN DICTIONARIES
A log of all changes made to categorization dictionaries and exclusion lists is now stored on disk. This feature may be disabled, if necessary.
13. IMPORT AND EXPORT CATEGORIZATION DICTIONARIES
Dictionaries may now be imported from, or exported to, Excel files, tab- or comma-delimited files, and XML files.
14. SPEED IMPROVEMENTS
Several speed improvements have been made. For example, the phrase extraction tool is now 5 to 20 times faster, and computing a KWIC list on large data sets, which used to take several minutes, now takes a fraction of a second.
15. ADD NOTES TO DICTIONARY ENTRIES
Up to six types of notes can now be attached to categorization dictionaries. One may differentiate comment types by using various colors and customizable labels.
16. CROSSTAB ANALYSIS ON CLUSTERS AND PHRASES
New buttons on the Dendrogram and Phrase Finder pages allow one to access the Crosstab dialog box and perform comparison analysis on either extracted phrases or clustering solutions. One may then obtain various association statistics (chi-square, F-test, Pearson’s R, etc.), create bar charts, bubble charts or heatmaps, and perform a correspondence analysis.
17. IMPROVED AUTOMATIC DOCUMENT CLASSIFICATION
The Automatic Document Classification module has been moved to its own page and a new accuracy measure for ordinal predictions has been added, allowing one to optimize classification models on ordinal data. One may also edit values of the predicted variable from the Review Errors page, allowing one to correct misclassified cases in the learning data set.
18. IMPROVED MEMORY MANAGEMENT
WordStat now processes more text data in memory and automatically switches to disk when needed, resulting in faster processing of very large text collections.
19. SUPPORT OF NEW WILDCARDS IN DICTIONARY ENTRIES
Dictionary entries may now contain the # wildcard, which represents a numerical digit, and square brackets, which work like those found in regular-expression engines and match one character out of a set of user-defined characters.
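As a rough illustration of how such wildcards behave, the following Python sketch translates an entry into a regular expression. The function names and the exact matching semantics (one digit per #, whole-word and case-insensitive matching, no character ranges inside brackets) are assumptions made for this example, not WordStat's documented behavior.

```python
import re


def entry_to_regex(entry):
    """Translate a dictionary-style entry into a compiled regular expression.

    Assumed semantics (illustrative only):
      '#'     matches exactly one digit
      '[abc]' matches exactly one character from the listed set
    All other characters are matched literally.
    """
    parts = []
    i = 0
    while i < len(entry):
        ch = entry[i]
        if ch == "#":
            parts.append(r"\d")
            i += 1
            continue
        if ch == "[":
            close = entry.find("]", i + 1)
            if close > i + 1:
                # Escape the set so every listed character is taken literally.
                parts.append("[" + re.escape(entry[i + 1:close]) + "]")
                i = close + 1
                continue
        # Literal character (also covers an unmatched or empty bracket).
        parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def entry_matches(entry, word):
    """True if the whole word matches the entry."""
    return entry_to_regex(entry).fullmatch(word) is not None
```

Under these assumptions, an entry such as `gr[ae]y` would match both "grey" and "gray", and `19##` would match any four-character token that starts with "19" and ends with two digits.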
20. IMPROVED SAVING OF CLUSTERING SOLUTIONS
When converting cluster solutions to a categorization dictionary, one can now select clusters based on the number of items (removing clusters containing a small number of words). Clusters are now automatically given descriptive names.
21. IMPROVED SORTING OF DICTIONARIES
One can now sort dictionaries on items only, without affecting the order of content categories.
22. CUSTOMIZABLE TEXT REPORT
The text report for coded segments can now be customized, allowing the user to choose which information this report will include.
23. MULTIPLE SELECTIONS OF DICTIONARY ITEMS
On the main Dictionaries page, it is now possible to select multiple items in the categorization dictionary using the Shift or Ctrl keys and either move, edit or delete those items.
24. IMPORT SETTINGS FROM ANOTHER PROJECT
It is now possible to import analysis options (including dictionary settings, processing and charting options) from another project file, using the IMPORT SETTINGS command.
25. NEW 3D BAR CHART
On the Crosstab page, users can now choose between two types of 3D bar charts: 3D clustered bars or 3D columns.
26. NEW BOTTOM AXIS LABELS DISPLAY FORMATS
Labels on the bottom axis of charts may now be printed at a 45-degree angle, vertically, or horizontally, on a single line or on two lines.