The new features of WordStat text mining software
What’s New in Versions 7.1 and 7.0?
The new version 7 delivers numerous improvements allowing users to get:
• Valuable and actionable insights from text data more quickly.
• Faster and easier comparisons between unstructured and structured information.
• Greater assistance for the creation and validation of accurate text-categorization dictionaries.
• Faster analysis of larger amounts of text data.
• A GIS mapping and data editing tool to quickly relate unstructured text with geographic information and create maps.
Below is a complete list of the new features of WordStat 7, with detailed information on each.
Changes in Version 7.1?
Version 7.1 features a new GIS mapping and data editing module that allows one to relate unstructured text data to geographic information and to create maps and other graphic displays for analysis and presentation, such as:
1. INTERACTIVE PLOT OF DATA POINTS
Plots of data points can be created from words, phrases, or topics extracted from unstructured text fields. One can quickly filter data points on categorical, numerical, and date variables or create dynamic range displays and custom animations to easily identify temporal trends, cyclical patterns, or relationships to numerical variables. One may also customize and annotate single data points.
2. DISTRIBUTION MAPS
Users can create layers from various vector file formats and produce choropleth maps to represent point density, demographic information stored in shapefiles, or statistical summaries of numerical values associated with text segments. One can also easily adjust the color range, the number of steps and level of transparency.
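To illustrate what the "number of steps" and "color range" settings of a choropleth map control, here is a minimal Python sketch. It assumes equal-width classes and a linear two-color ramp; the function names classify and ramp are hypothetical and are not part of WordStat, which may use a different classification method.

```python
def classify(values, steps):
    """Assign each value to one of `steps` equal-width classes (0-based)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls in the first class.
        return [0 for _ in values]
    width = (hi - lo) / steps
    # The maximum value would land in class `steps`, so clamp it to the last class.
    return [min(int((v - lo) / width), steps - 1) for v in values]


def ramp(start, end, steps):
    """Linearly interpolate `steps` RGB colors from `start` to `end`."""
    if steps == 1:
        return [start]
    return [
        tuple(round(s + (e - s) * k / (steps - 1)) for s, e in zip(start, end))
        for k in range(steps)
    ]


# Hypothetical point counts per region, shaded in four steps from white to red.
counts = [0, 10, 25, 49, 50, 100]
classes = classify(counts, 4)
colors = ramp((255, 255, 255), (255, 0, 0), 4)
region_colors = [colors[c] for c in classes]
```

Increasing the number of steps gives finer shading at the cost of classes that are harder to tell apart on the map.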
3. HEATMAPS
The density of data points can be visualized with heatmap displays to easily identify customer concentrations, crime hot spots, or disease outbreaks. Users can create heatmaps on all data points or just on selected regions, and can choose from a wide variety of color ramps or create their own.
4. GEOCODING
An integrated geocoding service is available in WordStat to transform references to cities, states, provinces, countries, postal codes, and IP addresses into geographical coordinates.
5. OTHER GIS FEATURES
Natively opens and displays a wide range of vector, image, grid, and SQL database layer formats, including advanced spatial server geodatabases, as well as WMS, WFS, and WMTS mapping services.
Comprehensive visual layer property, legend, and scale controls provide for deep customization of the map appearance.
Create, edit, translate, and export map layers in a number of vector, image, grid, and database formats.
Support for coordinate systems with on-the-fly layer reprojection between thousands of predefined geographic and projected coordinate systems or any coordinate system defined from 150+ projections and 900+ datums.
Saving of maps to industry-standard graphic file formats (BMP, PNG, JPG) and georeferenced world files, as well as to AVI movie files.
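For readers unfamiliar with world files, the Python sketch below shows how the six numbers in such a file georeference an exported image. A world file stores the coefficients in the order A, D, B, E, C, F, one per line. The sample values and the function names are hypothetical and are not taken from WordStat.

```python
def parse_world_file(text):
    """Read the six coefficients of a world file, in file order A, D, B, E, C, F."""
    a, d, b, e, c, f = (float(line) for line in text.split())
    return a, d, b, e, c, f


def pixel_to_map(coeffs, col, row):
    """Convert a pixel position (column, row) to map coordinates (x, y)."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c  # A: pixel width, B: rotation term, C: x of the upper-left pixel center
    y = d * col + e * row + f  # D: rotation term, E: pixel height (usually negative), F: y of the upper-left pixel center
    return x, y


# Hypothetical world file: 10-unit pixels, no rotation, origin at (500000, 4100000).
sample = "10.0\n0.0\n0.0\n-10.0\n500000.0\n4100000.0\n"
coeffs = parse_world_file(sample)
corner = pixel_to_map(coeffs, 0, 0)
inside = pixel_to_map(coeffs, 3, 2)
```

Because E is negative in this sample, moving down two rows lowers the y coordinate, which matches the usual north-up image layout.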
Changes in Version 7.0?
1. TOPIC EXTRACTION TOOL
A new topic modelling tool based on factor analysis has been implemented to quickly extract topics from large collections of documents. Obtained topics may be renamed, merged, or deleted. A side panel also allows one to compare the frequency of specific topics across other variables using bar charts or line charts.
2. LINK ANALYSIS FEATURE
A new Link Analysis feature allows one to display co-occurrence data using force-based graphs, multi-dimensional scaling or circular graphs. Graphs are interactive and may be used to explore connections and to retrieve text segments associated with specific connections.
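The co-occurrence data behind such graphs can be pictured as a table of keyword pairs and the number of documents in which both appear. The Python sketch below builds that table under simple assumptions (whitespace tokenization, document-level co-occurrence); the function name and sample documents are hypothetical, and WordStat's own computation may differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, keywords):
    """Count how many documents contain each unordered pair of keywords."""
    wanted = {k.lower() for k in keywords}
    pairs = Counter()
    for text in documents:
        # Keep only the keywords of interest that appear in this document.
        present = sorted(wanted.intersection(text.lower().split()))
        # Every pair found together in the document counts once.
        pairs.update(combinations(present, 2))
    return pairs


docs = [
    "price and service were excellent",
    "service was slow but price was fair",
    "delivery was slow",
]
edges = cooccurrence_counts(docs, ["price", "service", "slow", "delivery"])
```

Each entry in edges corresponds to one link in a graph, and its count can be used as the link weight.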
3. NAMED ENTITY EXTRACTION
A new pattern-based named entity extraction feature has been added. Extracted names may be added to the categorization dictionary using drag-and-drop operations.
4. IMPROVED DENDROGRAM PAGE
When clustering keywords or content categories, a new panel on the right of the dendrogram displays the frequency distribution of the selected cluster across up to two independent variables as well as a link chart.
5. MORE INTELLIGENT HANDLING OF MISSPELLINGS
Misspellings and unknown words are now automatically matched with existing entries in the user dictionary and may be quickly added to that dictionary. The redesigned interface also identifies potential replacements as well as possible misspellings of words that are part of phrases currently in the categorization dictionary.
6. IMPROVED KEYWORD-IN-CONTEXT FEATURE
The KWIC (Keyword-in-Context) page now includes a tree view of the keyword contextual data sorted in descending order of frequency. The tree view may be used to easily filter and navigate through long concordance lists.
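As a rough illustration of the idea, the Python sketch below builds a small concordance and groups the hits by the word that follows the keyword, most frequent first, which is one plausible first level of such a tree. The function names, the context width, and the sample text are assumptions made for this example only.

```python
from collections import Counter


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    return hits


def next_word_frequencies(hits):
    """Group hits by the word right after the keyword, most frequent first."""
    counts = Counter(right[0].lower() for _, _, right in hits if right)
    return counts.most_common()


text = "the service was good but the service was slow and the service desk closed"
hits = kwic(text.split(), "service")
branches = next_word_frequencies(hits)
```

Selecting one branch, such as the hits followed by "was", narrows a long concordance list to the lines that share that context.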
7. IMPROVED DRAG-AND-DROP EDITING
One can now drag suggested words (Frequencies page) and overlapping phrases (Phrase Finder page) directly from the right-most panels to the dictionary panel (left-most panel).
9. MORE POWERFUL PROXIMITY RULES
The Rule Editor now supports up to four conditions, and each of those conditions can use a different distance setting in terms of units (document, paragraph, sentence, etc.) and physical distance (number of words).
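The Python sketch below shows how a rule with several conditions, each with its own unit and word distance, might be evaluated. It is only an approximation: it assumes that all conditions must hold, supports only the "document" and "sentence" units, and uses a naive sentence splitter. None of the names come from WordStat.

```python
import re


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption for this sketch)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def within_words(tokens, target, other, max_distance):
    """True if `other` occurs at most `max_distance` words away from `target`."""
    target_pos = [i for i, t in enumerate(tokens) if t == target]
    other_pos = [i for i, t in enumerate(tokens) if t == other]
    return any(abs(i - j) <= max_distance for i in target_pos for j in other_pos)


def rule_matches(text, target, conditions):
    """Each condition is (other_word, unit, max_distance); all must hold.

    `unit` is "sentence" or "document" in this sketch.
    """
    text = text.lower()
    for other, unit, max_distance in conditions:
        segments = split_sentences(text) if unit == "sentence" else [text]
        found = any(
            within_words(re.findall(r"\w+", seg), target, other, max_distance)
            for seg in segments
        )
        if not found:
            return False
    return True


review = "The battery died quickly. Support was not helpful at all."
same_sentence = rule_matches(review, "battery", [("died", "sentence", 2)])
cross_sentence = rule_matches(review, "battery", [("helpful", "sentence", 10)])
whole_document = rule_matches(review, "battery", [("helpful", "document", 10)])
```

Here "helpful" lies within ten words of "battery" in the document but not in the same sentence, so only the document-level condition is satisfied.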
10. STEMMING IN 18 LANGUAGES
Fast stemming has been implemented for 18 languages (English, French, Spanish, Basque, Catalan, Czech, Italian, German, Danish, Dutch, Finnish, Hungarian, Norwegian, Portuguese, Romanian, Russian, and Swedish).
11. VIEW AND EDIT THE AUTOMATIC REPLACEMENT LIST
One can now review the automatic word replacement list, edit entries, as well as import and export this list to disk, allowing one to share the list of replacements with other users or to move it to another computer.
12. LOG OF CHANGES IN DICTIONARIES
A log of all changes made to categorization dictionaries and exclusion lists is now stored on disk. This feature may be disabled, if necessary.
13. IMPORT AND EXPORT CATEGORIZATION DICTIONARIES
Dictionaries may now be imported from, or exported to, Excel files, tab- or comma-delimited files, and XML files.
14. SPEED IMPROVEMENTS
Several speed improvements have been made. For example, the phrase extraction tool is now five to 20 times faster, and computing a KWIC list on large data sets, which used to take several minutes, now takes a fraction of a second.
15. ADD NOTES TO DICTIONARY ENTRIES
Up to six types of notes can now be attached to categorization dictionaries. One may differentiate comment types by using various colors and customizable labels.
16. CROSSTAB ANALYSIS ON CLUSTERS AND PHRASES
New buttons on the Dendrogram and Phrase Finder pages allow one to access the Crosstab dialog box and perform comparison analysis on either extracted phrases or clustering solutions. One may then obtain various association statistics (chi-square, F-test, Pearson's R, etc.), create bar charts, bubble charts, or heatmaps, and perform a correspondence analysis.
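As a reminder of what the chi-square statistic measures in a crosstab, the Python sketch below computes the Pearson chi-square for a small table of counts. The table values are invented for illustration, and the function assumes every row and column total is greater than zero; it is not WordStat's implementation.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical crosstab: rows are two phrases, columns are two groups of respondents.
table = [
    [30, 10],
    [20, 40],
]
stat = chi_square(table)
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
```

A larger statistic for the same degrees of freedom indicates that the phrase frequencies differ more strongly between the groups.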
17. IMPROVED AUTOMATIC DOCUMENT CLASSIFICATION
The Automatic Document Classification module has been moved to its own page and a new accuracy measure for ordinal predictions has been added, allowing one to optimize classification models on ordinal data. One may also edit values of the predicted variable from the Review Errors page, allowing one to correct misclassified cases in the learning data set.
18. IMPROVED MEMORY MANAGEMENT
WordStat now processes more text data in memory and automatically switches to disk when needed, resulting in faster processing of very large text collections.
19. SUPPORT OF NEW WILDCARDS IN DICTIONARY ENTRIES
Dictionary entries may now contain the # wildcard to represent numerical digits, as well as square brackets, similar to those found in regular-expression engines, to match one character out of a set of user-defined characters.
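To make the behavior concrete, the Python sketch below translates a dictionary entry containing these wildcards into a regular expression. It assumes that each # stands for exactly one digit and that a bracketed set lists literal characters only. The handling of * and ? is also an assumption of this sketch, and the function name is hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard entry into a compiled, case-insensitive regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "#":
            out.append(r"\d")      # assumption: one digit per '#'
        elif ch == "*":
            out.append(r"\w*")     # assumption: any run of word characters
        elif ch == "?":
            out.append(r"\w")      # assumption: exactly one word character
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1 or end == i + 1:
                raise ValueError("unclosed or empty '[' set in pattern")
            # Escape the set so every character is taken literally.
            out.append("[" + re.escape(pattern[i + 1:end]) + "]")
            i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.IGNORECASE)


spelling = wildcard_to_regex("organi[sz]ation")
year = wildcard_to_regex("19##")
stem = wildcard_to_regex("econom*")
```

With fullmatch, the first entry accepts both "organisation" and "organization", while the second accepts "1984" but rejects "19a4".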
20. IMPROVED SAVING OF CLUSTERING SOLUTIONS
When converting cluster solutions to a categorization dictionary, one can now select clusters based on the number of items (removing clusters containing a small number of words). Clusters are now automatically given descriptive names.
21. IMPROVED SORTING OF DICTIONARIES
One can now sort dictionaries on items only, without affecting the order of content categories.
22. CUSTOMIZABLE TEXT REPORT
The text report for coded segments can now be customized, allowing the user to choose which information this report will include.
23. MULTIPLE SELECTIONS OF DICTIONARY ITEMS
On the main Dictionaries page, it is now possible to select multiple items in the categorization dictionary using the Shift or Ctrl keys and either move, edit or delete those items.
24. IMPORT SETTINGS FROM ANOTHER PROJECT
It is now possible to import analysis options (including dictionary settings, processing and charting options) from another project file, using the IMPORT SETTINGS command.
25. NEW 3D BAR CHART
On the CROSSTAB page, users can now choose between two types of 3D bar charts: a 3D clustered bar chart or a 3D column bar chart.
26. NEW BOTTOM AXIS LABELS DISPLAY FORMATS
Labels on the bottom axis of charts may now be printed at a 45-degree angle, vertically, or horizontally, on a single line or on two lines.