An analysis of Twitter Data Using WordStat
Researchers increasingly use social media to understand or explain the impact of events and social policy on society at large or on specific groups. We know anecdotally, personally, and from media reports that the Covid-19 pandemic has affected various groups differently, whether the elderly, people in assisted living, the poor, or health care workers. Getting a representative assessment of a particular group or groups often involves collecting and analyzing extremely large data sets. Provalis Research’s text mining software WordStat can be an extremely useful tool for exploring and finding meaning in very large amounts of text.
In their paper, Al-Rawi, Grepin, Li, Morgan, Wenham, and Smith (2021) examined the gender discourse around Covid-19. They looked at three gender categories: men, women, and sexual and gender minorities (SGM). They used a mixed-methods approach that included topic modelling, sentiment analysis, and text mining extraction procedures including word mapping, proximity plots, top hashtags and mentions, and most retweeted posts. The authors initially collected more than 50 million tweets between February and April 2020, posted by more than 11 million unique users. They then searched the tweets for specific words related to the gender categories they wished to study: men, women, and SGM. Much of this work was done with Python 3 scripts. Python and R can now be used with WordStat 9 for pre- and post-processing. The researchers used WordStat’s topic modelling and other features such as proximity plots and network word mapping (link analysis) to further explore the tweets.
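The keyword search step described above can be sketched in Python. This is a minimal illustration, not the authors' scripts: the keyword lists and the function names (`categorize`, `group_tweets`) are hypothetical, and the paper's actual search terms are not reproduced here.

```python
import re

# Hypothetical keyword lists, for illustration only.
# These are NOT the search terms used in the paper.
KEYWORDS = {
    "women": ["women", "woman", "girls", "girl"],
    "men": ["men", "man", "boys", "boy"],
    "sgm": ["lgbtq", "transgender", "nonbinary"],
}


def compile_patterns(keywords):
    """Build one case-insensitive, whole-word regex per category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
            re.IGNORECASE,
        )
        for category, terms in keywords.items()
    }


PATTERNS = compile_patterns(KEYWORDS)


def categorize(tweet):
    """Return the set of categories whose keywords appear in the tweet."""
    return {cat for cat, pattern in PATTERNS.items() if pattern.search(tweet)}


def group_tweets(tweets):
    """Assign each tweet to every category it matches."""
    groups = {category: [] for category in PATTERNS}
    for tweet in tweets:
        for category in categorize(tweet):
            groups[category].append(tweet)
    return groups
```

Matching on whole words matters here: a plain substring search for "men" would also match every tweet containing "women". A tweet that mentions more than one category is placed in each matching group.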
The authors used several of the graphic presentations available in WordStat to explore their data and to present it visually. The proximity plot and network mapping allowed the authors to view and present several interesting findings in their analysis. For example, when looking at the network word mapping (link analysis, Figure 1), it became apparent that words associated with “Medical bills” (the most tweeted topic) such as “bills,” “cost,” and “uninsured” are connected to “@abbyabrams”, a Time journalist. She wrote a story about uninsured women and Covid-19 which gathered wide attention and probably increased awareness about this issue, hence the discussion on social media.
Figure 1 – Network mapping of the most recurrent words and their associations for women (top) and men (bottom).
In the case of the proximity graph (Figure 2), there is a strong correlation between the word “girls” and the words “violence”, “crisis” and “@un”, which is reflected in the fifth most retweeted message, originally sent by UN Secretary General António Guterres.
Figure 2 – The proximity graph of the word “girls”.
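To give a rough sense of what a proximity graph measures, the sketch below ranks words by how often they co-occur with a target word in the same tweet, using the Jaccard coefficient. This is an illustration under stated assumptions and not WordStat's own computation: the choice of Jaccard, the tweet-level co-occurrence window, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and return its unique tokens, keeping @mentions and #hashtags."""
    return set(re.findall(r"[@#]?\w+", text.lower()))


def jaccard_proximity(tweets, target):
    """Rank words by Jaccard similarity with `target`, based on tweet-level co-occurrence.

    Jaccard(target, w) = tweets containing both / tweets containing either.
    """
    docs = [tokenize(tweet) for tweet in tweets]
    target_count = sum(1 for doc in docs if target in doc)

    word_count = Counter()  # number of tweets containing each word
    both_count = Counter()  # number of tweets containing the word and the target
    for doc in docs:
        word_count.update(doc)
        if target in doc:
            both_count.update(doc - {target})

    scores = {}
    for word, n_both in both_count.items():
        n_either = target_count + word_count[word] - n_both
        scores[word] = n_both / n_either

    # Highest similarity first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Calling `jaccard_proximity(tweets, "girls")` on a list of tweet texts returns (word, score) pairs, with scores closer to 1 for words that almost always appear alongside the target.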
The study identified several areas for further exploration, including that discussions of issues affecting men, women, and SGM groups differed and that social media influencers played an important role in health care communication. While this study is limited to Twitter, the authors suggest that additional research should include other social media platforms such as Facebook and Instagram.
Al-Rawi, A., Grepin, K., Li, X., Morgan, R., Wenham, C., & Smith, J. (2021). Investigating Public Discourses Around Gender and COVID-19: a Social Media Analysis of Twitter Data. Journal of Healthcare Informatics Research, 1-21.