Using quantitative content analysis to see beyond the veil by Peter Jacques, Professor in Global Environmental Politics, Sustainability at University of Central Florida.

Our Purpose

Shocker—politics is animated by people and institutions that are not always honest or forthright about their intentions.  Sometimes people hide their motives, and sometimes even the formal institutions (rules and decision-making procedures) are not always made to protect or create the things they say they are meant to protect or create.

As a political scientist trained in interdisciplinary global environmental politics (GEP), one of my goals as a researcher is to understand the way the world works, especially when leaders say one thing but mean another as it applies to ecological change. This work is urgent because global environmental changes are undermining human and non-human prospects, though we cannot authentically disentangle these from each other. As these critical life support systems change, we are presented with so many words, words of promise and rationality, at the same time that nearly all the Earth systems are shifting in the wrong direction during the Sixth Great Extinction, climate change, land and water changes, and changes to the World Ocean, among others. Schofer and Hironaka[i] show that there has been such a growth in international organizations and agreements that it constitutes a “World Environmental Regime.” However, as Peter Dauvergne[ii], a cornerstone innovator of GEP, notes in a landmark book, The Shadows of Consumption, even though there are more environmental protections around the world than ever before, structural problems persist even as they are discreetly shifted to less “visible” parts of the world. In short, we have conflicting signals: lots of new-ish rules (especially during the 20th century) that make environmental protection a priority, alongside extensive and patently unsustainable changes to Earth systems that continue and are accelerating. What does all this mean?

In order to get to the heart of these politics, we cannot rely on the simple existence of rules, but rather we need to try to understand the real drivers of politics, not those on the façade, because clearly the rules are in contradiction to the ecological trends.

Quantitative content analysis (QCA hereafter) allows a way to see beyond the veil. My personal sensibility is that political science is at its best when it peers into the backrooms of the Godfather, to see what is really happening and expose the power it tries to conceal. As John G. Ruggie has put it, we need to understand the “generative grammar” of world politics.[iii]  Our paper, “SOFIA’S choices: Discourses, values, and norms of the World Ocean Regime” in Marine Policy used QCA to empirically discover the generative grammar of ocean politics.

Normally, power is not naked but hidden, and this is where QCA comes into play for me. QCA is an important research method that requires an enormous amount of front-end preparation time, even when using software like Provalis’ WordStat. However, it allows researchers to look ‘from afar’ and analyze the use and relationships of language without anyone knowing they are being observed. This is not sinister; it is simply a use of text that already exists in public. For that reason, review by an Institutional Review Board is usually not needed.

The Process

This brings us to the selection of data. When selecting text, two kinds of validity are important. The first is prima facie validity: on its face, or on first consideration, the text is a clear indication of communication from the right source. This is the first test for reviewers: does the text seem to represent the people the researcher says it does? In other words, is it initially convincing? Second, and more profoundly, the selection of text must have a substantial justification that demonstrates it is a valid example of the speaker’s sentiment. If the selection is not representative, the analysis is not meaningful. In our paper, we selected all of the existing State of the World’s Fisheries and Aquaculture (SOFIA) reports from 1995 to 2016 to understand the nature of ocean politics, because SOFIA is “the” place to go when you want to know something about world fisheries. This selection must have made sense to reviewers because it was never questioned. We did, naturally, spend time explaining why we thought this was a valid data source.

Once we had the data, we could not just hit “control F” and search through twenty years of oceanic discourse. We had to eliminate text in the data that were not of interest but could skew the results—removing much of the front matter, for example.

Still, the most demanding part of this work was developing a dictionary that we could trust. The bottom line is that this process requires multiple layers of validation.  This gets to Grimmer and Stewart’s first principle of automated content analysis—“All Quantitative Models of Language are wrong—but some are Useful”.[iv]  Developing a dictionary to search the SOFIA reports took a very long time to complete, and that was even with the help of an already published, relevant dictionary from Xu and Bengston.[v]  We used this dictionary as a starting point because their purpose was to measure values in forest management, and we wanted to understand norms in ocean politics; thus, it was a matter of removing forest references and adding relevant marine concepts.

We developed this dictionary through an open and iterative process, which I think is important for QCA scholars to think about, because as we explore the performance of dictionary terms we learn more about our data. One important task in building the dictionary is making sure relationships are not lost simply because a synonym or some inflection of a term is missed. Thus, we used the thesaurus function in WordStat to maximize the relevant key words, and wildcard syntax such as the * to capture the varied uses of several terms. This means that when we search for a term such as “commerc*,” the words “commerce,” “commercial,” “commercially,” “commercialization,” and “commercialized” are all retrieved. I believe explaining this aspect of your dictionary to reviewers is critical, because if we don’t, smart people will conclude that the dictionary misses far too much.
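To make the wildcard idea concrete, here is a minimal Python sketch of how a trailing * can be expanded into a whole-word pattern. It is only an illustration of the concept, not WordStat’s implementation; the function names and the sample sentence are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Turn a dictionary entry like 'commerc*' into a whole-word regex."""
    # Escape the entry, then let each '*' stand for any run of word characters.
    body = re.escape(pattern).replace(r"\*", r"\w*")
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)

def find_matches(pattern, text):
    """Return every word in `text` retrieved by the wildcard entry."""
    return wildcard_to_regex(pattern).findall(text)

# Hypothetical sample text, not taken from the SOFIA reports.
sample = ("Commercial fleets expanded as commerce grew, and commercially "
          "valuable stocks drew commercialization; the committee objected.")

matches = find_matches("commerc*", sample)
print(matches)
```

Note that “committee” is not retrieved, because the stem “commerc” must match in full before the wildcard takes over.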

The dictionary, as noted above, requires extensive validation. In our case, we were interested in certain categories of concern, such as ecological integrity, but it is not always clear that any particular word explicitly belongs to any one category; at the same time we don’t want one term to be in more than one category. This is where we developed coding expectations/rules, and then tested inter-coder agreement on a sample of these terms, using students from my undergraduate research group, The Political Ecology Lab at the University of Central Florida, as test coders.  Thankfully, the words and expectations were fairly self-evident and the reliability scores, using Cohen’s Kappa for nominal data, were substantial with four reviewers and near perfect when one outlying coder was removed.[vi]
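For readers unfamiliar with Cohen’s Kappa, the sketch below computes it for two coders who each assign one category to the same list of terms. The category labels and codings are invented for illustration and are not our data. With more than two coders, one option is to compute it for each pair of coders; the sketch covers only the two-coder case.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning nominal categories to the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same non-empty list of items.")
    n = len(coder_a)
    # Observed agreement: share of items given the same category by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: based on how often each coder uses each category.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codings of ten dictionary terms into three categories.
coder_a = ["eco", "eco", "econ", "econ", "econ",
           "social", "social", "eco", "econ", "social"]
coder_b = ["eco", "eco", "econ", "econ", "social",
           "social", "social", "eco", "econ", "econ"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

In this invented example the coders agree on eight of ten terms, and the resulting kappa falls in the range that Landis and Koch label “substantial.”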

As we built this dictionary, we also conducted two other validation tests called for by Laver and Garry (2000)[vii]: factor analysis and hand validation. These tests checked two things: that our categories were internally coherent, and that they were distinct from one another. For the first, we used a Cosine Theta test of secondary relationships (how words are used, as opposed to simple proximity to other words) to make sure that the words in each category were statistically related to each other. Then we used multidimensional scaling to visualize the proximate relations between the categories to make sure that there was no overlap. All of these tests are reported in the paper.
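The idea behind comparing how words are used, rather than whether they sit next to each other, can be sketched as follows. Each word gets a profile of the other words it appears with, and two profiles are compared with the cosine of the angle between them. This is a simplified illustration under my own assumptions (sentence-level segments, raw counts, invented text), not the procedure WordStat runs.

```python
import math
from collections import Counter

def context_profile(target, segments):
    """Count the words that share a segment with `target`."""
    profile = Counter()
    for segment in segments:
        tokens = segment.lower().split()
        if target in tokens:
            profile.update(t for t in tokens if t != target)
    return profile

def cosine(p, q):
    """Cosine of the angle between two count profiles (0.0 if either is empty)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

# Hypothetical segments, not drawn from the SOFIA reports.
segments = [
    "profit from export markets",
    "revenue from export markets",
    "habitat loss harms reefs",
    "biodiversity loss harms reefs",
]

# "profit" and "revenue" never co-occur, yet they are used in the same way.
sim_same = cosine(context_profile("profit", segments),
                  context_profile("revenue", segments))
sim_diff = cosine(context_profile("profit", segments),
                  context_profile("habitat", segments))
print(sim_same, sim_diff)
```

Words that belong in one category should score high against each other on a measure like this, while words from different categories should score low.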

Finally, we used the Key Words in Context (KWIC) feature to hand validate terms. This step is important and resulted in several opportunities for correction. For example, we wanted to know if the reports discussed crude oil spills, but when we searched “oil” we retrieved “fish oil” and had to change the term to capture “oil spills.”
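A bare-bones version of a KWIC listing shows why this kind of hand check catches problems like the “fish oil” one. The sketch below is a hypothetical stand-in for the WordStat feature, and the sample sentences are invented.

```python
import re

def kwic(text, keyword, width=3):
    """List each hit for `keyword` with up to `width` words on either side."""
    tokens = re.findall(r"\w+", text.lower())
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

# Hypothetical sample text.
sample = "Fish oil production rose in 2014. Oil spills damaged coastal fisheries."

rows = kwic(sample, "oil")
for left, keyword, right in rows:
    print(f"{left:>15} [{keyword}] {right}")
```

Reading the two hits side by side makes it obvious that only the second one concerns spills, which is the cue to narrow the dictionary entry.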

All of this work took a long time to complete, and it is clear that scholars who want to use QCA simply because it is automated will be quite frustrated by the work it takes on the front end. That said, once all the pre-processing of the data was complete and the dictionary validated, we were ready to analyze the data, which will clearly be the basis for multiple articles. What we discovered really surprised us: there was an overwhelming dominance of economic norms that contradicted the codified regime for the ocean, the United Nations Convention on the Law of the Sea, which held high hopes for international cooperation and substantial conservation. We argue that we discovered a heretofore invisible regime governing the World Ocean, and we have the generative grammar to show for it.

This finding clearly refutes the FAO’s claim that:

…paradigm shifts … have occurred in the fisheries and aquaculture world in the last half century. Gone are the days of the productivity paradigm of continuously higher catches; capture fisheries governance is now focused on ensuring sustainable catch levels with maximum economic value of these catches, with aquaculture growth bridging the gap between supply and demand.[viii]

More importantly, we believe this World Ocean Regime (WOR) is not guided by the FAO itself, which is clearly staffed with people working hard on marine sustainability, but is rather an artifact of external, hegemonic discipline from the larger global neo-liberalism guiding most political-economic conditions internationally.[ix] Thus, the economism we measured is probably guiding other human interactions with Earth systems, and is probably a key source of global violations of the principles of sustainability.[x] If this claim is true, and current global processes like human-marine interactions are not sustainable, it literally means those interactions cannot continue beyond some breaking point. I’ve proposed one such breaking point in the global interconnection of large marine systems and global fisheries.[xi] Thus, using quantitative content analysis and the Provalis software tools, we were able to empirically demonstrate key drivers and threats to the human prospect. We think that, even if we are 99% wrong, that deserves some serious consideration because the stakes could not possibly be higher.

Anyone who would like a copy of the paper discussed can contact me at .

[i] Schofer, E., & Hironaka, A. (2005). The Effects of World Society on Environmental Protection Outcomes. Social Forces, 84(1), 25-46.
[ii] Dauvergne, P. (2008). The Shadows of Consumption: Consequences for the Global Environment. Cambridge, MA: MIT Press.
[iii] Ruggie, J. G. (1983). International regimes, transactions, and change: Embedded liberalism in the postwar economic order. In S. Krasner (Ed.), International regimes. Ithaca: Cornell University Press.
[iv] Grimmer, J., & Stewart, B. M. (2013). Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3), 267-297. doi:10.1093/pan/mps028
[v] Xu, Z., & Bengston, D. N. (1997). Trends in National Forest Values among Forestry Professionals, Environmentalists, and the News Media, 1982–1993. Society & Natural Resources, 10(1), 43-59. doi:10.1080/08941929709381008
[vi] See Landis, J. R., & Koch, G. G. (1977). The Measurement of Observer Agreement for Categorical Data. Biometrics, 159-174.
[vii] Laver, M., & Garry, J. (2000). Estimating Policy Positions from Political Texts. American Journal of Political Science, 619-634.
[viii] FAO. (2016). FAO’s Response to the Nature Communications Article “Catch Reconstructions Reveal That Global Marine Fisheries Catches Are Higher Than Reported and Declining”. Rome, p. 1.
[ix] See for example, Centeno, M. A., & Cohen, J. N. (2012). The Arc of Neoliberalism. Annual Review of Sociology, 38(1), 317-340. doi:10.1146/annurev-soc-081309-150235
[x] Jacques, P. (2015). Sustainability: The Basics. New York: Routledge.
[xi] Jacques, P. J. (2015). Are World Fisheries a Global Panarchy? Marine Policy, 53, 165-170.

