Monitoring Implementation of CRPD and SDG Through Big Data Analytics and Text Mining




Derrick L. Cogburn
American University


As we look at the major global frameworks that support disability-inclusive international development, especially the 2030 Sustainable Development Goals (SDGs), the United Nations Convention on the Rights of Persons with Disabilities (CRPD), the Habitat III New Urban Agenda (NUA), the Sendai Framework adopted at the 3rd World Conference on Disaster Risk Reduction, and the World Summit on the Information Society (WSIS) Action Lines, one common element is the need to monitor and evaluate their implementation.

For example, Paragraph 39 of the SDGs opens the section on implementation and its importance by stating:

The scale and ambition of the new Agenda requires a revitalized Global Partnership to ensure its implementation. We fully commit to this. This Partnership will work in a spirit of global solidarity, in particular solidarity with the poorest and with people in vulnerable situations. It will facilitate an intensive global engagement in support of implementation of all the Goals and targets, bringing together Governments, the private sector, civil society, the United Nations system and other actors and mobilizing all available resources. (SDGs, Paragraph 39)

One aspect of SDG implementation of particular interest to us is the set of eleven specific references to persons with disabilities, along with the broader focus on vulnerable populations and the universal “for all” approaches that attempt to include everyone.

Of course, as both a human rights instrument and a development instrument, the entire CRPD is focused on identifying, protecting, and enhancing the social and economic rights of persons with disabilities. Specifically, Article 33 of the CRPD prescribes the process through which States Parties to the Convention should put in place national implementation and monitoring. The three paragraphs of Article 33 read:

1. States Parties, in accordance with their system of organization, shall designate one or more focal points within government for matters relating to the implementation of the present Convention, and shall give due consideration to the establishment or designation of a coordination mechanism within government to facilitate related action in different sectors and at different levels.
2. States Parties shall, in accordance with their legal and administrative systems, maintain, strengthen, designate or establish within the State Party, a framework, including one or more independent mechanisms, as appropriate, to promote, protect and monitor implementation of the present Convention. When designating or establishing such a mechanism, States Parties shall take into account the principles relating to the status and functioning of national institutions for protection and promotion of human rights.
3. Civil society, in particular persons with disabilities and their representative organizations, shall be involved and participate fully in the monitoring process. (CRPD, Article 33)

In 2017, our book, Making Disability Rights Real in Southeast Asia: Implementing the UN Convention on the Rights of Persons with Disabilities in ASEAN, explored implementation of CRPD Article 33 within the ten countries of Southeast Asia (Cogburn and Reuter, 2017). That project relied on researchers on the ground in eight of the ten countries included in the analysis. While this type of on-the-ground research team was helpful, it was also expensive and time consuming.

The primary purpose of the present project is to focus on two of these global frameworks, the SDGs and the CRPD, and to assess the degree to which the disability-specific components of the SDGs and the Article 33 elements of the CRPD have been implemented around the world. A secondary goal is to explore and demonstrate the potential of big data analytics and text mining techniques as a complement to the global effort to monitor and evaluate implementation of the SDGs and CRPD.

Inductive and Deductive Conceptual Approaches to Big Data Text

While there are many techniques available to exploit the power of big data analytics in specific research projects, two broad conceptual approaches, inductive and deductive, help illustrate the potential.

Inductive techniques allow us to ask broad exploratory questions about a large-scale text-based dataset, without specific a priori goals. For example, we can ask what key words and phrases characterize a dataset, and determine what topics, themes, and trends exist. We can identify named entities within the dataset, including countries, people, organizations, and acronyms. For each of these elements, we can use cross-tabulation techniques to determine how these findings may change in relation to other key variables, such as date, region, organizational type, etc.
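As a rough illustration of the inductive approach, the following Python sketch counts the most frequent words and two-word phrases in a small corpus. It is a minimal sketch, not the procedure used in this study: the stop-word list, the `tokenize` and `top_terms` helpers, and the sample sentences are all hypothetical.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real project would use a fuller one.
STOPWORDS = {"the", "of", "and", "to", "in", "a", "for", "with", "on", "is"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_terms(docs, n=1, k=5):
    """Return the k most frequent n-grams that contain no stop words."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(tok in STOPWORDS for tok in gram):
                counts[" ".join(gram)] += 1
    return counts.most_common(k)

# Hypothetical sample sentences, not taken from the study corpus.
docs = [
    "The focal point coordinates implementation.",
    "Each ministry names a focal point.",
]
top_words = top_terms(docs, n=1, k=2)    # single words
top_phrases = top_terms(docs, n=2, k=1)  # two-word phrases
print(top_words, top_phrases)
```

The same counts could then be broken down by date, region, or organizational type to approximate the cross-tabulations described above.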

In contrast, deductive techniques are confirmatory, and allow us to test hypotheses and ask specific research questions of the data. We can build, adopt, or adapt dictionaries or categorization models to explore specific topics and determine the degree of their presence or absence in the dataset. Specific variants of these models allow us to conduct sentiment analysis, characterizing positive and negative sentiment within the dataset. Further, we can use supervised machine learning to develop classification models that assign new text to predefined categories, often with a high degree of accuracy.

Methodology: Big Data Analytics and Text Mining for CRPD and SDG Implementation

SDG Implementation

The inductive and deductive techniques described here can make an important contribution to the monitoring and evaluation process. For example, the SDGs contain eleven references to persons with disabilities, which stands in stark contrast to the Millennium Development Goals (MDGs), which made no mention of persons with disabilities. In one study, we took a deductive approach to assess the degree to which these diverse disability issues have been included in the agendas of the world’s international development organizations. To do so, we developed three categorization models, each designed to assess a different variation of disability inclusion in the SDGs. We call these models CAT1, CAT2, and CAT3. CAT1 focuses on the specific references to disability in the SDGs (11 references). CAT2 focuses on the broader attention to “vulnerable” populations (18 references) and is based on paragraph 23 of the SDGs, which states: “People who are vulnerable must be empowered. Those whose needs are reflected in the Agenda include all children, youth, persons with disabilities (of whom more than 80 per cent live in poverty)”. Finally, CAT3 takes the broadest approach and focuses on universality and inclusion “for all,” based on the principle of “leave no one behind” (17 references). This three-part approach is based on a report exploring inclusion of persons with disabilities in the 2030 development agenda from the International Disability Alliance (IDA) and the International Disability and Development Consortium (IDDC). Figure 1 illustrates the structure of SDG CAT1. A similar structure exists for the other categorization models and for the CRPD implementation model.

Figure 1. Illustration of the Development of Categorization Model 1

Content Analysis Dictionary
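To make the idea of a categorization model concrete, the sketch below maps each of the three categories to a short list of phrases and counts whole-phrase matches in a text. The phrases shown are hypothetical stand-ins chosen for illustration; they are not the entries of the dictionaries developed for this study, and `count_category_hits` is our own illustrative helper rather than a feature of any particular tool.

```python
import re

# Hypothetical entries for illustration only; NOT the study's dictionaries.
CATEGORY_MODEL = {
    "CAT1": ["persons with disabilities", "disability"],
    "CAT2": ["vulnerable", "marginalized"],
    "CAT3": ["for all", "leave no one behind"],
}

def count_category_hits(text, model=CATEGORY_MODEL):
    """Return the number of whole-phrase, case-insensitive matches per category."""
    hits = {}
    for category, phrases in model.items():
        total = 0
        for phrase in phrases:
            # \b anchors keep "disability" from matching inside longer words.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            total += len(re.findall(pattern, text, flags=re.IGNORECASE))
        hits[category] = total
    return hits

# Hypothetical sample sentence.
sample = ("We will leave no one behind and ensure education for all, "
          "including persons with disabilities and people in vulnerable situations.")
result = count_category_hits(sample)
print(result)
```

A production dictionary would also need rules for overlapping phrases, word variants, and negation, which this sketch deliberately omits.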

Next, we selected a sample of 31 leading international development organizations and collected their annual reports from 2000 to 2016, covering the period of the MDGs and the transition into the SDGs. We created a corpus of these documents using the Provalis ProSuite, a commercial mixed-methods research tool licensed from Provalis Research. The final corpus included 351 documents and 15,534,183 words (196,206 unique).
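Corpus size figures of this kind (documents, total words, unique words) can be approximated with a few lines of Python. This is a sketch under our own assumptions, in particular a simple lowercase alphabetic tokenizer; it is not how the commercial tool counts words, so its totals would likely differ. The `corpus_stats` helper and the sample documents are hypothetical.

```python
import re

def corpus_stats(docs):
    """Return document, word, and unique-word counts for a list of texts."""
    tokens = [tok for doc in docs for tok in re.findall(r"[a-z]+", doc.lower())]
    return {
        "documents": len(docs),
        "words": len(tokens),
        "unique_words": len(set(tokens)),
    }

# Hypothetical two-document corpus.
sample_docs = [
    "Annual report on inclusive development.",
    "Inclusive development report.",
]
stats = corpus_stats(sample_docs)
print(stats)
```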

CRPD Implementation

For the CRPD implementation study, we took a similar deductive approach. However, in this instance we built a more complex categorization model to identify key components of the three paragraphs of CRPD Article 33, with one major category per paragraph: CRPD1: Focal Point; CRPD2: Independent Mechanism; CRPD3: DPO Involvement. For each of these three major categories, we developed several sub-categories to further delineate aspects of CRPD implementation. Finally, we added to the dictionary specific words, phrases, and rules intended to semantically represent those diverse concepts of CRPD implementation. For each document, we added several demographic variables, including Year, SDG Region and Sub-Region, and ReportType. These variables enable comparisons across the data.
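The structure just described (major categories, sub-categories, matching rules, and per-document variables) can be sketched in Python as a nested dictionary plus a simple cross-tabulation. Everything specific here is an assumption made for illustration: the sub-category names, the regular-expression rules, the sample documents, and the `code_document` and `crosstab` helpers are hypothetical and do not reproduce the study's dictionary. Only the three major category labels and the variable names Year and ReportType come from the description above.

```python
import re
from collections import defaultdict

# Hypothetical sub-categories and rules; the study's dictionary is not reproduced.
CRPD_MODEL = {
    "CRPD1: Focal Point": {
        "focal_point": [r"focal points?"],
        "coordination": [r"coordinat\w+ mechanisms?"],
    },
    "CRPD2: Independent Mechanism": {
        "independent_mechanism": [r"independent (?:monitoring )?mechanisms?"],
        "paris_principles": [r"paris principles"],
    },
    "CRPD3: DPO Involvement": {
        "dpo": [r"organi[sz]ations of persons with disabilities", r"dpos?"],
    },
}

def code_document(text):
    """Count rule matches for every (category, sub-category) pair."""
    counts = {}
    for category, subcats in CRPD_MODEL.items():
        for subcat, patterns in subcats.items():
            counts[(category, subcat)] = sum(
                len(re.findall(r"\b(?:" + p + r")\b", text, flags=re.IGNORECASE))
                for p in patterns
            )
    return counts

def crosstab(documents, variable):
    """Sum major-category counts across documents, grouped by one variable."""
    table = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        for (category, _subcat), n in code_document(doc["text"]).items():
            table[doc[variable]][category] += n
    return {group: dict(cats) for group, cats in table.items()}

# Hypothetical documents with two of the variables named in the text.
documents = [
    {"text": "The ministry was designated as the focal point.",
     "Year": 2012, "ReportType": "State"},
    {"text": "DPOs note that no independent mechanism follows the Paris Principles.",
     "Year": 2014, "ReportType": "Alternative"},
]
by_type = crosstab(documents, "ReportType")
print(by_type)
```

Passing "Year" instead of "ReportType" to `crosstab` would group the same counts by report year.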

There are now 172 countries that have ratified the CRPD. Each of these States Parties is required to submit to the UN Committee on the Rights of Persons with Disabilities regular reports on the progress made in implementing the Convention in their country. The initial report should be made within two years of ratifying or acceding to the Convention, and thereafter every four years. These States Parties Reports are expected to be comprehensive, covering all areas of the Convention. Not all States Parties have submitted their report, while some countries have already submitted their second report. Nonetheless, as data for this study, we collected every available State Report (n=131). In addition to the official State Report, non-state actors may submit a report evaluating the progress made on implementing the CRPD in a specific country. The UN Office of the High Commissioner for Human Rights (UNOHCHR) makes both the State Reports and these Alternative or “Shadow” Reports available on its website. Whereas the State Report is expected to be comprehensive, the Alternative Report may focus on any specific issue of interest to its author(s). To supplement the State Reports as data for this study, we also collected every available CRPD Alternative Report (n=63), for a total corpus of 194 documents comprising 3,937,875 words (53,565 unique).

Findings: Evaluating Implementation of Disability-Inclusive SDGs and the CRPD

Findings from SDG Implementation

Both studies produced notable findings. In our deductive study of SDGs and disability inclusion, we found that across the dataset the key focus has been on poverty-related elements, with very little attention so far to disability inclusion in the broader SDG agenda. Figure 2 below highlights this focus. Given the focus of the MDGs on poverty reduction, a concern that continued into the SDGs, it makes sense that these elements would be dominant in our dataset.

Figure 2. Focus of Disability

Distribution of keywords

However, we can see from Figure 3 below how these issues have converged over the period of the MDGs, leading us to predict a more balanced approach going forward.

Figure 3. Categories by Annual Report Year

frequency by report year

One final observation from this first study is found in Figure 4 below, where we see an increasing focus on sustainable cities and communities, which also suggests a potential correlation with the introduction of the United Nations Convention on the Rights of Persons with Disabilities (CRPD) in 2006.

Figure 4. SDG 11 Focus on Sustainable Cities and Communities

Frequency by report year

Findings from CRPD Implementation

When we look at CRPD implementation, we first look across the entire dataset to determine which of the three major categories the States Parties have focused on thus far in their reported implementation. As illustrated in Figure 5, the dominant focus, by far, has been on Article 33, paragraph 1, establishing the focal point and coordination mechanism. This makes sense, given the paramount importance of first establishing the mechanism within government that works across multiple ministries to oversee and implement the CRPD.

Figure 5. States Parties Focus in CRPD Implementation

Distribution of keywords

When we compare the focus by report type, as shown in Figure 6 below, we see the same pattern. Both State Reports and Alternative Reports place the greatest attention on CRPD Article 33, paragraph 1, with the Alternative Reports placing even more emphasis on this area than the State Reports. (The Y axis, a rate per 10,000, is designed to normalize for the different number of documents of each type; there are more State Reports than Alternative Reports.)

Figure 6. Comparing Focus Areas for CRPD Article 33 by Report Type

Frequency by report type
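The normalization behind a rate per 10,000 can be illustrated with a short calculation. The sketch below assumes the rate is computed as dictionary hits per 10,000 words of text in each group; that denominator is our assumption for illustration, and the counts are invented, not taken from the study data.

```python
def rate_per_10k(hits, total_words):
    """Return dictionary hits per 10,000 words of text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return hits * 10_000 / total_words

# Hypothetical counts, for illustration only (not the study's data).
groups = {
    "State": {"hits": 120, "words": 3_000_000},
    "Alternative": {"hits": 45, "words": 900_000},
}
rates = {name: rate_per_10k(g["hits"], g["words"]) for name, g in groups.items()}
print(rates)
```

In this invented example the State group has more raw hits, but the Alternative group has the higher rate, which is the kind of comparison a raw frequency count would obscure.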

As we drill down into which aspects of each paragraph receive the most attention, as illustrated by Figure 7, we see that the Focal Point draws most of it, with substantial attention also paid to establishing the Advisory Board.

Figure 7. Distribution of Focus on Sub-Categories by Report Type

Frequency by report type

Again, these foci are to be expected in the early stages of implementing the Convention. However, it is quite disappointing that so little attention is being paid to establishing the independent mechanism, and even more disappointing that so little is being paid to alignment with the Paris Principles. Nonetheless, as shown in Figure 8, we do see a slight increase in this area starting in 2013.

Figure 8. Focus on Alignment with Paris Principles by Year

Frequency by report year

When we start to look across SDG regions and sub-regions, Figure 9 illustrates how the Oceania, Caribbean, and Central American regions appear to have placed the greatest emphasis on establishing the focal point.

Figure 9. Sub-Regional Emphasis on Establishing the CRPD Focal Point

Sub-Regional Emphasis



While we see great value in this text mining approach, we want to highlight two limitations of this study. First, this deductive approach to text mining relies heavily on the strength of the categorization model or dictionary developed for the project. For the purposes of analysis, one has to assume the model is sufficiently valid to measure the concepts it was designed to measure. While we have taken great care in developing the models used in this study, they pose the greatest limitation to its findings. In future research, we will take additional steps in dictionary validation, which will be reported in future studies. Those validated dictionaries will be applied again to this baseline data and used going forward to assess longitudinal progress in SDG and CRPD implementation.

A second limitation lies in the data. While we have made every attempt to collect every available States Party Report and Alternative Report, there may be some that have eluded our efforts. Those omissions would leave a gap in the data for the missing country. A related limitation is that this study can rely only on what is being reported by the State Party, and/or supported or countered by the Alternative Report. This study makes no additional claims as to the accuracy of these reports for each country and does not include additional types of data, such as interviews or focus groups, that would shed further light on the level of CRPD implementation. The same caveat and limitation hold for the SDG study. While organizational Annual Reports are a good source of data, these reports may not accurately reflect what the organization is actually focused on, or what it has accomplished in each area.


With this brief analysis, we have tried to demonstrate two things. The first is the substantive focus on the disability content of the SDGs, as represented by the annual reports of leading international development organizations, together with the areas of emphasis in CRPD implementation, as reflected in the States Parties Reports and Alternative Reports. The second is methodological: while this is just a preliminary analysis, we hope it makes clear the potential of text mining to help monitor implementation of the disability-inclusive development aspects of the SDGs and the CRPD. This dataset may now be mined to answer a whole host of questions, using both inductive and deductive approaches. New categorization models may be developed for other areas of interest.