What are Scientometrics and Bibliometrics?
Scientometrics and bibliometrics are methodological approaches in which the scientific literature itself becomes the subject of analysis. In a sense, they can be considered a science of science. Scientometrics researchers often attempt to measure the evolution of a scientific domain, the impact of scholarly publications, patterns of authorship, and the process of scientific knowledge production. Both approaches often involve monitoring research, assessing the scientific contribution of authors, journals, or specific works, and analyzing how scientific knowledge is disseminated. Researchers in these fields have developed methodological principles for gathering the information produced when researchers communicate, and they rely on specific methods such as citation analysis, social network analysis, co-word analysis, content analysis, and text mining. Many bibliometric studies focus on authorship or measure the contribution of journals and research organizations, but they may also involve a content analysis of words in titles, abstracts, the full text of books, journal articles, or conference proceedings, or of keywords assigned to published articles by editors or librarians.
Example of studies using WordStat and QDA Miner
WordStat and QDA Miner have often been used for these kinds of research efforts (see the reference section below). For example, to celebrate the 25th anniversary of Human Communication Research, Stephen (1999) analyzed words in the titles of all 634 articles published during those years and published his results as the last article in the final issue of the 25th volume. Using co-word analysis and hierarchical clustering, he was able to identify relationships among concepts as well as changes in topics studied over time. West (2007) did a similar analysis of words in the titles of 345 papers published in the International Journal of Advertising.
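The co-word step behind title studies like these can be illustrated with a minimal Python sketch. It is not WordStat's implementation: the stop list, the sample titles, and the function names `tokenize`, `co_word_counts`, and `jaccard` are all assumptions made for illustration. It counts how many titles contain each term and each pair of terms, then derives a Jaccard similarity that a clustering procedure could take as input.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop list, for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "for"}

def tokenize(title):
    """Lowercase a title and return its set of non-stop-word terms."""
    words = (w.strip(".,:;!?()\"'").lower() for w in title.split())
    return {w for w in words if w and w not in STOP_WORDS}

def co_word_counts(titles):
    """Count the titles containing each term and each unordered term pair."""
    term_counts = Counter()
    pair_counts = Counter()
    for title in titles:
        terms = sorted(tokenize(title))
        term_counts.update(terms)
        # Pairs are stored in alphabetical order, e.g. ("communication", "family").
        pair_counts.update(combinations(terms, 2))
    return term_counts, pair_counts

def jaccard(pair, term_counts, pair_counts):
    """Titles containing both terms divided by titles containing either term."""
    a, b = pair
    both = pair_counts[tuple(sorted(pair))]
    either = term_counts[a] + term_counts[b] - both
    return both / either if either else 0.0

# Invented sample titles, not taken from any of the studies cited here.
titles = [
    "Television and family communication",
    "Family conflict and communication",
    "Persuasion in television advertising",
]
term_counts, pair_counts = co_word_counts(titles)
similarity = jaccard(("family", "communication"), term_counts, pair_counts)
```

Terms that always appear together score 1.0 and terms that never share a title score 0.0. Subtracting each score from 1 gives a distance that hierarchical clustering can work on.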
Abstracts are also good indicators of the content of publications. They often provide a more detailed description of the scope of a paper, its theoretical underpinning, its methodological approach, and other contextual variables. Lonchamp (2012) used WordStat to analyze a corpus of 121 abstracts from the International Journal of Computer-Supported Collaborative Learning (IJCSCL) to investigate and map its content. Other researchers have analyzed much larger corpora consisting of tens of thousands of journal abstracts on gene expression in order to identify ambiguous designations of genes in published articles (Coimbra, Vanderwall, & Oliveira, 2010) or to reveal unexpected gene associations (Chaussabel & Sheer, 2002).
Keywords may also be good indicators of a paper’s content, and have often been used in scientometrics studies (Fratesi & Vacher, 2008; Lonchamp, 2012; Reinhold, Laesser, & Bazzi, 2014). For example, Fratesi and Vacher (2008) used keywords and titles of about 175,000 articles published in 68 geology journals between 1945 and 2000 to track changes in the influence of subdisciplines over time.
Several researchers have also used the text-processing capabilities of WordStat to extract and analyze large corpora consisting of the full text of journal articles (Lonchamp, 2012; Anderson et al., 2007; Waismel-Manor, 2011). For example, Waismel-Manor (2011) retrieved 1,317 articles from the Campaigns & Elections journal using Lexis-Nexis, obtaining a text corpus of 1,736,042 words, and was able to demonstrate the gradual professionalization of campaign consultants over time. Anderson, Jolly, and Fairhurst (2007) analyzed the full text of 149 articles published over a five-year period in four retail-trade publications to document how retailers used business intelligence and data-mining tools to implement customer relationship management (CRM) in retailing.
Bibliometrics studies have often focused on co-citation analysis, identifying the influence of authors and journals and the relationships among them. One good example is the study by Jacobsen, Punzalan, and Hedstrom (2013). They analyzed 165 articles on collective memory from four leading archival studies journals between 1980 and 2010 to identify which scholars and well-known works have been the most influential. Oleinik (2012) also used WordStat to study authorship and citation patterns across North American and Russian institutions.
Scientometrics studies may also focus on how concepts are defined over time or in different domains. For example, Walterbusch, Gräuler, and Teuteberg (2014) identified similarities and differences in definitions of the word “trust” in research literature spanning more than 50 years. The authors collected a set of 121 definitions from various domains and analyzed them with a word-stem frequency analysis in WordStat as well as a qualitative data analysis in QDA Miner.
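As a rough illustration of what a word-stem frequency analysis involves, the Python sketch below counts stems across a few invented definitions. It deliberately uses a crude suffix stripper rather than a real stemming algorithm, and it does not reproduce WordStat's stemmer. The suffix list, stop list, sample sentences, and the names `crude_stem` and `stem_frequencies` are all assumptions.

```python
import re
from collections import Counter

# Hypothetical suffix and stop lists; real stemmers apply far richer rules.
SUFFIXES = ("ing", "ed", "es", "ly", "s")
STOP_WORDS = {"is", "the", "that", "a", "will", "another", "about"}

def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_frequencies(texts):
    """Count stems over all texts, ignoring stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                counts[crude_stem(word)] += 1
    return counts

# Invented definitions, not drawn from the 121 collected in the study.
definitions = [
    "Trust is the expectation that a trusted party will act reliably.",
    "Trusting another party means expecting reliable actions.",
    "Trust reflects positive expectations about a party.",
]
freq = stem_frequencies(definitions)
```

Here "trust", "trusted", and "trusting" collapse into one stem, so shared vocabulary across definitions becomes visible. The crude rules still leave "reliably" and "reliable" apart, which is the kind of case a proper stemmer or lemmatizer is meant to handle.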
Software features for scientometrics and bibliometrics studies
Several features in QDA Miner and WordStat have proven to be useful for scientometrics and bibliometrics studies. This section presents a quick overview of some of those features.
- Data importation: The ability to import MS Word, RTF, HTML, and PDF files, and to associate metadata (such as dates, numerical and categorical data) with those articles, allows one to easily create a corpus of full-text articles with relevant variables. QDA Miner's ability to import Research Information Systems (RIS) data files is also convenient for importing information from journal databases like ProQuest or bibliographic software such as EndNote or Reference Manager. The Document Conversion Wizard is especially useful for importing data from other databases such as Lexis-Nexis, for splitting single documents into multiple ones, or for extracting variables from structured listings or reports.
- Editing, Tagging, and Annotating: Once imported into QDA Miner, documents may be edited, coded, and annotated manually, allowing one to perform content analysis with WordStat while ignoring irrelevant sections or focusing on specific ones. For example, manually tagging the reference sections of journal articles could allow one to perform co-citation analysis, while manually coding the research method sections could allow one to describe the evolution of research methods over time or to compare methods used in different journals.
- Text Pre-processing: WordStat’s ability to transform words into stems, to lemmatize and remove words of little semantic value (like prepositions, conjunctions, or pronouns) allows one to quickly focus on more relevant words and phrases.
- Words and Phrases Extraction: WordStat can process up to 300,000 words per second and quickly produce frequency counts of significant words, extract common phrases, and produce visual displays in the form of bar charts, word clouds, etc.
- Analysis of Co-Occurrence: The analysis of co-occurrences using statistical techniques such as hierarchical clustering and multidimensional scaling, as well as visualization tools like the proximity plot, allows one to quickly identify topics and themes in a discipline. Such tools in WordStat have often been used for mapping scientific domains (Fratesi & Vacher, 2008; Friedman & Smiraglia, 2012; Lonchamp, 2012; Reinhold, Laesser, & Bazzi, 2014).
- Comparative Analysis: The ability to compare frequencies of words, phrases or content categories across different sources (e.g., journals, countries) or to look for changes over time could be used to identify the evolution of a scientific discipline, the rise and fall of specific ideas or concepts, or to document the differentiation process of scientific publications or the geo-spatial distribution of scientific activities. One could compute simple statistical tests (like chi-square tests, F-tests, or correlations), create presentation-quality visualizations (such as bar charts, line charts, bubble charts, or heatmaps), and apply correspondence analysis.
- Application of content analysis dictionaries: The possibility in WordStat to build dictionaries of keywords, key phrases, and proximity rules allows one to focus on specific dimensions. For example, one may easily build a dictionary of authors or journals and perform co-citation pattern analysis. One may also create dictionaries that group key terms into broader concepts, allowing one to measure the prevalence of methodological traditions, theories, research topics, etc.
- Keyword-In-Context: The Keyword-in-Context (or KWIC) feature is an essential tool to test the validity of existing or user-built dictionaries by making sure words or phrases used to measure reference to specific topics are effectively capturing the intended meaning. When an item is found to be ambiguous, KWIC lists are also useful to identify proper disambiguation rules.
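
The last two features in this list can be sketched together in Python: applying a small categorization dictionary, then listing keywords in context to check whether each hit carries the intended meaning. This is only an illustrative sketch and not how WordStat works internally. The dictionary contents, the sample text, and the function names `category_counts` and `kwic` are assumptions.

```python
import re

# Hypothetical dictionary: category name -> keywords or key phrases.
DICTIONARY = {
    "QUANTITATIVE": ["survey", "regression", "experiment"],
    "QUALITATIVE": ["interview", "ethnography", "case study"],
}

def category_counts(text, dictionary):
    """Count whole-word, case-insensitive matches for each category."""
    counts = {}
    for category, entries in dictionary.items():
        total = 0
        for entry in entries:
            pattern = r"\b" + re.escape(entry) + r"\b"
            total += len(re.findall(pattern, text, flags=re.IGNORECASE))
        counts[category] = total
    return counts

def kwic(text, keyword, width=30):
    """Return (left context, match, right context) for each whole-word hit."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    rows = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width): m.start()]
        right = text[m.end(): m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

# Invented sample text in which "survey" has two different senses.
text = ("We ran a survey of 200 firms. A land survey marker was found "
        "near the site. Each interview lasted one hour.")
counts = category_counts(text, DICTIONARY)
hits = kwic(text, "survey", width=15)
for left, match, right in hits:
    print(f"{left:>15} [{match}] {right}")
```

In this sample, the dictionary counts two QUANTITATIVE hits, but the context listing shows that one of them ("land survey marker") is not a research method. Inspecting such listings is how an ambiguous entry can be spotted and a disambiguation rule worked out.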
References of scientometrics studies using QDA Miner & WordStat
Anderson, J., Jolly, L.D., & Fairhurst, A.E. (2007). Customer relationship management in retailing: A content analysis of retail trade journals. Journal of Retailing and Consumer Services, 14(6), 394-399.
Coimbra, R.S., Vanderwall, D.E., & Oliveira, G.C. (2010). Disclosing ambiguous gene aliases by automatic literature profiling. BMC Genomics, 11(Suppl 5), S3.
Fratesi, S.E. & Vacher, H.L. (2008). Scientific journals as fossil traces of sweeping change in the structure and practice of modern geology, Journal of Research Practice, 4(1), 1-23.
Friedman, A., & Smiraglia, R.P. (2012). Nodes and arcs: concept map, semiotics, and knowledge organization. Journal of Documentation, 69(1), 27-48.
Jacobsen, T., Punzalan, R.L., & Hedstrom, M.L. (2013). Invoking “collective memory”: mapping the emergence of a concept in archival science. Archival Science, 13(2), 217-251.
Lonchamp, J. (2012). Computational analysis and mapping of ijCSCL content. International Journal of Computer-Supported Collaborative Learning, 7(4), 475-497.
Milojevic, S. (2012). Multidisciplinary cognitive content of nanoscience and nanotechnology. Journal of Nanoparticle Research, 14(1), 1-28.
Milojevic, S. & Leydesdorff, L. (2012). Information metrics (iMetrics): a research specialty with a socio-cognitive identity? Scientometrics, 95, 141–157.
Oleinik, A. (2012). Publication patterns in Russia and the West compared. Scientometrics, 93(2), 533-551.
Reinhold, S., Laesser, C., & Bazzi, D. (2014). The intellectual structure of transportation management research: A review of the literature. 14th Swiss Transport Research Conference. Monte Verità / Ascona.
Smiraglia, R.P. (2006). Two kinds of powers: insight into the legacy of Patrick Wilson. College of Information and Computer Sciences, Long Island University: New York.
Stephen, T. (1999). Computer-assisted concept analysis of HCR’s first 25 years. Human Communication Research, 25(4), 498-513.
Stephen, T. (2000). Concept analysis of gender, feminist, and women’s studies research in the communication literature. Communication Monographs, 67, 193-214.
Stephen, T. (2001). Differentiating the U.S. regional communication journals: A computer-assisted concept analysis. Paper presented at the meeting of the International Communication Association. Washington DC.
Vodicka, M., Schneider, O., & Bunse, K. (May 2009). Energy efficiency as driver for competitiveness in future manufacturing – a consolidated literature review and options for future research. POMS 20th Annual Conference, Orlando, Florida U.S.A.
Walterbusch, M., Gräuler, M., & Teuteberg, F. (2014). How Trust is Defined: A Qualitative and Quantitative Analysis of Scientific Literature. 20th Americas Conference on Information Systems, Savannah, Georgia, USA.
Waismel-Manor, I. (2011). Spinning forward: Professionalization among campaign consultants. Journal of Political Marketing, 10(4), 350-371.
West, D. (2007). Directions in marketing communication research: An analysis of the international journal of advertising. International Journal of Advertising, 26(4), 543-554.
Zhang, H., Babar, M.A., & Tell, P. (2011). Identifying relevant studies in software engineering. Information and Software Technology, 53(6), 625-637.
Additional references of scientometrics studies using QDA Miner & WordStat
Ammarukleart, S., & Kim, J. (2017). Institutional repository research 2005-2015: a trend analysis using bibliometrics and text mining. Digital Library Perspectives.
Araujo, R. F., & Oliveira, M. (2017). Technological Basis for Information Science in Brazil: A Scientometric Study. Qualitative and Quantitative Methods in Libraries, 231-241.
Bamidis, P. D. (2017). Internet of things in health trends through bibliometrics and text mining. Informatics for Health: Connected Citizen-Led Wellness and Population Health, 235, 73.
Bertoncel, T., & Meško, M. (2019). Early Warning Systems in Industry 4.0: A Bibliometric and Topic Analysis. International Journal of E-Services and Mobile Applications, 11(2), 56-70.
Birch, T., & Reyes, E. (2018). Forty years of coastal zone management (1975–2014): Evolving theory, policy and practice as reflected in scientific research publications. Ocean & Coastal Management, 153, 1-11.
Calma, A., Martí-Parreño, J., & Davies, M. (2019). Journal of the Academy of Marketing Science 1973–2018: an analytical retrospective. Scientometrics, 119(2), 879-908.
Dabic, M., González-Loureiro, M., & Furrer, O. (2014). Research on the strategy of multinational enterprises: key approaches and new avenues. Business Research Quarterly, 17(2), 129-148.
Forrester, A. (2015). Barriers to open access publishing: Views from the library literature. Publications, 3(3), 190-210.
Galata Bickell, E. (2019). The framing effect of the media in the regulation of GMOs: a case study of Russia. Russian Journal of Communication, 11(3), 240-252.
Gomes, J., & Dewes, H. (2017). Disciplinary dimensions and social relevance in the scientific communications on biofuels. Scientometrics, 110(3), 1173-1189.
Gupte, N. (2019). Augmented Reality and Health Informatics: A Study Based on Bibliometric and Content Analysis of Scholarly Communication and Social Media. Selected Full Text Dissertations, 2011-. 12.
Jacobsen, T., Punzalan, R. L., & Hedstrom, M. L. (2013). Invoking “collective memory”: mapping the emergence of a concept in archival science. Archival Science, 13(2-3), 217-251.
Jerman, A., Pejic Bach, M., & Bertoncelj, A. (2018). A Bibliometric and Topic Analysis on Future Competences at Smart Factories. Machines, 6(3), 41. doi:10.3390/machines6030041.
Koç, T., Kurt, K., & Akbıyık, A. (2019). A Brief Summary of Knowledge Management Domain: 10-Year History of the Journal of Knowledge Management. Procedia Computer Science, 158, 891-898.
León-de la O, D. I., Thorsteinsdóttir, H., & Calderón-Salinas, J. V. (2018). The rise of health biotechnology research in Latin America: A scientometric analysis of health biotechnology production and impact in Argentina, Brazil, Chile, Colombia, Cuba and Mexico. PLoS One, 13(2), e0191267.
Mehdizadeh-Maraghi, R., Nazari, M., & Minaii, M. B. (2014). Mapping science of Massage therapy during 2008-2013 in the Scopus database. Journal of Islamic and Iranian Traditional Medicine, 4(4), 333-342.
Milojević, S. (2015). Quantifying the cognitive extent of science. Journal of Informetrics, 9(4), 962-973.
Milojević, S., & Leydesdorff, L. (2013). Information metrics (iMetrics): a research specialty with a socio-cognitive identity?. Scientometrics, 95(1), 141-157.
Milojević, S., Sugimoto, C. R., Larivière, V., Thelwall, M., & Ding, Y. (2014). The role of handbooks in knowledge creation and diffusion: A case of science and technology studies. Journal of Informetrics, 8(3), 693-709.
Mora, L., Deakin, M., & Reid, A. (2017, March). Smart-city development paths: insights from the first two decades of research. In International conference on smart and sustainable planning for cities and regions (pp. 403-427). Springer, Cham.
Mora, L., Deakin, M., & Reid, A. (2019). Combining co-citation clustering and text-based analysis to reveal the main development paths of smart cities. Technological Forecasting and Social Change, 142, 56-69.
Münster, S. (2017). Employing bibliometric methods to identify topics and facilitators of digital 3D modeling in the humanities. iConference 2017 Proceedings.
Nevzorova, E. N., Bobek, S., Kireenko, A. P., & Sklyarov, R. A. (2016). Tax evasion: the discourse among government, business and science community based on bibliometric analysis. Journal of Tax Reform, 2(3), 227-244.
Nevzorova, E. N., Kireenko, A. P., & Sklyarov, R. A. (2017). Bibliometric analysis of the literature on tax evasion in Russia and foreign countries. Journal of Tax Reform, 3(2), 115-130.
Niknia, M., & Mirtaheri, S. L. (2015). Mapping a decade of linked data progress through co-word analysis. Webology, 12(2).
Oleinik, A. (2014). Conflict(s) of interest in peer review: Its origins and possible solutions. Science and Engineering Ethics, 20(1), 55-75.
Oleinik, A. (2015). Between the west and the east: Ukrainian economic thought as the crossroads. Вісник Киiвського нацiонального унiверситету iм. Тараса Шевченка. Серiя: Економiка, (9 (174)).
Papagiannidis, S., & Marikyan, D. (2020). Smart offices: A productivity and well-being perspective. International Journal of Information Management, 51, 102027.
Pivoto, D., Waquil, P. D., Talamini, E., Finocchio, C. P. S., Dalla Corte, V. F., & de Vargas Mores, G. (2018). Scientific development of smart farming technologies and their application in Brazil. Information processing in agriculture, 5(1), 21-32.
Raeeszadeh, M., & Karamali, M. (2018). Scientific mapping of military trauma papers using co-word analysis in Medline. Journal Military Medicine, 20(5), 476-487.
Rezaeian, M., Montazeri, H., & Loonen, R. C. G. M. (2017). A case study on natural ventilation. Technological Forecasting and Social Change, 118, 270-280.
Rudd, M. A. (2017). What a Decade (2006–15) Of Journal Abstracts Can Tell Us about Trends in Ocean and Coastal Sustainability Challenges and Solutions. Frontiers in Marine Science, 4, 170.
Röhm, P. (2018). Exploring the landscape of corporate venture capital: a systematic review of the entrepreneurial and finance literature. Management Review Quarterly, 68(3), 279-319.
Sanina, A., Balashov, A., & Kaysarova, V. (2017). Public Administration Research in Contemporary Russia: An Analysis of Journal Publications, 2010–2014. International Journal of Public Administration, 40(12), 1036-1049.
Sibarani, E. M., Scerri, S., Morales, C., Auer, S., & Collarana, D. (2017, September). Ontology-guided job market demand analysis: a cross-sectional study for the data science field. In Proceedings of the 13th International Conference on Semantic Systems (pp. 25-32).
Smiraglia, R. (2015). Domain analysis for knowledge organization: tools for ontology extraction. Chandos Publishing.
Smiraglia, R. P. (2017). Facets as Discourse in Knowledge Organization: A Case Study in LISTA. North American Symposium on Knowledge Organization, 6(1), 124-138.
Smiraglia, R. P. (2017). ISKO 14’s Bookshelf: Discourse and Nomenclature—An Editorial. Knowledge Organization, 44(1), 3-12.
Smiraglia, R. P., & Cai, X. (2017). Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization. Knowledge Organization, 44(3), 215-233.
Stevens, T. M., Aarts, N., & Dewulf, A. R. P. J. (2019). The emergence and evolution of master terms in the public debate about livestock farming: Semantic fields, communication strategies and policy practices. Discourse, Context & Media, 31, 100317.
Tajedini, O., & Baniasadi, M. (2018). Biological Sequence in Iranian Articles on Information Science Indexed in Web of Science. Journal of Scientometrics, 4(1), 59-76.
Tomaževič, N. (2019). Social Responsibility and Consensus Orientation in Public Governance: a Content Analysis. Central European Public Administration Review, 17(2), 189-204.
Uğur, N. G., & Akbıyık, A. (2018). Emerging Trends in IS Research: A Co-word Analysis (2007–16). Canadian Journal of Information and Library Science, 42(3), 228-248.
Wachsmann, M. S., Onwuegbuzie, A. J., Hoisington, S., Gonzales, V., Wilcox, R., Valle, R., & Aleisa, M. (2019). Collaboration patterns as a function of research experience among mixed researchers: A mixed methods bibliometric study. The Qualitative Report, 24(12), 2954-2979.
Wang, P., You, S., Manasa, R., & Wolfram, D. (2017). Open peer review in scientific publishing: A Web mining study of PeerJ authors and reviewers. Journal of data and information science, 1(4), 60-80.
Zhan, M., & Widén, G. (2019). Understanding big data in librarianship. Journal of Librarianship and Information Science, 51(2), 561-576.