I get a lot of questions about one methodological approach in particular: computer-aided text analysis. Here, I have included some of the resources that helped me learn how to analyze text-based data.
Getting Started with Text Analysis
Computer-aided text analysis is a type of content analysis. Duriau and colleagues (2007) discuss the basics of content analysis as a methodological approach. It's an excellent place to start! Short, McKenny, and Reid (2018) provide an overview of some of the most common computer-aided text analysis techniques. A more recent special issue call for papers (Kiley et al., 2023) gives us a glimpse of where methods in this area are headed. It touches on some key trends, like the increased use of embedding-based models, and discusses the promise of text analysis methods for interpretive approaches to analysis.
Duriau, V.J., Reger, R.K., & Pfarrer, M.D. (2007). A Content Analysis of the Content Analysis Literature in Organization Studies: Research Themes, Data Sources, and Methodological Refinements. Organizational Research Methods, 10(1), 5–34. https://doi.org/10.1177/1094428106289252
Kiley, J., McKenny, A., Short, J., & Smith, A. (2023). Call for Papers for a Feature Topic: Having A Way with Words: Innovations and Improvements in Text Analysis Methods. Organizational Research Methods, 26(4), 752-755. https://doi.org/10.1177/10944281231195704
Short, J.C., McKenny, A.F., & Reid, S.W. (2018). More Than Words? Computer-Aided Text Analysis in Organizational Behavior and Psychology Research. Annual Review of Organizational Psychology and Organizational Behavior, 5(1), 415–435. https://doi.org/10.1146/annurev-orgpsych-032117-104622
Text Analysis Methods
Below I discuss a few text analysis methods that have become popular in organizational research. This is not a comprehensive review, just a starting point.
Dictionary-Based Methods
With this approach, the presence of a construct in a given text is measured by how often words from a predefined word list (the dictionary) appear within the text. Tools designed for this type of analysis include CATScanner, Linguistic Inquiry and Word Count (LIWC), and DICTION. To learn more about this method, check out:
Reid, S.W., McKenny, A.F., & Short, J.C. (2023). Synthesizing Best Practices for Conducting Dictionary-Based Computerized Text Analysis Research. Methods To Improve Our Field, 14, 43–78. http://doi.org/10.1108/S1479-838720220000014004
Hickman, L., Thapa, S., Tay, L., Cao, M., & Srinivasan, P. (2022). Text preprocessing for text mining in organizational research: Review and recommendations. Organizational Research Methods, 25(1), 114-146. https://doi.org/10.1177/109442812097168
McKenny, A.F., Aguinis, H., Short, J.C., & Anglin, A.H. (2018). What Doesn’t Get Measured Does Exist: Improving the Accuracy of Computer-Aided Text Analysis. Journal of Management, 44(7), 2909–2933. https://doi.org/10.1177/0149206316657594
Short, J. C., Broberg, J. C., Cogliser, C. C., & Brigham, K. H. (2009). Construct Validation Using Computer-Aided Text Analysis (CATA). Organizational Research Methods, 13(2), 320-347. https://doi.org/10.1177/1094428109335949
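To make the dictionary-based idea concrete, here is a minimal Python sketch of counting dictionary words in a text. The word list, the sample sentence, and the function names are all hypothetical and made up for illustration. Validated dictionaries, such as those that ship with LIWC or DICTION, are far more carefully developed, and this is not how those tools are implemented.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; not a validated word list.
INNOVATIVENESS_WORDS = {"innovate", "innovation", "innovative", "novel", "pioneer"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dictionary_score(text, dictionary):
    """Return (matches, total words, matches per 100 words) for one text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    matches = sum(counts[word] for word in dictionary)
    total = len(tokens)
    rate = 100 * matches / total if total else 0.0
    return matches, total, rate


# Made-up sample text.
sample = "We pioneer novel products. Innovation drives our novel strategy."
matches, total, rate = dictionary_score(sample, INNOVATIVENESS_WORDS)
print(matches, total, round(rate, 1))
```

Dividing by the total word count means that longer documents do not score higher just because they contain more words.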
Sentiment Analysis
Sentiment analysis measures the emotional tone expressed in texts. Until recently, sentiment analysis has been primarily a dictionary-based method. In addition to the sentiment dictionaries that come with the CATA software discussed above, there are also dictionaries that can be used in R and Python. Embedding-based sentiment analysis models, such as those built on FinBERT and RoBERTa, are also becoming more common.
Bettinazzi, E. L., Jacqueminet, A., Neumann, K., & Snoeren, P. (2023). Media Coverage of Firms in the Presence of Multiple Signals: A Configurational Approach. Academy of Management Journal. https://doi.org/10.5465/amj.2020.1791
Choudhury, P., Wang, D., Carlson, N. A., & Khanna, T. (2019). Machine learning approaches to facial and text analysis: Discovering CEO oral communication styles. Strategic Management Journal, 40(11), 1705-1732. https://doi.org/10.1002/smj.3067
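As a rough illustration of dictionary-based sentiment scoring, the sketch below computes a simple polarity score from positive and negative word lists. The two word lists and the function name are hypothetical. Published sentiment dictionaries are much larger, and the scoring formula here is only one common choice among several.

```python
import re

# Hypothetical toy lexicons for illustration only.
POSITIVE = {"strong", "growth", "improved", "confident"}
NEGATIVE = {"weak", "decline", "loss", "uncertain"}


def sentiment_polarity(text):
    """Return (positive - negative) / (positive + negative), from -1 to 1.

    Returns 0.0 when the text contains no lexicon words.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


print(sentiment_polarity("Strong growth despite a small loss"))
```

A word-counting approach like this ignores context, so "not strong" still counts as positive. That limitation is one reason context-aware models are attractive.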
Word Embedding Models
This machine learning technique models words as vectors in multidimensional space based on how they are used in massive text training datasets. Trained models can be used to quantify how similar or different two words (or sentences) are in meaning.
Aceves, P., & Evans, J. A. (2023). Mobilizing Conceptual Spaces: How Word Embedding Models Can Inform Measurement and Theory within Organization Science. Organization Science. https://doi.org/10.1287/orsc.2023.1686
Fyffe, S., Lee, P., & Kaplan, S. (2024). “Transforming” Personality Scale Development: Illustrating the Potential of State-of-the-Art Natural Language Processing. Organizational Research Methods, 27(2), 265–300. https://doi.org/10.1177/10944281231155771
Harrison, J. S., Josefy, M. A., Kalm, M., & Krause, R. (2023). Using Supervised Machine Learning to Scale Human‐Coded Data: A method and dataset in the board leadership context. Strategic Management Journal, 44(7), 1780-1802. https://doi.org/10.1002/smj.3480
Miric, M., Jia, N., & Huang, K. G. (2023). Using Supervised Machine Learning for Large-Scale Classification in Management Research: The Case for Identifying Artificial Intelligence Patents. Strategic Management Journal, 44(2), 491-519. https://doi.org/10.1002/smj.3441
Poschmann, P., Goldenstein, J., Büchel, S., & Hahn, U. (2023). A Vector Space Approach for Measuring Relationality and Multidimensionality of Meaning in Large Text Collections. Organizational Research Methods. https://doi.org/10.1177/10944281231213068
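One common way to compare two word vectors is cosine similarity. The sketch below uses hypothetical three-dimensional vectors that I made up for illustration. Real embedding models produce vectors with hundreds of dimensions learned from training data, so treat the numbers here as placeholders rather than output from any actual model.

```python
import math

# Hypothetical toy vectors; real embeddings come from a trained model.
EMBEDDINGS = {
    "manager": [0.9, 0.1, 0.3],
    "supervisor": [0.8, 0.2, 0.35],
    "banana": [0.1, 0.9, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


close = cosine_similarity(EMBEDDINGS["manager"], EMBEDDINGS["supervisor"])
far = cosine_similarity(EMBEDDINGS["manager"], EMBEDDINGS["banana"])
print(round(close, 3), round(far, 3))
```

With these toy vectors, "manager" sits much closer to "supervisor" than to "banana", which is the kind of pattern a trained model is meant to capture.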
Topic Modeling
Topic modeling identifies groups of words that appear together often within a sample of documents. These clusters of words can be used to infer the topics of interest within the sample. I have heard that Leximancer and WordStat can be used for topic modeling. You can also use the BERTopic or tomotopy packages for Python.
Hannigan, T.R., Haans, R.F.J., Vakili, K., Tchalian, H., Glaser, V.L., Wang, M.S., Kaplan, S., & Jennings, P.D. (2019). Topic Modeling in Management Research: Rendering New Theory from Textual Data. Academy of Management Annals, 13(2), 586–632. https://doi.org/10.5465/annals.2017.0099
Jung, J., Zhou, W., & Smith, A.D. (2024). From Textual Data to Theoretical Insights: Introducing and Applying the Word-Text-Topic Extraction Approach. Organizational Research Methods. https://doi.org/10.1177/10944281241228186
Schmiedel, T., Müller, O., & Vom Brocke, J. (2019). Topic Modeling as a Strategy of Inquiry in Organizational Research: A Tutorial With an Application Example on Organizational Culture. Organizational Research Methods, 22(4), 941–968. https://doi.org/10.1177/1094428118773858
Valtonen, L., Mäkinen, S.J., & Kirjavainen, J. (2024). Advancing Reproducibility and Accountability of Unsupervised Machine Learning in Text Mining: Importance of Transparency in Reporting Preprocessing and Algorithm Selection. Organizational Research Methods, 27(1), 88–113. https://doi.org/10.1177/10944281221124947
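The intuition that topic models build on, words appearing together across documents, can be sketched with simple co-occurrence counts. To be clear, the code below is not a topic model and is not what BERTopic or tomotopy do internally. It only counts how many documents each pair of words shares, using four made-up documents and a tiny hypothetical stopword list.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list and documents, for illustration only.
STOPWORDS = {"the", "a", "and", "of", "our", "to", "in", "we"}

DOCUMENTS = [
    "Our culture values teamwork and trust",
    "Teamwork and trust define the culture",
    "Revenue growth improved profit margins",
    "Profit and revenue growth slowed",
]


def cooccurrence_counts(documents):
    """Count the number of documents in which each word pair appears together."""
    pair_counts = Counter()
    for doc in documents:
        # Use a set so each word counts once per document.
        tokens = set(re.findall(r"[a-z]+", doc.lower())) - STOPWORDS
        # Sort so each pair is always stored in alphabetical order.
        for pair in combinations(sorted(tokens), 2):
            pair_counts[pair] += 1
    return pair_counts


counts = cooccurrence_counts(DOCUMENTS)
for pair, n in counts.most_common(6):
    print(pair, n)
```

In this toy sample, the pairs that co-occur most often fall into a "culture" cluster and a "financial performance" cluster. An actual topic model estimates such groupings statistically instead of relying on raw counts.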
Using Python for Text Analysis
Python seems to be the coding language with the most and newest packages for text analysis. For example, Hugging Face hosts a large collection of open source models that can be accessed using Python and used for things like named entity recognition, image identification, and speech recognition. Python opens the door to a lot of possibilities in terms of analytical techniques. Should you decide to take this road, you won't regret it!
I have met a few people who have expressed apprehension about learning Python. If you can figure out R, you can figure out Python! Trust me, you've got this! I got started with Python with the help of a methods workshop provided by Indiana University, Introduction to Python for Social Scientists. The workshop is free, online, and addresses the quirks of Python that can sometimes catch first-timers off guard. I also highly recommend using Google Colaboratory for writing and running Python code. It requires no installation, provides free access to GPU processing, and (as of this moment) includes free AI coding assistance.