Despite growing research in linguistic analysis, most large-scale research to date on organizational communications takes the word as the unit of analysis, analyzing text through word counts or correlations in word usage. The key premise of Menai Insight's tools is that there is a layer of meaning in text beyond the words on the page, captured only by considering the relationships conveyed when those words are organized together. That is, textual information expresses more nuanced relationships, such as justifications for particular decisions, expectations of future states, and caveats to an argument. Word-count and word-clustering approaches, such as topic modeling, abstract away both how words are combined to express relationships and how those relationships are combined into meta-structures.
This article summarizes the limitations of capturing meaning with approaches that take the word as the fundamental unit of analysis. These are the limitations that our approach to capturing meaning seeks to address.
As noted above, a substantial amount of research in management, strategy, and the broader organizational fields attempts to capture meaning through word counts or correlation measures, such as topic models. Although the vocabulary of a text may convey a layer of meaning, for example the theme (Ocasio and Joseph, 2005) or the valence of the document (Tausczik and Pennebaker, 2010), the more specific relationships described are hard to capture when considering words only in isolation. The associations in statements such as "We discussed managerial compensation with a compensation consultant" or "The compensation committee used survey data in determining managerial compensation" do not reduce to the individual words. Although approaches such as topic models or word-lists (which take the 'word' as the fundamental unit of analysis) may capture particular dimensions of meaning, the more nuanced relationships conveyed in the text are lost. An analogy is analyzing the content of a book by looking at the keywords in its index: the keywords clearly capture one dimension of the material, but the specific relationships discussed in the text are lost.
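The point can be illustrated with a minimal Python sketch of a word-count representation. The helper `bag_of_words` and the first pair of sentences are hypothetical and introduced only for illustration; the second pair reuses the two example statements quoted above. The sketch is not a description of any particular published method.

```python
from collections import Counter
import re

def bag_of_words(sentence: str) -> Counter:
    """Lowercase the sentence and count each word, discarding word order."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

# Hypothetical sentences: same vocabulary, opposite relationship.
a = "The committee advised the consultant"
b = "The consultant advised the committee"
print(bag_of_words(a) == bag_of_words(b))  # True: the counts cannot tell them apart

# The two statements quoted in the text share vocabulary but describe
# different relationships; the counts expose only the shared words.
s1 = "We discussed managerial compensation with a compensation consultant"
s2 = "The compensation committee used survey data in determining managerial compensation"
shared = sorted((bag_of_words(s1) & bag_of_words(s2)).keys())
print(shared)  # ['compensation', 'managerial']
```

In both cases the word counts record which terms appear, but not who did what to whom.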
In reviewing the prevailing text-analysis literature, we identified three fundamental limitations of traditional approaches that treat words as the fundamental unit of analysis (e.g., word-lists or topic modeling).
Each of these limitations is described in greater detail below, providing the basis for illustrating the opportunities for theoretical development offered by our data structures, which are specifically designed to be flexible enough to capture within-sentence relationships.
1. An inability to capture conveyed relationships by analyzing individual words
The first limitation of word-based measures derives from the key assumption that meaningful concepts can be captured by analyzing words, or clusters of words, in isolation. Although this assumption may hold for certain concepts (e.g., arguably, the valence of the text: Tausczik and Pennebaker, 2010; or high-level logics: Ocasio and Joseph, 2005), sentences, as noted above, are used to express relationships between concepts. At the simplest level, these relationships express connections between objects or attributes, for example, how much experience a particular manager has in a particular industry. Textual information can also express more nuanced and complicated relationships, such as justifications for particular decisions, expectations of future states, and caveats to an argument. Since all but the simplest relationships are composed of multiple words, it is essentially impossible to characterize the underlying message, or meaning, of a communication by analyzing words in isolation. Moreover, while word-count approaches could arguably be extended to capture more complex concepts by searching for phrases (e.g., ‘experience in the manufacturing sector’), it quickly becomes infeasible to foresee every way in which even the simplest relationships can be conveyed. Thus, while it may be possible to capture high-level themes or changes to those themes over time (Ocasio and Joseph, 2005), characterizing the relationships between concepts, and how those relationships evolve and diffuse, is essentially impossible through word counts alone. This may be especially true for the relatively subtle, nuanced, and complex concepts in management and strategy theory.
Thus, although textual archives may provide the best record of the process by which organizational, institutional, and societal change occurs (e.g., Maguire and Hardy, 2009; Funk and Hirschman, 2014), it is fundamentally difficult to capture this change process systematically for more quantitative analysis, and scaling limitations restrict the ability to hand code meaning at the overall field level.
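The difficulty of extending word counts to phrase searches can be sketched in a few lines of Python. The pattern uses the example phrase from the text; the list `statements` contains hypothetical paraphrases written for this illustration and is not drawn from any real filing.

```python
import re

# The example phrase from the text, as a literal, case-insensitive pattern.
PHRASE = re.compile(r"experience in the manufacturing sector", re.IGNORECASE)

# Hypothetical paraphrases of the same underlying relationship.
statements = [
    "She has experience in the manufacturing sector.",
    "She spent a decade running manufacturing plants.",
    "Her background is in manufacturing.",
    "She has extensive manufacturing-sector experience.",
]

# Only statements containing the exact phrase are found.
matched = [s for s in statements if PHRASE.search(s)]
print(f"{len(matched)} of {len(statements)} statements matched")  # 1 of 4 statements matched
```

Each missed paraphrase could be covered by adding another pattern, but the list of patterns would have to grow with every new way of wording the same relationship.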
2. The structuring of material is lost
The second fundamental issue with current word-based approaches is that they give little consideration to structure; word counts ignore the syntax of how the words are combined to construct meaning (e.g., Matthews, 1981; Van Valin and LaPolla, 1997), and how the components of a message are aggregated into an overall meta-structure. Indeed, given the limitations of existing approaches in capturing a representation of the relationships discussed in the text, it is difficult to conceive how they could examine an added layer of complexity, namely how those underlying relationships are structured. Thus, while structure can be conceived of in many different ways (Fairclough, 1992), automated approaches focus on characterizing the language used (such as measures of complexity derived from sentence and word length: Courtis, 1986; Li, 2008; Rennekamp, 2012), rather than how the underlying components of the message are themselves constructed and presented. Specifically, there is very little consideration of the argumentation structure (Harmon, Green, and Goodnight, 2015), or how the order in which information is presented shapes interpretation (Leung, 2014). Research that does consider the impact of the structure of the language tends either to be qualitative in nature (e.g., Martin et al., 1983; Boje, 1991), or to derive from hand-coding the structure of the sentences (e.g., Bettman and Weitz, 1983; Harmon, 2018).
While qualitative approaches, including hand coding of sentence structure, may yield substantial insight, their application is constrained to instances where it is feasible to read or hand code each sentence. Although it is possible to dismiss the scalability difficulties of qualitative approaches (e.g., by arguing that it is unnecessary to scale, or that the issue can simply be solved by increasing the number of research assistants), these difficulties pose a substantial constraint on theoretical development. This may be best illustrated by considering the next level of structure: how individual relationships are combined into overall meta-structures. While there are qualitative studies that examine how components of a message are combined into overall meta-structures (such as work on narratives or storytelling: Martin et al., 1983; Boje, 1991), these are limited to case studies or very small numbers of firms; there has been very little theoretical or empirical consideration of the causes of variations in meta-structures at the field level. Moreover, it is especially difficult to examine interactions between the meta-structures of different firms; developing and testing theory on how field-level meta-structures form and evolve is beyond the ability of even the largest conceivable number of research assistants to hand code. The inability to systematically ‘capture’ relationships, the structure of those relationships, and how those relationships are structured into overall meta-structures thus explains the absence of research, beyond limited qualitative studies, examining how structures and meta-structures form and evolve at the field level.
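To make the earlier point about length-based complexity measures concrete, the following Python sketch computes two simple quantities, average words per sentence and average letters per word. The function `length_measures` and both passages are hypothetical; this is a generic illustration and does not reproduce the specific readability indices used in the cited studies.

```python
import re

def length_measures(text: str) -> tuple[float, float]:
    """Return (average words per sentence, average letters per word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return len(words) / len(sentences), sum(map(len, words)) / len(words)

# Hypothetical passages: identical sentences, presented in opposite order.
claim_first = "We raised pay. Survey data showed that peers paid more."
evidence_first = "Survey data showed that peers paid more. We raised pay."

# Both measures are identical, so the ordering of the argument is invisible to them.
print(length_measures(claim_first) == length_measures(evidence_first))  # True
```

Measures of this kind describe the language used, but carry no information about whether the claim or the evidence is presented first.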
3. The inability of researchers to specify lists of all relevant words in advance
The final issue with using word-lists to capture constructs of interest is that the approach relies on the ability of researchers to specify lists of all relevant words. While this may be appropriate for capturing concepts that can be represented through a small number of words (or phrases), there are some concepts where it would essentially be impossible to anticipate all words (or phrases) in advance. Although theory may help define concepts of interest, it typically offers little guidance on the specific words that underlie those concepts. For example, it would be infeasible to define in advance all of the experiences that a manager may have (e.g., to compare how a manager’s prior experiences are recharacterized over time); while certain phrases are common and easy to anticipate (e.g., ‘financial experience’), others are much more obscure and hard to anticipate (e.g., ‘substantial accounting experience, with a focus on mergers and acquisitions’). Similarly, while it would be possible to identify ‘manufacturing experience’ and ‘accounting experience’ as types of experience, it is hard to foresee all possible aggregations, such as ‘manufacturing, accounting, and human resource experience’.
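The aggregation problem can be sketched as follows. The list `EXPERIENCE_PHRASES` and the function `count_matches` are hypothetical, standing in for a word-list a researcher might specify in advance; the text being searched is the aggregated phrase given as an example above.

```python
# Hypothetical word-list of experience phrases specified in advance.
EXPERIENCE_PHRASES = [
    "financial experience",
    "manufacturing experience",
    "accounting experience",
]

def count_matches(text: str) -> dict:
    """Count exact, case-insensitive occurrences of each listed phrase."""
    lowered = text.lower()
    return {phrase: lowered.count(phrase) for phrase in EXPERIENCE_PHRASES}

# The aggregated phrase from the text mentions two listed experience types...
text = "manufacturing, accounting, and human resource experience"
counts = count_matches(text)

# ...yet no listed phrase occurs verbatim, so every count is zero.
print(counts)
```

Manufacturing and accounting experience are both described, but neither is detected, and human resource experience was never on the list to begin with.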
Our approach seeks to enable analysis of the relationships and meaning discussed in organizational communications by systematically capturing and representing the nature and form of the meaning conveyed. It goes beyond merely analyzing words by taking into consideration the syntax through which words are constructed into sentences, capturing the discussed relationships and transforming the sentences into consistent representations that preserve the meaning while abstracting away surface-level variation. As such, our approach enables analysis at a more nuanced level, allowing aggregation and comparison of the relationships that are discussed, above and beyond just the words used.
References
Bettman, J.R., and B.A. Weitz
1983 Attributions in the board room: Causal reasoning in corporate annual reports. Administrative Science Quarterly. 28. 165-183.
Boje, D.M.
1991 The storytelling organization: A study of story performance in an office-supply firm. Administrative Science Quarterly. 36. 106-126.
Caruana, R., and A. Niculescu-Mizil
2006 An empirical comparison of supervised learning algorithms. Proceedings of the 23rd International Conference on Machine Learning. 161-168.
Courtis, J.K.
1986 An investigation into annual report readability and corporate risk-return relationships. Accounting and Business Research. 16. 285-294.
Fairclough, N.
1992 Discourse and social change. Cambridge, MA: Polity Press.
Funk, R.J., and D. Hirschman
2014 Derivatives and deregulation: Financial innovation and the demise of Glass-Steagall. Administrative Science Quarterly. 59. 669-704.
Harmon, D.J.
2018 When the Fed speaks: Arguments, emotions, and the micro-foundations of institutions. Administrative Science Quarterly. Forthcoming.
Harmon, D.J., S.E. Green, Jr., and G.T. Goodnight
2015 A model of rhetorical legitimation: The structure of communication and cognition underlying institutional maintenance and change. Academy of Management Review. 40. 76-95.
Leung, M.D.
2014 Dilettante or renaissance person? How the order of job experiences affects hiring in an external labor market. American Sociological Review. 79. 136-158.
Li, F.
2008 Annual report readability, current earnings, and earnings persistence. Journal of Accounting and Economics. 45. 221-247.
Maguire, S., and C. Hardy
2009 Discourse and deinstitutionalization: The decline of DDT. Academy of Management Journal. 52. 148-178.
Martin, J., M.S. Feldman, M.J. Hatch, and S.B. Sitkin
1983 The uniqueness paradox in organizational stories. Administrative Science Quarterly. 28. 438-453.
Matthews, P.H.
1981 Syntax. Cambridge, UK: Cambridge University Press.
Ocasio, W., and J. Joseph
2005 Cultural adaptation and institutional change: The evolution of vocabularies of corporate governance, 1972-2003. Poetics. 33. 163-178.
Pang, B., L. Lee, and S. Vaithyanathan
2002 Thumbs up?: Sentiment classification using machine learning techniques. Proceedings of the ACL-02 conference on Empirical methods in natural language processing. 10. 79-86.
Rennekamp, K.
2012 Processing fluency and investors' reactions to disclosure readability. Journal of Accounting Research. 50. 1319-1354.
Tausczik, Y.R., and J.W. Pennebaker
2010 The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology. 29. 24-54.
Van Valin, R.D., and R.J. LaPolla
1997 Syntax: Structure, Meaning, and Function. Cambridge, UK: Cambridge University Press.