Menai Insight combines qualitative approaches to textual analysis, which ensure that nuanced relationships within sentences are captured, with machine learning, which ensures that text is processed to a high degree of accuracy across millions of sentences. This article explains the key stages of our approach to achieving high-accuracy, large-scale classification of text.
As described in a previous article, existing computational textual analysis approaches that take words as the fundamental unit of analysis (e.g., word counts or word-correlation measures such as topic models) are limited in their ability to capture the nuanced relationships conveyed within sentences and the multiple layers of structure in documents. This article explains some of the opportunities that arise from systematically capturing within-sentence relationships by transforming raw text into consistent textual structures.
One of the primary ways in which our textual structures make text easier to manipulate is by removing surface-level variations, including acronyms, synonyms, label names, and slight variations in sentence construction. Each of these variations introduces substantial differences between sentences, so that, with the exception of some boilerplate sentences and year-to-year reuse of material within a firm, almost no two sentences in company filings are identical.
This article explains how, by abstracting away these surface-level variations, our textual structures make it feasible to examine whether sentences convey similar relationships, rather than whether they use exactly the same words to convey their messages.
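To make the idea of removing surface-level variation concrete, the following is a minimal sketch, assuming small hand-built lookup tables. The tables `PHRASES`, `ACRONYMS`, and `SYNONYMS` and the function `normalize` are hypothetical illustrations, not Menai Insight's actual method, which this article does not describe; a production system would need far larger, domain-specific resources. The sketch maps two differently worded sentences onto the same canonical form, so that they can be compared on what they convey rather than on their exact wording.

```python
import re

# Hypothetical, hand-built lookup tables for illustration only.
PHRASES = {"because of": "due to", "owing to": "due to"}
ACRONYMS = {"capex": "capital expenditure", "eps": "earnings per share"}
SYNONYMS = {"rose": "increased", "grew": "increased", "climbed": "increased"}


def normalize(sentence: str) -> str:
    """Map a sentence onto a canonical surface form (illustrative sketch)."""
    text = sentence.lower()
    # Collapse multi-word connectives first, matching whole words only.
    for phrase, canonical in PHRASES.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", canonical, text)
    # Split into alphabetic tokens, discarding punctuation.
    tokens = re.findall(r"[a-z]+", text)
    # Expand acronyms, then map single-word synonyms onto one label.
    tokens = [ACRONYMS.get(t, t) for t in tokens]
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return " ".join(tokens)


a = "Capex rose because of new plant construction."
b = "Capital expenditure increased owing to new plant construction."
print(a == b)                        # False: the raw sentences differ
print(normalize(a) == normalize(b))  # True: same canonical form
```

Simple lookups like these only handle variations that someone has anticipated; the point of the sketch is the comparison step, not the coverage of the tables.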
Despite growing research in linguistic analysis, most large-scale research to date on organizational communication takes the word as the unit of analysis, analyzing text based on word counts or on correlations in word usage. The key premise of Menai Insight's tools is that there is a layer of meaning in text beyond the words on the page, one that is captured only by considering the relationships conveyed when these words are organized together. That is, text expresses more nuanced relationships, such as the justification for particular decisions, expectations of future states, and caveats to an argument. Word-count and word-clustering approaches, such as topic modeling, abstract away both how words are structured to form these relationships and how those relationships are combined into higher-level meta-structures.
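A small example shows what word-count approaches lose. In the sketch below, two sentences assert opposite causal directions yet produce identical word counts. The helper `cause_effect` is a hypothetical, deliberately naive extractor that only recognizes the pattern "<effect> because <cause>"; it is included only to show that even a crude structural representation distinguishes the two sentences, and it is not the extraction method used by Menai Insight.

```python
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Word-count representation: word order and structure are discarded."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cause_effect(sentence: str):
    """Hypothetical naive extractor for '<effect> because <cause>' sentences.

    Returns a (cause, effect) tuple, or None if the pattern is absent.
    """
    effect, sep, cause = sentence.lower().rstrip(".").partition(" because ")
    if not sep:
        return None
    return (cause, effect)


s1 = "Prices rose because demand increased."
s2 = "Demand increased because prices rose."

print(bag_of_words(s1) == bag_of_words(s2))  # True: identical word counts
print(cause_effect(s1))  # ('demand increased', 'prices rose')
print(cause_effect(s2))  # ('prices rose', 'demand increased')
```

The word counts treat the two sentences as the same document, while the relationship each one expresses, which is what a reader actually cares about, runs in opposite directions.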
This article summarizes the limitations of capturing meaning with approaches that take the word as the fundamental unit of analysis; these limitations are the core issues that our approach to capturing meaning seeks to address.
There is no question that textual information is a rich historical source for understanding an array of firm actions. From analyzing strategic actions taken by organizations (Tripsas and Gavetti, 2000; Benner, 2010), to studying the dynamics of interactions between firms and stakeholders (Chen and Hambrick, 1995; King, 2008), to understanding the overall evolution of a field or society (Fairclough, 1992; Barley and Kunda, 1992), the qualitative text produced by organizations, information intermediaries, and other stakeholders offers organizational researchers a wealth of detailed historical evidence. In addition to conveying factual information, communication provides opportunities to shape reality (Berger and Luckmann, 1967; Potter, 1996), and organizational communication is a means of managing audience perceptions (Bettman and Weitz, 1983; Elsbach, 2006; Fiss and Zajac, 2006).