One of the primary ways in which our textual structures make text easier to manipulate is by removing surface-level variations, including acronyms, synonyms, label names and slight variations in sentence construction. Each of these variations introduces substantial differences into sentences, which means that, with the exception of some boilerplate sentences and year-to-year within-firm reuse of material, almost no two sentences in company filings are identical.
This article explains how, by abstracting away these surface-level variations, our textual structures make it feasible to compare the relationships that sentences convey, rather than whether those sentences use exactly the same words to convey them.
Components of surface-level variation
Our textual structures capture four qualitatively distinct forms of surface-level variation, enabling you to focus on comparing meaningful differences between texts. Specifically, our structures are designed to capture acronyms, synonyms, label names and slight variations in sentence construction.
Illustrating the power of standardized textual structures
The power of capturing information in a standardized representation that abstracts away surface-level variation can be illustrated by considering a typical sentence from a manager's background, which may include details of the manager's employer, job title and dates of employment, for example:
While this is one way in which the information can be conveyed, the very same information can be written in multiple different ways, sometimes using the same words and sometimes using different words:
In each case, surface-level variation makes it hard to compare the sentences directly. Even with a small number of sentences, it is difficult to determine whether the same positions are listed in subsequent filings or whether the same words are used to describe experiences, or to compare the changes directly.
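As a rough illustration of the comparison problem described above, the following Python sketch maps three differently worded sentences onto one standardized record. It is not our actual implementation: the example sentences, the `Position` record, the `TITLE_SYNONYMS` table and the regular-expression patterns are all hypothetical and were chosen only to show the idea.

```python
import re
from typing import NamedTuple, Optional


class Position(NamedTuple):
    """Hypothetical standardized record for one employment statement."""
    person: str
    title: str
    employer: str
    start_year: int
    end_year: int


# Hypothetical lookup that folds acronyms and synonyms into one canonical title.
TITLE_SYNONYMS = {
    "cfo": "chief financial officer",
    "finance chief": "chief financial officer",
    "chief financial officer": "chief financial officer",
}

# Hypothetical patterns, one per sentence construction used in SENTENCES below.
PATTERNS = [
    re.compile(
        r"(?P<person>.+?) served as (?P<title>.+?) of (?P<employer>.+?) "
        r"from (?P<start>\d{4}) to (?P<end>\d{4})\.?$"
    ),
    re.compile(
        r"From (?P<start>\d{4}) to (?P<end>\d{4}), (?P<person>.+?) was "
        r"(?P<title>.+?) at (?P<employer>.+?)\.?$"
    ),
    re.compile(
        r"(?P<person>.+?) was (?:the )?(?P<title>.+?) of (?P<employer>.+?) "
        r"\((?P<start>\d{4})-(?P<end>\d{4})\)\.?$"
    ),
]


def to_structure(sentence: str) -> Optional[Position]:
    """Return a standardized Position, or None if no pattern applies."""
    text = sentence.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            raw_title = match["title"].lower()
            return Position(
                person=match["person"],
                # Unknown titles are kept in lower case rather than dropped.
                title=TITLE_SYNONYMS.get(raw_title, raw_title),
                employer=match["employer"],
                start_year=int(match["start"]),
                end_year=int(match["end"]),
            )
    return None


# Invented sentences: same information, different surface forms.
SENTENCES = [
    "Ms. Lee served as CFO of Acme Corp from 2010 to 2015.",
    "From 2010 to 2015, Ms. Lee was Chief Financial Officer at Acme Corp.",
    "Ms. Lee was the finance chief of Acme Corp (2010-2015).",
]

records = [to_structure(sentence) for sentence in SENTENCES]

# The raw strings all differ, but the standardized records are identical.
same_text = len(set(SENTENCES)) == 1
same_structure = len(set(records)) == 1
```

Comparing the raw strings reports three different sentences, whereas comparing the records reports a single position. A real filing would need far broader coverage than three hand-written patterns, so this should be read only as a sketch of why a standardized representation makes the comparison tractable.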