How we develop our textual structures, process your text, and support you throughout the process
Development of the Textual Structures
Before classifying any text, we first become very familiar with the underlying material, reading widely to gain a deep understanding of it. Then, drawing on research on themes (e.g., Glaser and Strauss, 1967; Ryan and Bernard, 2003), we identify commonly occurring categories of concepts in the text. For example, in managerial backgrounds, the text naturally falls into five primary information types: i) positions, ii) qualifications, iii) qualitative descriptions of experience, iv) professional licenses, and v) background information such as a manager's name.
After identifying the primary information types, we then dissect each of the sentences into components, again by identifying the key dimensions underlying the text, continuing this process until entire sentences are captured in a structure similar to that below:
{
  "ORIGINAL_SENTENCE": "From October 2001 to November 2004, she served as Vice President of Operations and a director for QRS Corp., an NYSE-listed gold mining company, and between March 1996 and May 2001 was the CEO of Vaynol Clothing, a leading US retailer of women's clothing",
  "POSITIONS": [
    {
      "ORIGINAL": "From October 2001 to November 2004, she served as Vice President of Operations and a director for QRS Corp., an NYSE-listed gold mining company",
      "START_DATE": {"ORIGINAL": "October 2001", "YEAR": 2001, "MONTH": 10},
      "END_DATE": {"ORIGINAL": "November 2004", "YEAR": 2004, "MONTH": 11},
      "JOB_TITLES": [
        {"ORIGINAL": "Vice President of Operations", "LEVEL": "VICE_PRESIDENT", "AREA": ["OPERATIONS"]},
        {"ORIGINAL": "director", "LEVEL": "DIRECTOR"}
      ],
      "COMPANY": {"ORIGINAL": "QRS Corp.", "CLEANED": "QRS"},
      "COMPANY_DESCRIPTION": {
        "ORIGINAL": "NYSE-listed gold mining company",
        "LISTING-OWNERSHIP": {
          "OWNERSHIP_TYPE": "PUBLICALLY_LISTED",
          "EXCHANGE": [{"EXCHANGE_NAME": "NYSE", "COUNTRY": "USA"}]
        },
        "INDUSTRY": {"NAICS_3_DIGIT": 212, "NAICS_3_DESCRIPTION": "Mining (except oil and gas)"}
      }
    },
    {
      "ORIGINAL": "between March 1996 and May 2001 was the CEO of Vaynol Clothing, a leading US retailer of women's clothing",
      "START_DATE": {"ORIGINAL": "March 1996", "YEAR": 1996, "MONTH": 3},
      "END_DATE": {"ORIGINAL": "May 2001", "YEAR": 2001, "MONTH": 5},
      "JOB_TITLES": [{"ORIGINAL": "CEO", "LEVEL": "CEO"}],
      "COMPANY": {"ORIGINAL": "Vaynol Clothing", "CLEANED": "VAYNOL CLOTHING"},
      "COMPANY_DESCRIPTION": {
        "ORIGINAL": "leading US retailer of women's clothing",
        "INDUSTRY": {"NAICS_3_DIGIT": 448, "NAICS_3_DESCRIPTION": "Clothing and Clothing Accessories Stores"},
        "REGION": [{"COUNTRY": "USA"}],
        "CHARACTERIZATION": [{"TERM": "leading", "AREA": "LEADING"}]
      }
    }
  ]
}
The key benefit of our textual structures is that they transform every sentence into a consistent representation, capturing the meaning of the sentence while abstracting away surface-level variations that can complicate analysis.
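To illustrate why a consistent representation simplifies analysis, here is a minimal Python sketch that reads a trimmed copy of the example structure and derives the tenure of each position in months. The helper names `months_between` and `summarize_positions` are hypothetical and not part of the service; the field names follow the example structure, and only the fields the sketch uses are kept.

```python
import json

# Trimmed copy of the example structure; only the fields used below are kept.
record = json.loads("""
{
  "POSITIONS": [
    {"START_DATE": {"YEAR": 2001, "MONTH": 10},
     "END_DATE": {"YEAR": 2004, "MONTH": 11},
     "JOB_TITLES": [{"ORIGINAL": "Vice President of Operations", "LEVEL": "VICE_PRESIDENT"},
                    {"ORIGINAL": "director", "LEVEL": "DIRECTOR"}],
     "COMPANY": {"ORIGINAL": "QRS Corp.", "CLEANED": "QRS"}},
    {"START_DATE": {"YEAR": 1996, "MONTH": 3},
     "END_DATE": {"YEAR": 2001, "MONTH": 5},
     "JOB_TITLES": [{"ORIGINAL": "CEO", "LEVEL": "CEO"}],
     "COMPANY": {"ORIGINAL": "Vaynol Clothing", "CLEANED": "VAYNOL CLOTHING"}}
  ]
}
""")


def months_between(start, end):
    """Whole months from start to end, each given as {"YEAR": ..., "MONTH": ...}."""
    return (end["YEAR"] - start["YEAR"]) * 12 + (end["MONTH"] - start["MONTH"])


def summarize_positions(rec):
    """Return one (company, title levels, tenure in months) tuple per position."""
    rows = []
    for position in rec["POSITIONS"]:
        levels = [title["LEVEL"] for title in position["JOB_TITLES"]]
        tenure = months_between(position["START_DATE"], position["END_DATE"])
        rows.append((position["COMPANY"]["CLEANED"], levels, tenure))
    return rows


for row in summarize_positions(record):
    print(row)
```

Because dates, titles, and company names are already normalized, the same few lines work regardless of how the original sentence was phrased.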
Transforming your texts to the textual structures
Populating the Textual Structures
After the textual structures are developed, we build approaches that classify the text into them largely automatically, combining manual oversight with machine learning so that we can populate the textual structures consistently across large volumes of text.
The entire population process has substantial manual oversight: while it is infeasible to read every sentence in a communication medium, our experience with a limited number of communication media helps us achieve consistent representations. See our validation page for further details on how we ensure high classification accuracy.
Combining turnaround speed and accuracy
Our process is set up to combine speed and accuracy of processing. To enable this, we process your file in three stages:
- Validation: We aim to get back to you within a few minutes of upload if we detect validation errors
- Initial Processing: Automated processing, returned within 24 hours (although often much quicker)
- Detailed Processing: Incorporating improvements from manual oversight, typically returned within 7 days
Validation
To ensure that we can process your text quickly and accurately, we have some basic formatting requirements (much as you would face when entering data into a statistical package).
As soon as you upload your file for processing, we validate that it meets these requirements, and get back to you within a few minutes so you can quickly correct any issues.
Initial Processing
As soon as you upload your file, we begin working on it, using fully automated processing that draws on previous experience to return your processed text within 24 hours (and often within the hour).
- So long as your text is similar to text that we have previously worked with, the accuracy of initial processing is likely to be high: to help ensure this, we train our models on US company filings from 1996-2018.
- The formatting of the returned text mirrors the textual structure that you will ultimately receive, allowing you to begin working on the file and to get a feel for the structure of the data.
- If you made a mistake in what you uploaded (e.g., wrong file), you may also cancel the main processing - the quick turnaround ensures mistakes get spotted quickly, and we'll 'credit back' canceled sentences to your allowance.
Detailed Processing
Over the course of about a week, we implement classification improvements focused on the data that you send. This process involves:
- Focused improvements: If certain terms/phrases that we have yet to classify occur frequently in your text, we will prioritize classifying them.
- Identifying new terms and phrases: If your text contains new terms/phrases that do not occur in our training data (company filings from 1996-2018), we will look to classify them.
- Note: At present all plans are intended only for use with US company filings, and thus 'new terms' are infrequent. We hope to relax this requirement for the Faculty PRO and PhD plans sometime in 2019.
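As a rough illustration of the prioritization idea above (not a description of the production pipeline), the sketch below counts terms that are absent from a set of already-classified terms and ranks them by frequency. The function name `rank_unclassified_terms`, the single-word tokenization, and the sample vocabulary are all assumptions made for the example; real terms and phrases may span several words.

```python
import re
from collections import Counter


def rank_unclassified_terms(sentences, known_terms, top_n=10):
    """Count lowercase word tokens not in known_terms, most frequent first."""
    known = {term.lower() for term in known_terms}
    counts = Counter()
    for sentence in sentences:
        # Simplification: treat each alphabetic word as one term.
        for token in re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower()):
            if token not in known:
                counts[token] += 1
    return counts.most_common(top_n)


# Hypothetical sample data for illustration only.
sentences = [
    "She served as chief sustainability officer.",
    "He was named chief sustainability officer in 2015.",
    "She is a chartered accountant.",
]
known_terms = {"she", "he", "served", "as", "chief", "officer",
               "was", "named", "in", "is", "a"}

ranked = rank_unclassified_terms(sentences, known_terms)
print(ranked)
```

Terms at the top of the ranking are the ones whose classification would improve coverage of the uploaded text the most.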
Note: We are constantly improving our algorithms, enabling us to accurately classify more sentences in initial processing. We expect overall classification rates to increase substantially toward the end of 2018, and we can extend detailed processing as necessary.
Continuous development and support
We are here to help you get the most from our service, whether you are a potential customer, a new customer, or have been with us for longer.
Find the help that you need
We have developed a detailed knowledge base that covers the most commonly asked questions about our service, helping you find immediate answers.
- Help to get you going: Examples that walk you through getting started
- Specifications: Full specifications of all of our textual structures and Excel/CSV summaries
- Account related questions: Help managing your account
Dedicated support, with academic research experience (on corporate leadership)
- Contact our support: Have something that you would like to discuss? We are here to help.
- Experience with academic research: We understand the academic publishing process, and our support and development are overseen by our founder/CEO, who has a PhD in Strategy from the University of Michigan.
- Understanding of top manager research: We have publications on top managers in Academy of Management Journal and Administrative Science Quarterly.
Active development
- Continually expanding and improving: Unlike many academic side-projects that are released and then forgotten, we are continually working to improve our offerings. We have big ambitions for our services and are working to expand and improve them rapidly.