Multiple stages of validation help ensure our accuracy
Overview
We have embedded multiple stages of validation throughout our development process. Here you will find an overview and details of how validation is embedded in each stage of that process.
1. Qualitative development of the ontologies
The first stage in classifying the texts involved developing a representation of the material. While each sentence in a particular communication medium (e.g., managerial backgrounds) is unique, there is substantial underlying similarity in the material discussed.
For example, managerial backgrounds typically discuss positions that a manager has worked in, experiences that they have gained, and qualifications and professional licenses that they have received.
Each of the ontologies was developed with substantial qualitative consideration of the underlying material, drawing from research on themes (e.g., Glaser and Strauss, 1967; Ryan and Bernard, 2003) to help ensure that the ontologies reflect the underlying text. After first identifying the primary dimensions of the text, each sentence was then dissected again into its components.
Learn more about our qualitative development process
2. Validation of the properties by dissecting terms to their components
By dissecting concepts into their underlying properties, and manually verifying both the much smaller number of terms in the sub-concepts and the sequencing of those sub-concepts, it is possible to validate a much larger number of terms. For example, the validity of concepts composed of separate parts (e.g., MANAGEMENT_TITLE) can be assessed, despite there being tens of thousands of unique titles at the overall level.
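The dissection idea can be illustrated with a minimal Python sketch. The sub-concept names, vocabularies, and allowed orderings below are hypothetical placeholders, not the actual ontology; the point is only that a few short, manually checked word lists plus a few checked orderings can validate a far larger set of full titles.

```python
# Hypothetical sub-concept vocabularies (assumed for illustration only).
# Each list is small enough to be verified by hand.
SUB_CONCEPTS = {
    "SENIORITY": {"senior", "executive", "deputy", "assistant"},
    "RANK": {"vice", "chief"},
    "FUNCTION": {"financial", "operating", "marketing"},
    "ROLE": {"president", "officer", "director", "manager"},
}

# Hypothetical manually verified orderings of sub-concepts in a MANAGEMENT_TITLE.
ALLOWED_SEQUENCES = {
    ("ROLE",),
    ("SENIORITY", "ROLE"),
    ("RANK", "ROLE"),
    ("SENIORITY", "RANK", "ROLE"),
    ("RANK", "FUNCTION", "ROLE"),
}


def dissect(title):
    """Map each word of a title to its sub-concept, or None if the word is unknown."""
    labels = []
    for word in title.lower().split():
        label = next(
            (name for name, vocab in SUB_CONCEPTS.items() if word in vocab),
            None,
        )
        labels.append(label)
    return tuple(labels)


def is_valid_title(title):
    """A title passes if every word is a known sub-concept term
    and the sub-concepts appear in a verified order."""
    labels = dissect(title)
    return None not in labels and labels in ALLOWED_SEQUENCES


for candidate in ["Senior Vice President", "Chief Financial Officer", "President Vice"]:
    print(candidate, "->", dissect(candidate), is_valid_title(candidate))
```

In this sketch, a title fails either because it contains a word outside the verified vocabularies or because its sub-concepts appear in an unverified order; both cases would be candidates for manual review.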
3. Validation of terms through external data-checks
By connecting terms to external databases, it is possible to verify concepts underpinned by a large number of labels, such as location information, that are infeasible to verify manually and that lack the repetition in underlying words needed for dissection.
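A minimal Python sketch of such an external check follows. The reference set here is a small in-memory stand-in, and the function names are hypothetical; in practice the lookup would go to whichever external database is being used, which is not specified here.

```python
# Stand-in for an external reference source (assumed for illustration only).
# A real check would query an external database rather than a hard-coded set.
EXTERNAL_LOCATIONS = {"london", "new york", "singapore", "sao paulo"}


def normalize(term):
    """Lower-case a term and collapse runs of whitespace before lookup."""
    return " ".join(term.lower().split())


def check_against_external(terms, reference):
    """Split terms into those confirmed by the reference and those needing review."""
    confirmed, unverified = [], []
    for term in terms:
        if normalize(term) in reference:
            confirmed.append(term)
        else:
            unverified.append(term)
    return confirmed, unverified


# Terms that were classified as locations; "MBA" is a deliberate misclassification.
confirmed, unverified = check_against_external(
    ["London", "New  York", "MBA"], EXTERNAL_LOCATIONS
)
print("confirmed:", confirmed)
print("needs review:", unverified)
```

Terms that the reference does not confirm are not necessarily wrong; the sketch only routes them to further review.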
4. Validation by context
This stage checks that the context in which a term occurs in a sentence is appropriate. For example, the concept sequencing PERSON_NAME IS MANAGEMENT_TITLE AT COMPANY_NAME is common and appropriate, whereas a concept sequencing such as PERSON_NAME RECEIVED COMPANY_NAME FROM UNIVERSITY_NAME is uncommon and unlikely to be correct (it most likely indicates that a degree acronym has been incorrectly classified as a company name). This validation includes three components:
- Manual checks to identify unlikely concept sequencing.
- Machine-learned identification, which flags cases where a machine-learning classification is inconsistent with the classified concept.
- Identification of concept sequencing that does not conform to that expected in the ontology.
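The rule-based parts of this check can be sketched in Python as follows. The set of expected sequences, the DEGREE concept name, and the rarity threshold are assumptions made for illustration; the machine-learning component is not covered by this sketch.

```python
from collections import Counter

# Hypothetical concept sequences that the ontology expects (illustration only).
EXPECTED = {
    ("PERSON_NAME", "IS", "MANAGEMENT_TITLE", "AT", "COMPANY_NAME"),
    ("PERSON_NAME", "RECEIVED", "DEGREE", "FROM", "UNIVERSITY_NAME"),
}


def flag_sequences(sequences, expected=EXPECTED, min_count=2):
    """Return sequences that are unexpected in the ontology and/or rare in the data.

    min_count is an assumed threshold: sequences seen fewer times are marked rare.
    """
    counts = Counter(sequences)
    flagged = {}
    for seq, n in counts.items():
        reasons = []
        if seq not in expected:
            reasons.append("not in ontology")
        if n < min_count:
            reasons.append("rare")
        if reasons:
            flagged[seq] = reasons
    return flagged


common = ("PERSON_NAME", "IS", "MANAGEMENT_TITLE", "AT", "COMPANY_NAME")
suspect = ("PERSON_NAME", "RECEIVED", "COMPANY_NAME", "FROM", "UNIVERSITY_NAME")

# Three sentences with the common sequencing and one with the suspect sequencing.
flagged = flag_sequences([common, common, common, suspect])
for seq, reasons in flagged.items():
    print(" ".join(seq), "->", ", ".join(reasons))
```

Here the suspect sequencing is flagged on both counts, which matches the intuition that a degree acronym was probably misclassified as a company name.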
5. Manual oversight and face-validity
Manual checks are carried out throughout the process to ensure that the ontologies are being populated in line with those developed.
Beyond documenting the ontologies, the examples and summary statistics included in Appendix D illustrate that the ontologies, properties, and classifications correspond closely to what would be expected.