Abstract: Coding responses from free-text, open-ended survey questions (i.e., qualitative analysis) can be a labor-intensive process. The resource requirements for qualitative coding can prevent researchers from extracting value from free-text responses and can influence decisions about the inclusion of open-ended questions on surveys. Machine learning (ML) has been proposed as a potential solution to alleviate coding burden, but traditional ML methods for text classification require large amounts of training data that are usually not available from surveys. With that problem in mind, we evaluated an ML approach that used responses from an open-ended question on a 2018 employee survey to train a model that predicted a set of codes applied to the same question on the 2019 survey. A coding team then adjudicated these predictions and provided coding corrections when applicable. We achieved promising performance despite an original training dataset of under 3,000 survey responses by using both data augmentation and recent advances in transfer learning models for natural language processing.
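To make the approach concrete, the sketch below shows one common way to fine-tune a pretrained transformer for multi-label coding of survey responses using the Hugging Face transformers library. This is a minimal illustration, not the authors' implementation: the model checkpoint, the code labels, the toy responses, the hyperparameters, and the 0.5 decision threshold are all assumptions chosen for clarity.

```python
# Minimal sketch: fine-tune a pretrained transformer to assign qualitative codes
# to free-text survey responses. Model name, code set, data, and hyperparameters
# are illustrative assumptions, not the paper's exact configuration.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

CODES = ["workload", "communication", "leadership", "benefits"]  # hypothetical code set


class SurveyDataset(Dataset):
    """Wraps free-text responses and their multi-hot code labels."""

    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels, dtype=torch.float)  # float labels for BCE loss

    def __len__(self):
        return self.labels.shape[0]

    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.enc.items()}
        item["labels"] = self.labels[idx]
        return item


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(CODES),
    problem_type="multi_label_classification",  # uses BCEWithLogitsLoss internally
)

# Toy examples standing in for the ~3,000 coded 2018 responses.
train_texts = ["Too many meetings eat into project time.",
               "My manager communicates priorities clearly."]
train_labels = [[1, 0, 0, 0], [0, 1, 1, 0]]
train_ds = SurveyDataset(train_texts, train_labels, tokenizer)

args = TrainingArguments(output_dir="coder_model", num_train_epochs=3,
                         per_device_train_batch_size=8, logging_steps=10)
Trainer(model=model, args=args, train_dataset=train_ds).train()

# Predict codes for a new (e.g., 2019) response; the 0.5 threshold is an assumption.
with torch.no_grad():
    logits = model(**tokenizer("Pay has not kept up with workload.",
                               return_tensors="pt")).logits
probs = torch.sigmoid(logits)[0].numpy()
predicted_codes = [c for c, p in zip(CODES, probs) if p > 0.5]
print(predicted_codes)
```

In a workflow like the one described in the abstract, predictions of this kind would then be reviewed by human coders, who accept or correct each suggested code rather than coding every response from scratch.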