Qualitative content analysis of open-ended survey responses is a commonly used research method in science education. However, traditional coding approaches are often time-consuming and prone to inconsistency, especially when applied to large datasets. Existing solutions from Natural Language Processing, such as supervised classifiers, topic modeling techniques, and generative large language models, have limited applicability to the analysis of open-ended survey responses, since they demand extensive labeled data, disrupt established qualitative workflows, and/or yield variable results. In this paper, we introduce a text embedding-based classification framework called Deductive Semantic Content Analysis (DeSCA) that requires only a handful of examples per category, is transparent and replicable, and fits well with standard qualitative workflows. When benchmarked against human analysis of a physics education survey of 2899 open-ended responses, the method described by our framework achieves high agreement with expert human coders across ten embedding models on a simulated exhaustive coding task, using approximately 1-2% of the total dataset for training. The method achieves lower agreement on a complete selective coding task; however, this performance improves with fine-tuning of the text embedding model, which can be done with a small amount of additional data. We unpack these results in terms of the theoretical assumptions of text embeddings, and further demonstrate how embeddings can be used to audit previously analyzed datasets for coding consistency. These findings demonstrate that text embedding-assisted coding can flexibly scale to thousands of responses without sacrificing interpretability, opening avenues for deductive qualitative analysis at scale.
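The abstract does not spell out DeSCA's exact classification rule. As a rough illustration of how a handful of coded examples per category can drive embedding-based coding, the sketch below assumes a nearest-centroid rule: each category is represented by the normalized mean of its example embeddings, and a response receives the category with the highest cosine similarity. The same rule yields a simple consistency audit by flagging responses whose existing human code differs from the nearest centroid. The helper names (`build_centroids`, `classify`, `audit`) are hypothetical, and the three-dimensional vectors are made-up stand-ins for the output of a real text embedding model; this is not presented as the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def centroid(vectors: List[Vector]) -> List[float]:
    """Mean of the unit-normalized example embeddings, re-normalized to unit length."""
    unit = [normalize(v) for v in vectors]
    dim = len(unit[0])
    mean = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    return normalize(mean)


def build_centroids(examples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Hypothetical helper: one prototype vector per category from a few coded examples."""
    return {code: centroid(vecs) for code, vecs in examples.items()}


def classify(response: Vector, centroids: Dict[str, List[float]]) -> str:
    """Assign the category whose centroid is most cosine-similar to the response."""
    return max(centroids, key=lambda code: cosine(response, centroids[code]))


def audit(coded: List[Tuple[Vector, str]], centroids: Dict[str, List[float]]) -> List[int]:
    """Indices of responses whose human code disagrees with the nearest centroid."""
    return [i for i, (vec, human_code) in enumerate(coded)
            if classify(vec, centroids) != human_code]


if __name__ == "__main__":
    # Toy "embeddings" standing in for real model output (assumption, not paper data).
    examples = {
        "energy": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
        "forces": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.3]],
    }
    cents = build_centroids(examples)
    print(classify([0.7, 0.3, 0.0], cents))  # prints "energy"
    # The second response was coded "energy" but sits near the "forces" centroid.
    print(audit([([0.85, 0.15, 0.0], "energy"), ([0.2, 0.9, 0.1], "energy")], cents))  # prints [1]
```

In this sketch, fine-tuning the embedding model would only change the vectors fed into the rule; the centroid construction and the audit stay the same.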
A Framework for Deductive Semantic Content Analysis at Scale in Science Education Using Text Embeddings
- Year: 2025
- Abstract & full text: arxiv.org/abs/2508.19836