ABSTRACT

International comparisons of next-generation science proficiencies (e.g., science practices such as explanation) remain challenging because of the prohibitive cost of expert translation and expert scoring of assessment products (e.g., written explanations). The proliferation of open-source tools for natural language processing (NLP), machine learning (ML), and machine translation (MT) offers immense potential for expanding the scope of international measurement and assessment research. Our study investigated the extent to which open-source NLP and ML scoring technologies developed using English explanations could accurately measure disciplinary core ideas in machine translations of Chinese, German, and Indonesian explanations. Specifically, we examined the extent to which (1) MT of written explanations from multiple languages into English and (2) ML-based scoring of these translations aligned with human scoring of human translations. Our study corpus comprised written explanations produced in response to ACORNS instrument items by 444 Chinese, 371 Indonesian, and 219 German students. Our analyses indicated that the NLP-based ML scoring models built using English explanations from American students robustly measured disciplinary core ideas in Chinese, Indonesian, and German explanations (Cohen's kappa > 0.81 in most cases). Moreover, MT (Google Translate) produced very promising results that nearly reached our quality-control benchmark (total-score correlations > 0.9). Our study suggests that open-source NLP-based ML and MT technologies have great potential as cost-effective tools for advancing small- and large-scale international measurement and assessment projects. We discuss limitations and future directions for using open-source tools in educational assessment research.