ABSTRACT

The majority of prior studies linking linguistic knowledge and science performance rely on correlations between success in science classrooms and language skills measured in various assessments. This study takes a novel approach and examines the linguistic features of student language while students engaged in collaborative science problem solving during a week-long "science event" that relied heavily on an online educational game that taught virology. We transcribed the students' speech and used natural language processing tools to extract linguistic information related to text cohesion, lexical sophistication, and sentiment. Our criterion variables were student scores on a pre-test and a post-test that assessed science knowledge specific to virology. In addition to examining relations between linguistic features of student language production and science scores, we also controlled for a number of non-linguistic factors, including gender, age, group, and prior experience with technology. Linear mixed-effects modeling indicated that both non-linguistic factors (R² = .482) and linguistic variables (R² = .525) were predictive of science scores. A model that combined non-linguistic and linguistic factors explained the greatest amount of variance in science scores (R² = .609). The results indicate that natural language processing tools can help researchers and educators better understand links between language production and success in science education.