This study examined the measurement quality of an instructor-developed classroom assessment instrument (i.e., a final written test) used to assess linguistic knowledge covered in an undergraduate elementary Chinese course at a U.S. university. Participants were 222 learners enrolled in the Chinese course from Fall 2011 to Fall 2013. Analyses were performed on a subset of 64 binary-scored (0/1) test items. The 64 items showed acceptable overall test reliability, test discrimination, and Rasch model fit. However, the analyses also revealed an insufficient number of difficult items, below-threshold discriminatory power for certain items, and measurement redundancy. Strategies for improving the measurement quality of the test are discussed.

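This paper examines the measurement quality of a summative assessment instrument (i.e., the final examination paper) used in an undergraduate elementary Chinese course at a U.S. university. The instrument is an instructor-developed test designed to measure how well students have mastered the linguistic knowledge taught in the course. Participants were 222 students enrolled in the elementary Chinese course between the Fall 2011 and Fall 2013 semesters. Analysis of the response data for the test's 64 binary-scored (0/1) items showed that the 64 items reached acceptable levels of overall reliability, discrimination, and Rasch model fit. The study also revealed problems with the test: too few difficult items, some items whose discrimination fell below the critical threshold, and measurement redundancy. The paper concludes by discussing ways to improve the measurement quality of the test.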