Foreign language (FL) testing is an area within the broader field of second language (L2) testing or assessment. FL and L2 are both terms used to describe the learning and testing of a non-native language. L2 is the broader term and indicates that a language is not a person’s first language. FL further specifies that the L2 is learned in a setting where people do not use the language as a medium of instruction or communication. For example, because English is the first language of most people in the United States, those learning English there, as an additional language to their native one, are said to be learning English as an L2. By contrast, Spanish, German, Chinese, Arabic, and similar languages are considered FLs in the US context, and those studying them in the United States are said to be learning them as FLs. This article focuses on FL testing scholarship in the United States. FL testing seeks to measure test-takers’ reading, writing, listening, and speaking, as well as their ability to integrate two or more of these modalities. It may also seek to assess features of the construct as defined by a theoretical model. FL construct features include vocabulary, grammar, pronunciation, fluency, pragmatics, and textual organisation, among others.

FL testing has played a pioneering role in shaping thinking about key issues in assessment and about the connections between assessment, instruction, and learning. For example, since the 1970s, FL scholars have developed performance-based assessments, not only for classroom use but also for large-scale testing (Liskin-Gasparro, 2001). Discussions within what has been termed the proficiency movement culminated in the publication of the Provisional Oral Proficiency Guidelines in 1982 (ACTFL, 1982), which were intended to guide the testing of speaking in a variety of languages taught in FL classrooms and/or used for employment purposes. FL performance-based testing efforts were thus being formalised long before performance assessment was incorporated into large-scale testing practices in educational measurement.

The American Council on the Teaching of Foreign Languages (ACTFL) is the leading organisation directing FL professional activities in the United States, including teacher professional development and testing. A proficiency framework or scale is essentially a hierarchical description of characteristics at various points along the FL proficiency continuum. In 1986, ACTFL published the first complete edition of its language proficiency scale, the ACTFL Proficiency Guidelines, adapted from the US federal government’s Interagency Language Roundtable (ILR) Skill Level Descriptions and the Foreign Service Institute (FSI) scale. The FSI scale and the ILR Skill Level Descriptions are commonly used across government agencies to define and refer to FL proficiency, primarily for job-related purposes. Both the government scales and the ACTFL Proficiency Guidelines specify a hierarchy of descriptors that emphasise how test-takers at particular levels are likely to perform in spontaneous communication, that is, what they can do with an FL in real-world situations.

The earliest direct application of the ACTFL Proficiency Guidelines was the assessment of speaking through the Oral Proficiency Interview (OPI). The OPI, whose content and rating are anchored in the ACTFL Proficiency Guidelines, is a one-on-one interview between a test-taker and a certified tester that lasts from ten to thirty minutes. Its stated purpose is to assess oral proficiency through natural conversation. OPI interactions are tailored to test-takers’ backgrounds and interests as well as to their ongoing performance. The OPI is available in English and seventy-eight foreign languages.

Testing systems are commonly acknowledged to have both positive and negative impacts on individuals, groups, and educational structures (Chalhoub-Deville, 2016). The impact of the ACTFL Proficiency Guidelines in the United States has been widely discussed in the published literature. The Guidelines have played an influential role in moving FL education away from the grammar-translation era towards a more communicative approach. The literature also documents their influence on the development of proficiency scales worldwide, e.g., the Australian/International Second Language Proficiency Rating Scale (ISLPR), the Canadian Language Benchmarks (CLB), and the Common European Framework of Reference for Languages (CEFR). Since the early 2000s, the CEFR has increasingly dominated the FL field in terms of its use and influence in Europe and around the world: ‘The CEFR seems to be overtaking all other frameworks in terms of research, development of materials, innovation, and access’ (Chalhoub-Deville, 2009b, p. 249). This widespread use of the CEFR has prompted ACTFL to pursue an alignment of some of its prominent assessments with CEFR levels. A major concern raised about proficiency scales such as the ACTFL Guidelines and the CEFR is the political backing that has contributed to their rise, sustained their spread, and shielded them from scholarly critique. These scales have also been reified, that is, perceived as representing the reality of FL learning and performance.

Influential large-scale FL testing systems available from different providers in the United States include the ACTFL Assessment of Performance Toward Proficiency in Languages (AAPPL), which measures the performance of students in grades 5 to 12 in a classroom setting. AAPPL is now available in 12 FLs as well as in English as a second language. Another popular FL testing system is offered by the College Entrance Examination Board (College Board), which develops and administers the Advanced Placement (AP) World Languages and Cultures Exams. AP scores form part of college applications and are considered for college course placement or credit. These AP assessments are available in seven languages.

Because this article aims to provide an overview of the FL testing landscape, the information it presents should be useful to readers, especially those with minimal to moderate background in this area. Several sections present key scholarship on the construct and on the proficiency scales that have widely informed FL research and test development. They describe the Communicative Language Ability model, which has long been a prevailing model, in the United States and elsewhere, for guiding research on language test constructs and anchoring test development practices. They also describe prominent proficiency scales in the United States and Europe, namely the ACTFL Guidelines and the CEFR, compare the two frameworks in terms of their political backing and impact, and summarise publications on their alignment with one another. The article also covers popular large-scale FL testing systems offered by different test developers in the United States.