ABSTRACT

In this chapter we describe INCITE, an NLP-based system for scoring free-text responses. We emphasize the importance of context and the system’s intended use, and we explain how each component of the system contributes to its accuracy. The system was developed to score responses produced as part of the USMLE Step 2 Clinical Skills Examination, a licensing test for physicians. The specifics of that application distinguish our system in several ways: (1) INCITE was intended for use as part of physician licensure and therefore had to function with a specialized medical vocabulary; (2) the scores for the written responses were based entirely on the content of the response, so well-structured, complete sentences were not required and spelling was secondary; (3) the scores from the examination were used to make high-stakes decisions, so accuracy was critical; and (4) the scoring procedures had to be transparent, so that the specific scorable features of the text that the algorithm identified in each response could be traced. The primary goal of this chapter is to describe the scoring system and to report how each component of that system affects the accuracy with which scorable concepts can be identified. The development of the system was, however, shaped by the specifics of the application, and the results must be interpreted in that context.