ABSTRACT

Airport baggage security screeners are increasingly being trained with sophisticated software such as X-Ray Tutor, which individually adapts its difficulty to compensate for their improvements in detection performance (Schwaninger, 2004b; Koller et al., 2008; Michel et al., 2007; Schwaninger et al., 2008). At several airports, screeners are continually tested on the job through a system known as “threat image projection” (TIP; e.g. Schwaninger, 2004a; Hofer & Schwaninger, 2005), which superimposes fictional threat items onto actual x-ray luggage images. The basic principles behind the TIP system actually predate airport security screening: Broadbent (1964) notes that “it has often been suggested that in practical situations the efficiency of radar monitors or industrial inspectors could be improved by the insertion of artificial signals interspersed amongst the real ones” (p. 18). With modern advances in computer systems (Schaller, 1997) it is now readily possible to examine the performance of screeners while they work and to provide a system of certification based on performance ability.

Having competent workers is important in any industry, but particularly so in airport security. Since certain visual abilities are essential to becoming a good x-ray screener (Hardmeier, Hofer, & Schwaninger, 2005; Hardmeier & Schwaninger, 2008), pre-employment testing needs to form a staple part of screener employment procedures.

Training that is individually adaptive has the benefit of seeking out weaknesses and addressing them with increased load in those areas (Schwaninger, 2004b). While screeners need to demonstrate improvements, they also need to fall within accepted working norms, which can be deduced from retrospective analyses of TIP data and standalone performance testing. As has been shown in previous research (e.g. Hofer & Schwaninger, 2005), there are wide variations in operator ability, but with appropriate training the entire distribution should shift as each screener makes their own improvements (Schwaninger, Hofer, & Wetter, 2007). Given that terrorist threats are “both productive and diverse” (Lui et al., 2007, p. 301), training systems must be adaptive and recurrent, and must attempt to anticipate the mindset of potential terrorists in a changeable climate. Security screening is therefore both demanding and important, requiring research-led training systems capable of quantifying real-life performance.

Detection performance in terms of sensitivity is typically viewed as being a function of hits (correctly identifying an object that contains a threat item), false alarms (incorrectly stating that a threat item is present), and in certain cases confidence ratings (Green & Swets, 1966; MacMillan & Creelman, 1991). A vital aspect of performance is the speed at which an operator can perform bag searches while maintaining optimal performance levels, which signal detection theory (SDT) cannot accommodate as it assumes a fixed sampling interval (Smith, 2000). Thus, Drury’s Two-Component Model (TCM; Drury, 1975) can be used to approximate the time taken to search for items and the average decision time. This model is a useful complement to signal detection theory and provides insight into whether search or decision time improves with on-the-job training.

With the goal of applying SDT and TCM estimates to a large dataset, we have collated the data from a longitudinal project to evaluate performance changes in screeners who take part in a battery of training systems developed by the VICOREG research group (Schwaninger, 2004a, 2004b; Schwaninger, Hofer, & Wetter, 2007). While it is impossible to eliminate effects generated by work days during the year, confounding variables or other sources of nuisance variance, the advantages of being able to gauge actual real-life performance in screeners far outweigh the drawbacks. The benefits of utilising scientifically based training can therefore be deduced by comparing the two test performances. With a median interval exceeding one year between the two tests, it can be seen whether screener performance changes, using a large sample of workers in a standardised test.
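
As an illustration of how sensitivity can be derived from hits and false alarms under signal detection theory, the following minimal sketch computes d' and the response criterion c for a single screener. It assumes the standard equal-variance Gaussian model and a log-linear correction for extreme rates; the function name, the correction and the example counts are illustrative assumptions and are not taken from the study.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under the equal-variance Gaussian SDT model.

    Assumption: a log-linear correction (0.5 added to each count) keeps the
    rates away from 0 and 1, where the z-transform is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF

    d_prime = z(hit_rate) - z(fa_rate)             # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion


# Hypothetical counts for one screener (illustrative only, not study data).
d_prime, criterion = sdt_measures(
    hits=45, misses=5, false_alarms=10, correct_rejections=90
)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

A larger d' indicates better discrimination between threat and non-threat bags regardless of how liberally the screener raises alarms, which is why sensitivity rather than the raw hit rate is the usual basis for comparing performance between two tests.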