ABSTRACT

Imagine a research project to help pilots better understand flight management system (FMS) modes and respond appropriately to unexpected events. Twenty line pilots from commuter airlines are selected to participate. Ten undergo a conventional training program, and the other ten go through a novel, augmented program. After a one-week delay, the participants return to the lab for a test of learning, in which they encounter an unexpected configuration of the FMS in a high-fidelity simulator. The time until the initial correct diagnosis and response serves as the measure of performance. The data show a mean response time of 9.5 seconds for the augmented training group and 14 seconds for the control group, a non-significant effect (p > .05). A follow-up study of 16 participants (8 per group), using a slightly revised version of the augmented curriculum, also produces a non-significant benefit (mean difference = 3 seconds, p > .05) for the new training program. The researchers conclude, on the basis of the two studies showing no significant benefit, that the new curriculum is no more effective than the standard training program. The developer of the augmented curriculum points out, however, that if the samples of the two studies are pooled, yielding an N of 18 per group, the mean difference of 4.5 seconds between groups is significant (p < .05). Furthermore, the developer notes that the p-values in the original studies were .07 and .11, respectively, both near the “official” .05 cutoff.
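The pooling argument can be checked with a back-of-the-envelope two-sample t calculation. The sketch below is illustrative only: the scenario reports no standard deviations, so a common within-group SD of 5 seconds is assumed here, and the .05 critical values are standard two-sided t-table entries. Under that assumption, each study's t statistic falls short of its critical value while the pooled comparison exceeds its own, reproducing the qualitative pattern described above.

```python
import math

def two_sample_t(mean_diff, sd, n_per_group):
    """t statistic for a two-sample comparison with equal group sizes
    and a common standard deviation (pooled-variance form)."""
    se = sd * math.sqrt(2.0 / n_per_group)
    return mean_diff / se

SD = 5.0  # hypothetical common SD in seconds; NOT reported in the studies

# Study 1: 4.5 s mean difference, n = 10 per group (df = 18)
t1 = two_sample_t(4.5, SD, 10)
# Study 2: 3 s mean difference, n = 8 per group (df = 14)
t2 = two_sample_t(3.0, SD, 8)
# Pooled samples: 4.5 s mean difference, n = 18 per group (df = 34)
t_pooled = two_sample_t(4.5, SD, 18)

# Two-sided alpha = .05 critical values from a t table
for label, t, crit in [("study 1 (df=18)", t1, 2.101),
                       ("study 2 (df=14)", t2, 2.145),
                       ("pooled  (df=34)", t_pooled, 2.032)]:
    verdict = "significant" if t > crit else "n.s."
    print(f"{label}: t = {t:.2f} vs critical {crit} -> {verdict}")
```

The pooled test gains power from two sources: the larger n shrinks the standard error, and the larger df lowers the critical value, which is why the same-sized effect can cross the .05 threshold only after pooling.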