ABSTRACT

Before the results from two test administrations are used to measure change, it is appropriate to inquire about what the tests are and are not measuring, and whether what is measured is the same across cohort groups and selected subpopulations within cohorts. Logic might dictate that one first define what the tests measure and then evaluate their precision. The order is reversed here because: (a) reliability estimates are needed for the interpretation of the factor analytic results, and (b) we have chosen to "mark" our potential factors with reliable parcels or clusters of items that, in theory, share common content.1 Independent estimates of coefficient alpha provide some empirical evidence for any subsequent interpretation of these subtests as separate homogeneous entities. With the exception of the biology parcel, the parcels have reliabilities in the expected range for the number of items each includes. This finding suggests that each parcel is relatively internally consistent and can therefore be assumed to measure a reasonably homogeneous content area. Content areas within a particular discipline, such as the content parcels within the science area, can then be hypothesized as measures or indicators of a science construct or factor in a confirmatory factor analytic solution.
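To make the reliability criterion concrete, the following is a minimal sketch of how coefficient alpha for an item parcel is computed: the number of items, the variance of each item across examinees, and the variance of the total parcel score. The item responses below are entirely hypothetical and are not the data described in the text.

```python
# Sketch of coefficient (Cronbach's) alpha for an item parcel.
# alpha = (k / (k - 1)) * (1 - sum(item variances) / variance of total score)

def coefficient_alpha(scores):
    """Coefficient alpha from a list of examinee rows (one column per item)."""
    k = len(scores[0])  # number of items in the parcel

    def variance(values):
        # Sample variance with (n - 1) denominator.
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    # Variance of each item, taken across examinees.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total (parcel) score.
    total_var = variance([sum(row) for row in scores])

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-examinee, 4-item parcel (0/1 item scores).
parcel = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(parcel), 3))  # -> 0.79
```

A parcel whose alpha is roughly what one would expect given its length supports treating it as an internally consistent indicator, which is the rationale the text gives for entering parcels, rather than single items, into the confirmatory factor model.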