ABSTRACT

In their simplest form, problems of testing for goodness rather than lack of fit involve a fully specified multinomial distribution, together with some sufficiently small “neighborhood” of distributional models of the same kind which, according to the alternative hypothesis to be established by means of the data, contains the true distribution from which the sample has been taken. In such a setting, the primary data set consists of $n$ mutually independent random vectors $(Y_{11}, \ldots, Y_{1k}), \ldots, (Y_{n1}, \ldots, Y_{nk})$ of dimension $k \ge 2$, where $(Y_{i1}, \ldots, Y_{ik})$ indicates in which of the $k$ mutually exclusive categories the $i$th sampling unit is observed to be placed. Thus, $(Y_{i1}, \ldots, Y_{ik})$ consists of exactly $k - 1$ zeros and a single one, and the probability that the nonzero component appears at the $j$th position is assumed to be the same number $\pi_j \in [0, 1]$ for each element of the sample. As usual in the analysis of categorical data, these vectors are aggregated to the corresponding cell counts $(X_1, \ldots, X_k)$ from the beginning, defining $X_j \equiv \#\{\, i \mid i \in \{1, \ldots, n\},\ Y_{ij} = 1 \,\}$ for $j = 1, \ldots, k$. Then, we have

$\sum_{j=1}^{k} X_j = n$, and the distribution of $(X_1, \ldots, X_k)$ is multinomial with parameters $n$ and $\pi = (\pi_1, \ldots, \pi_k)$, given by the probability mass function

$P[X_1 = x_1, \ldots, X_k = x_k] = n!$