ABSTRACT

Thus, if we were attempting to validate a test for the selection of bottom scourers we would correlate the scores on the tests under consideration with the actual performance of the candidates in scouring bottoms; the latter would constitute the criterion, and the degree of agreement with the criterion of each test would constitute its validity. Some attempts have been made to establish and validate opinion measurement by comparing results with outside criteria, and these must now be discussed in some detail.

Perhaps the most widely used proof in this connection is the agreement between poll prediction and voting behaviour. We have already noted in an earlier chapter the smallness of the error made by the British Institute of Public Opinion in predicting voting behaviour during three elections in this country. In America, Gallup has reported that the average error of prediction from 1935-47 was four percentage points; this average relates to over 300 election predictions in the United States. If we look only at presidential elections, Mosteller gives a table showing that the errors of prediction in percentage points for the 1936, 1940, 1944, and 1948 elections were 6.5, 3.0, 2.3, and 5.3, giving an average of 4.3. This may sound reasonable, but before we can estimate the success of forecasting which is implied by an average error of a given magnitude, we need a base line against which to compare the results. Such a base line is furnished by what is called persistence forecasting. This method, taken over from weather forecasting, is a simple routine method in which the forecast for the next occasion is simply made in terms of what happened last time. In weather forecasting one would simply predict that to-morrow will be exactly like to-day. In election forecasting, it would mean predicting that each state will have the same Democratic percentage of the major party vote that it had in the previous presidential election year.
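The comparison with a persistence base line can be made concrete in a short sketch. The four polling errors are those quoted above from Mosteller; the state names and vote shares are purely hypothetical, invented for illustration, and the function names are our own rather than anything taken from Mosteller's analysis.

```python
from statistics import mean


def mean_absolute_error(errors):
    """Average size of the errors in percentage points, ignoring sign."""
    return mean(abs(e) for e in errors)


def persistence_forecast(previous_shares):
    """Predict that every state repeats its previous Democratic share."""
    return dict(previous_shares)


def forecast_errors(forecast, actual):
    """Signed error (forecast minus result) for each state in `actual`."""
    return [forecast[state] - actual[state] for state in actual]


# Polling errors quoted in the text for 1936, 1940, 1944 and 1948.
poll_errors = [6.5, 3.0, 2.3, 5.3]
poll_mae = mean_absolute_error(poll_errors)
print(round(poll_mae, 1))  # 4.3, the average given in the text

# HYPOTHETICAL two-party Democratic shares (per cent), not real returns.
previous = {"State A": 55.0, "State B": 48.0, "State C": 61.0}
actual = {"State A": 52.0, "State B": 50.5, "State C": 58.0}

baseline_errors = forecast_errors(persistence_forecast(previous), actual)
baseline_mae = mean_absolute_error(baseline_errors)
print(round(baseline_mae, 2))
```

A poll would have to show a clearly smaller average error than `baseline_mae`, computed on the same elections, before it could be said to have beaten the base line.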
This ‘persistence’ method is quite mechanical, requires no new investigation, and may thus serve as a useful base line in terms of which we can estimate the accuracy of the polls.

Mosteller presents a table of the errors in persistence forecasting as compared with errors in the forecasts made by Gallup and Crossley, and concludes that ‘taken as a whole, it cannot be said that the polling forecasts in the past four presidential elections have a very distinguished record compared to persistence forecasts, which were as good or better in three out of four elections. The implication here is not that polling is no better, or not much better, than persistence forecasting but rather that polling has not yet proved its superiority in election forecasting under the conditions obtaining during the last four presidential elections.’

Figures in this country are very much more favourable to the polls, a fact which is due equally to the greater accuracy of British polls and to the larger amount of error found in persistence forecasts. It is difficult to know why there should be these national differences, and although many reasonable suggestions could be made the answer to this question is, in fact, not known.

The stress which has been laid on election forecasting in the attempt to prove the validity of opinion polling is somewhat unfortunate. The reasons for this belief are two-fold. In the first place, even if election forecasting were completely successful and involved errors no larger than those expected on the basis of sampling theory this would, none the less, prove nothing whatsoever about the reliability and validity of other types of questions. It might be perfectly possible for people to answer truthfully and accurately questions about their voting intentions, and yet to give quite misleading answers to questions regarding their opinions on other issues. Evidence in favour of this view has been given in the last chapter.

Conversely, it may be said that although poll prediction might be extremely inaccurate this would not necessarily prove that opinion polling on other issues would not be useful and valid. It is not always realized that a great deal more is involved in predicting the winner of an electoral contest than simply opinion measurement. A minute’s thought, however, will clearly show that attitudes and opinions are only one element which will determine the election of one or other of the candidates involved.
We shall list a few of the additional difficulties which arise.

In the first place, the theory of sampling states that we should select a random or stratified sample of people from a known universe. However, in election predicting there is no known universe. The people whose opinions we want to consult are those voting in the election. However, it is not known at the time of polling, i.e. one or two weeks before election at the latest, who is going to vote. In other words, we are trying to sample a population which, as yet, does not exist. We can make reasonable forecasts in the sense of saying that few Southern Negroes will vote in the United States, or that a larger proportion of men than women will vote in this country, but such forecasts are hazardous and involve a great additional element of possible error. Our measurement of opinion might be quite accurate, but if our prediction as to who might vote were to be falsified, our electoral prediction would be very far out.

It might, of course, be said that intention to vote or not to vote is itself a psychological variable and should, therefore, be measurable. Up to a point this is true, but there are many outside factors which influence a person’s actual behaviour. Thus, it is known that very fine, sunny weather, and very poor, rainy weather both tend to lower the poll. A hurricane in Florida or a blizzard in Minnesota may change the percentage of people voting in those States by 50 or more per cent. Even in this country, where the weather tends to be less extreme, its influence cannot be gainsaid. Unless opinion polling institutes set up in the field of weather forecasting as well, they are labouring under the obvious handicap of having to make a prediction without having all those facts available which will influence the behaviour of people for whom the prediction is made.

Another difficulty which is not always realized is that although the polling agency might be correct in saying that candidate A would win 55 per cent of the popular vote, nevertheless, due to the oddities of the English and American election systems the minority candidate might very well win the election. This has happened before in both countries and is an unavoidable feature of any system which does not make use of proportional representation. If this were to happen, then the polls would reflect popular attitude correctly, and the election result incorrectly, and we should end up by coming to the conclusion that the criterion was inferior to the measure which we were trying to validate against it.

Nor are these the only defects.
Electors might wish to vote for a given candidate but erroneously put their X in the wrong box, or invalidate their paper, or get entangled in the complexities of the machines provided by American States for the purpose of recording votes. People may declare in good faith that they are going to vote for a given candidate, only to find that they have not fulfilled the necessary residence qualifications when the time comes to vote. Votes, even after they have been cast, may be miscounted by the Returning Officer; it may even be possible that elections are not honestly conducted and that people long since dead appear on the electoral register and actually cast their votes. This again is more likely in the United States than in this country; the reader may like to consult Gallup’s account of the notorious Louisiana elections.

The factors just mentioned are just a few of the complications which may make election forecasting difficult and inaccurate, although the measurement of opinion toward the candidates may be quite valid and reliable. This fact should be borne in mind when estimating such events, for instance, as the widely publicized failure of the polls to predict the winning candidate in the 1948 election. A great deal has been written on the subject and a selection of the most useful references is given in Technical Note 12. Here we will draw attention only to one interesting fact. In the 1936 election which established Gallup’s fame, and where his forecasts were hailed by the papers as ‘uncannily accurate’, the actual Democratic percentage of the two-party vote was 62.2 per cent with a prediction of 55.7 per cent. Thus, Gallup predicted the right candidate within an error of 6.5 per cent. In 1948 the actual percentage of the two-party vote for the Democratic candidate was 49.8 per cent, and Gallup’s prediction 44.5 per cent. Thus, Gallup predicted the wrong candidate with an error of 5.3 per cent and the papers decried his forecasts as useless and the methods as unscientific. This is clearly absurd. Scientifically our interest lies in the size of the error, which was less in 1948 than in 1936. The public and the papers, of course, are interested not in the size of the error but in getting the right prediction, but this, as we have seen, is quite a different matter and one which depends on many other conditions than those taken into account by opinion polling. Unfortunately, Gallup and the other pollsters themselves have played into the hands of their detractors by emphasizing the journalistic aspects of forecasting the winning candidate rather than stressing the scientific aspects of reducing the percentage error.
Nevertheless, it is the latter which is important and which has, in fact, been reduced from 1936 to 1948.

It will be clear from what has been said so far that election prediction does not provide us with the required evidence either for or against the validity of opinion measurement. Yet, curiously enough, there is very little evidence from other sources which could give us the required information. We have shown in the previous chapter that it is easy to change responses to opinion questions in a large variety of ways. This fact, combined with the lack of data on validity, has led Quinn McNemar, one of the most astute writers on the subject, to remark that ‘so much is known about the variations which can be produced and so little is known about which variation is most nearly correct that one is apt to become pessimistic concerning the possibility of a single poll ever contributing scientifically useful data.’ With this view the present writer would agree.13

The question arises: What alternative is there? There appear to be two main answers. One would be to change from the use of a single question to ascertain attitudes to the use of an attitude scale; the other would be the adoption of a more refined notion of the concept of validity in science. Let us take the question of scaling first.

A simple example may make clear the reluctance of most psychologists, as distinguished from opinion polling agencies, to regard single question opinion measurement as being of great scientific value. Let us assume that we wish to ascertain the average height of English males over the age of 21. The procedure we would adopt is, of course, a very obvious one. We should pick out a random or stratified sample of the population, measure their height in inches to any desired degree of accuracy, and then average the values obtained so as to get the mean height. We should also seek to obtain from our data some measure of the distribution of heights around this mean, i.e. the measure of the tendency in the population either for everyone to be pretty near average or else for some to be very tall, others very short, and so on. Such a measure we might find in what is called the standard deviation or the variance. Lastly, we might seek to give an impression of the actual shape of the distribution by plotting it, as has been done in Figure 9, where different heights are plotted on the abscissa and different numbers of people having these various height measures along the ordinate.
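As a minimal sketch of the direct procedure just described, the following computes the mean, the standard deviation, and a rough text plot of the distribution. The twelve heights are hypothetical values chosen for illustration and are not the data of Figure 9; the sample standard deviation is used on the assumption that the measurements come from a sample rather than the whole population.

```python
from collections import Counter
from statistics import mean, stdev

# HYPOTHETICAL sample of heights in inches (not the data behind Figure 9).
heights_in = [64, 66, 67, 67, 68, 68, 68, 69, 69, 70, 71, 73]

mean_height = mean(heights_in)   # central value of the sample
spread = stdev(heights_in)       # sample standard deviation
variance = spread ** 2           # the variance is its square

# Frequency distribution: heights along one axis, numbers of people
# having each height along the other.
distribution = Counter(heights_in)
for height in sorted(distribution):
    print(f"{height:>3} in | {'#' * distribution[height]}")

print(f"mean = {mean_height:.2f} in, standard deviation = {spread:.2f} in")
```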
This very simple procedure would give us a satisfactory answer to our question, provided that all the steps had been taken with professional efficiency.

[Figure: Distribution of Height of 8585 Adult Males]

Now let us imagine what the procedure would be like if we were to follow the technique of the opinion polling agencies. We should issue all our interviewers with a stick of a given height (corresponding to the uniform question asked by them); we should then ask them to apply this stick to a sample of the population and report for each person whether he was taller than the stick, smaller than the stick, or about equal in height to the stick. (These three categories would correspond to people saying ‘Yes’, ‘No’, and ‘Don’t Know’ in response to a given question.) In the office these values would be transformed into percentages, and in summary we should read in our morning newspaper that 22 per cent of the population were tall, 70 per cent were small, and 8 per cent did not know whether they were tall or small.

This may sound like a parody of public opinion polling, but in actual fact this method still flatters the polling agencies because, as we know the height of the stick, we can from the percentage results deduce a great deal about the true height of the population. This, however, is an item of information which is not usually available to the opinion polls. Let us take as an example a study by Eysenck and Crown on anti-Semitism. Of the population interviewed, 38 per cent agreed with the proposition ‘The Jews have too much power and influence in this country’. This might lead us to say that 38 per cent are anti-Semitic. However, only 24 per cent agreed with the statement ‘Jews will stoop to any kind of deceit in order to gain their own ends’. Only 4 per cent agreed with the proposition ‘The Jews are the most despicable form of mankind which crawls on this earth’. On the other hand, 84 per cent believed that ‘The dislike of many people for the Jews is based on prejudice, but is nevertheless not without a certain justification’. All these four percentages could be taken as estimates of the degree of anti-Semitic prejudice in the population, which would thus vary according to the height of the yardstick taken, from 4 to 84 per cent. Clearly, such results are meaningless unless we know something about the exact height of the yardstick; in other words, unless we know something about the degree of anti-Semitism shown by each particular question. But in order to know that we must possess the kind of information which makes the yardstick so useful an instrument of measurement. In other words, we must have a true zero point and equal units of measurement. If we have those, and if we can assign an exact point on the resulting scale to our questions, then, and only then, can we interpret opinion polls of social attitudes with the same degree of meaningfulness as the results of our hypothetical ‘stick’ estimation of the nation’s average height. As none of these conditions are fulfilled in actual fact, however, it is difficult to escape the conclusion that public opinion polling by single questions is of very doubtful value indeed for the ascertaining of social attitudes. Our only escape from this unsatisfactory position is to construct scales of measurement which possess, as will be shown later, many additional advantages, such as those of greater reliability, more easily ascertainable validity, and known dimensionality.

An example of one of the earliest scales to be constructed is the Bogardus Social Distance Scale.
Arguing that a person’s attitude towards a national or racial group might best be described in terms of the ‘social distance’ at which he would keep members of that group, Bogardus made up the following scale, which can be applied to any racial or national group whatsoever. The instructions are as follows:

‘According to my first feeling reactions, I would willingly admit members of each race (as a class, and not the best I have known, nor the worst members) to the classifications which I have encircled.’ The classifications are:

1. Would admit to close kinship by marriage.
2. Would admit to my club as personal chums.
3. Would admit to my street as neighbours.
4. Would admit to employment in my occupation.
5. Would admit to citizenship in my country.
6. Would admit as visitors only to my country.
7. Would exclude from my country.

It will be seen that these seven steps indicate different degrees of ‘social distance’: obviously someone whom we would admit to close kinship by marriage or to our club as a personal friend elicits more positive feelings than someone whom we would admit as a visitor only, or whom we would exclude completely from our country. Table XV gives some results obtained from 1725 Americans in 1928 by Bogardus. The nationalities have been grouped into four main groups:

1. The Anglo-Saxon.
2. The North-European.
3. The Southern and Eastern Europeans.
4. The Coloured groups.

[Table XV]

One change has been made from the usual way of presenting data; it will have been noted that a positive answer to the first five questions indicates a favourable attitude to the national group in question, while a ‘Yes’ answer to the last two questions indicates a negative attitude. For the purpose of tabulation, therefore, the percentages of ‘Yes’ answers given to our questions six and seven have been subtracted from 100, so that the figures given in Table XV really apply to questions six and seven in reverse, i.e. ‘Would not admit as visitors only to my country’ and ‘Would not exclude from my country’. Two points will be obvious from the table. One is that preferences for nationalities decrease as we go from the Anglo-Saxon, through the North European, to the South and East European and the Coloured groups, i.e. as we descend vertically down the table. The other is that percentages increase as we go horizontally across the table, i.e. as we go from the items indicating little social distance to those indicating greater social distance. It will also be apparent, however, that the steps on Bogardus’s seven point scale are not by any means equal.

Similar investigations conducted in this country by the present writer have given relatively similar results. One main difference, however, should be mentioned, and that is a disagreement regarding the degree of social distance indicated by the seven steps. Figure 10 gives results for a representative group of 100 British adults; it will be seen that there is no regular ascent from left to right, but that the lines sag in the middle. In other words, to us, admission to citizenship and to employment indicate a greater degree of acceptance than do admission to street as neighbours or to club as personal friends. This is perhaps a reflection of the much greater laxity of employment and naturalization rules in the United States as compared with this country. However this might be, it indicates that the Bogardus Scale in its original form cannot usefully be applied in this country.

For the purpose of obtaining valid results for this country, the scale has been curtailed and made into a four point scale, in which (1) is measured by the ‘marriage’ item, (2) by the average of the ‘employment’, ‘club’, and ‘citizenship’ items, (3) by the ‘neighbour’ item, and (4) by the ‘visitors’ and ‘exclusion’ items.
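The two tabulation steps described above, reversing the two negative items and collapsing seven items into four steps, might be sketched as follows. The percentages are hypothetical and do not come from Table XV or Figure 10. Two further points are assumptions of ours, since the text does not settle them: that step (4) is the average of the two reversed items, and that the reversed percentages are the ones carried into the four point scale.

```python
from statistics import mean

# Items for which a 'Yes' answer indicates a NEGATIVE attitude.
REVERSED_ITEMS = {"visitors", "exclusion"}


def reverse_negative_items(pct_yes):
    """Subtract the 'Yes' percentage from 100 for the two negative items,
    so that a high figure means acceptance on every item."""
    return {
        item: (100 - pct if item in REVERSED_ITEMS else pct)
        for item, pct in pct_yes.items()
    }


def four_point_scale(acceptance):
    """Collapse the seven items into the four steps used for the British data.
    ASSUMPTION: step 4 averages the two reversed items."""
    return {
        1: acceptance["marriage"],
        2: mean([acceptance["employment"], acceptance["club"],
                 acceptance["citizenship"]]),
        3: acceptance["neighbour"],
        4: mean([acceptance["visitors"], acceptance["exclusion"]]),
    }


# HYPOTHETICAL percentages of 'Yes' answers for one national group.
raw_pct_yes = {
    "marriage": 20, "club": 40, "neighbour": 50, "employment": 60,
    "citizenship": 70, "visitors": 15, "exclusion": 5,
}

acceptance = reverse_negative_items(raw_pct_yes)
steps = four_point_scale(acceptance)
print(acceptance["visitors"], acceptance["exclusion"])  # 85 95
print(steps)
```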
When this was done and the populations split into a Conservative and Radical group, the clear-cut results presented in Figures 11 and 12 were obtained. It will be seen that in every case, the Anglo-Saxon group, i.e. the Americans and Irish, were most preferred; the North European group, i.e. the French and Germans, a little less; the South and Eastern European group (Italians, Spaniards, Poles), a good deal less still; and the Coloured groups (Turks, Indians, Negroes), least of all. These results are in good agreement with the American work. It will also be noted that in each case Conservatives tended to put more social distance between themselves and each of the four groups than did the Radicals. Thus, ethnocentrism, or the tendency to prefer one’s own immediate in-group and to look down upon and dislike all types of out-groups, appears to be correlated with Conservatism.

[Figure: Results from Application of Bogardus Social Distance Scale to British Sample]

[Figure: Comparison Between Radical and Conservative Scores on Social Distance Scale for Anglo-Saxon and North-European Groups]

[Figure: Comparison Between Radical and Conservative Scores on Social Distance Scale for South- and East-European and Coloured Groups]

The advantages of a scale of this type over single question polling are obvious, yet this scale is by no means satisfactory. As Adcock has shown, it is lacking in one important feature, which should characterize any scale, in that it is not uni-dimensional. By that, we mean that the different questions contained in it do not measure the same variable but tend to measure different variables in different combinations. To take but one example, refusal to let a person of another ethnic group marry into one’s family may indeed be a reflection of prejudice; it may also be an indication rather of concern for the offspring and the ostracism that children of mixed parentage are likely to experience. In other words, endorsement of this item might indicate prejudice; it might also indicate instead a realistic appreciation of social forces outside one’s control. We must seek, then, to find a way of ensuring uni-dimensionality, i.e. of making certain that all the questions in our scale really measure one and the same underlying attitude.14

As an example of the use of more modern methods of scale construction, we may follow through the preparation of a scale for the measurement of anti-Semitism. The first step consists in the selection of a large number of attitude statements chosen on the basis of written and spoken comments about the Jews, collected from books, periodicals, and scientific statements made by various groups interviewed by the open-end technique. 150 items were collected in this way after over-lapping, unclear, and ambiguous items had been excluded.

As a second step, these 150 items, each typed on a separate slip of paper, were submitted individually to 80 judges, who were asked to judge the degree of anti-Semitism shown by each item and to put each in one of 11 piles, the most anti-Semitic on the first, the most pro-Semitic on the 11th, and neutral items on the central pile. By averaging the position given to each item by the 80 judges it was then possible to calculate its average position with respect to the degree of anti-Semitism shown. That this rating of the items is a purely cognitive task and is not affected by the attitudes held by the judges is shown by the fact that there was almost perfect agreement between the anti- and the pro-Semitic judges.

As a third step, all items were discarded which showed considerable disagreement between the judges as to the exact scale position of that item. This helps to weed out items the meaning of which is doubtful, as it would clearly be useless to have in the questionnaire items which could be interpreted as being either favourable or unfavourable to the Jews.
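The second and third steps lend themselves to a brief sketch: each item's scale position is the average of the pile numbers assigned by the judges, and items on which the judges disagree too widely are dropped. The item names, the five judges, and the cut-off are hypothetical. The use of the standard deviation as the measure of disagreement is likewise our assumption, since the text does not say which measure was used.

```python
from statistics import mean, pstdev


def scale_positions(judgements):
    """Average pile number (1 to 11) given to each item by the judges."""
    return {item: mean(piles) for item, piles in judgements.items()}


def drop_ambiguous(judgements, max_spread):
    """Keep only items whose judgements agree closely.
    ASSUMPTION: disagreement is measured by the standard deviation."""
    return {
        item: piles
        for item, piles in judgements.items()
        if pstdev(piles) <= max_spread
    }


# HYPOTHETICAL pile placements by five judges (the study used 80 judges).
judgements = {
    "item_a": [2, 2, 3, 2, 3],   # close agreement
    "item_b": [1, 6, 11, 3, 9],  # judges disagree widely: ambiguous item
    "item_c": [6, 6, 5, 6, 7],   # close agreement, near the neutral pile
}

kept = drop_ambiguous(judgements, max_spread=1.5)
positions = scale_positions(kept)
for item, position in sorted(positions.items(), key=lambda pair: pair[1]):
    print(f"{item}: scale position {position:.1f}")
```

With these figures `item_b` is discarded, and the two surviving items receive positions of 2.4 and 6.0.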
Items retained are clearly and unambiguously regarded by all the judges as indicating a given degree of anti-Semitism.

The fourth step then consisted in selecting items which would cover the whole range from very strongly anti-Semitic, through neutral, to strongly pro-Semitic. As far as possible, one would attempt to make the intervals between items of degree of anti-Semitism equal, so that one would end up with what Thurstone has called an ‘equal-appearing interval scale’.

As the fifth step, the 24 items would be printed in random order, and each supplied with five different possible answers, i.e. strongly agree, agree, uncertain, disagree, strongly disagree. The printed scale is given below. In brackets, after each item, is given the scale position of that item from 1, the most pro-Semitic, to 11, the most anti-Semitic.