ABSTRACT

This chapter is more ambitious than the previous ones. Our goal is to consider the totality of natural scenes, that is, images of the world around us (outside, inside, people, buildings, objects, etc.), and ask what statistical regularities this ensemble of images has. It is rather surprising that there are many quite distinctive properties that capture a nontrivial amount of the “look-and-feel” of natural images. If we use the language analogy, the structures in speech are often divided into a hierarchy of levels: phonology, syntax, semantics, and pragmatics. In this analogy, we are studying in this chapter the phonology of images. Syntax was the main subject of Chapters 3 and 4, and semantics refers to specific object classes, such as faces, as discussed in Chapter 5. But as we will see, this very basic low level of analysis is not a poor relation. However, we discuss in this chapter only the construction of data structures and probability models incorporating fundamental image statistics, especially scale invariance (see bottom image in Figure 6.1). We do not discuss their use in analyzing specific images via Bayes’ theorem. In fact, in the present state of the art, finding significant objects at all scales, which requires distinguishing clutter made up of smaller objects from texture that is not (see top image in Figure 6.1), has not been widely addressed (but see [79] for a multiscale segmentation algorithm). It seems important to know the basic statistical nature of generic real-world images first. For example, if hypothesis testing is used to check whether some part of an image is a face, surely the null hypothesis should be a realistic model incorporating these statistics.