ABSTRACT

Linear methods that satisfy the principle of superposition dominate current signal processing theory and practice. Linear signal processing is founded on the rich theory of linear systems, and in many applications linear methods prove to be optimal. Moreover, linear methods are inherently simple to implement, and their low computational cost is perhaps the dominant reason for their widespread use in practice. While linear methods will continue to play a leading role in signal processing applications, nonlinear methods are emerging as viable alternative solutions. The rapid emergence of nonlinear signal processing algorithms is motivated by the growth of increasingly challenging applications, for instance in the areas of multimedia processing and communications. Such applications require increasingly sophisticated signal processing algorithms. The growth of challenging applications is coupled with continual gains in digital signal processing hardware in terms of speed, size, and cost. These gains enable the practical deployment of more sophisticated and computationally intensive algorithms. Thus, nonlinear algorithms and filtering methods are being developed and employed to address a growing share of theoretical problems and practical applications. A disadvantage of nonlinear approaches is that, unlike their linear counterparts, nonlinear methods
lack a unified and universal set of tools for analysis and design. This lack of a unifying theory has led to the
development of hundreds of nonlinear signal processing algorithms. These algorithms range from theoretically derived broad filter classes, such as polynomial and rank-order-based methods [1-9], to boutique methods tailored to specific applications. Thus, the dynamic growth of nonlinear methods and the lack of a unifying theory make covering the entirety of such operators in a single chapter impossible. Still, large classes of nonlinear filtering algorithms can be derived and studied through well-founded fundamentals. The fundamental approach adopted in this chapter is that realized through the coupling of statistical
signal modeling with optimal estimation-based filter development. This general approach leads to a number of well-established filtering families, with the specific filtering scheme realized depending on the estimation methodology adopted and the particular signal model deployed. Particularly amenable to filter development is the maximum likelihood type estimation (M-estimation) approach. Originally developed in the theory of robust statistics [10], M-estimation provides a framework for the development of statistical process location estimators which, when employed with sliding observation windows, naturally extend to statistical filtering algorithms. The characteristics of a derived family of filtering operators depend not only on the estimation
methodology upon which the family is founded, but also on the statistical model employed to characterize a sequence of observations. The most commonly employed statistical models are those based on the Gaussian distribution. Use of the Gaussian distribution is well founded in many cases, for instance due to the central limit theorem, and leads to computationally simple linear operations that are optimal for the assumed environment. There are many applications, however, in which the underlying processes are decidedly non-Gaussian. Included in this broad array of applications are important problems in wireless communications, teletraffic, networking, hydrology, geology, economics, and imaging [11-15]. The element common to these applications, and numerous others, is that the underlying processes tend to produce more large-magnitude observations, often referred to as outliers or impulses, than is predicted by Gaussian models. The outlier magnitude and frequency of occurrence predicted by a model are governed by the decay rate of the distribution tail. Thus, many natural sequences of interest are governed by distributions that have heavier tails (i.e., lower tail decay rates) than that exhibited by the Gaussian distribution. Modeling such sequences as Gaussian processes leads not only to a poor statistical fit, but also to the utilization of linear operators that suffer serious degradation in the presence of outliers. Coupling an estimation (filtering) methodology with a statistical model appropriate for the observed
sequence significantly improves performance. This is particularly true in heavy-tailed environments. As an illustrative example, consider the restoration of an image corrupted by (heavy-tailed) salt and pepper noise. Typical sources of salt and pepper noise include flecks of dust on the lens or inside the camera, or, in digital cameras, faulty CCD elements. Figure 26.1 shows a sample corrupted image, the results of two-dimensional linear and nonlinear filtering, and the true underlying (desired) image. It is clear that the linear filter, unable to exploit the characteristics of the corrupting noise, provides an unacceptable result. On the other hand, the nonlinear filter, utilizing the statistics of the image, provides a very good result. The nonlinear filter utilized in this example is the median, which is derived to be optimal for certain heavy-tailed processes. This appropriate statistical modeling results in performance far superior to that of linear processing, which is inherently tailored to light-tailed samples. To formally address the processing of heavy-tailed sequences, this chapter first considers sequences of
samples drawn from the generalized Gaussian distribution (GGD). This family generalizes the Gaussian distribution by incorporating a parameter that controls the rate of exponential tail decay. Setting this parameter to 2 yields the standard Gaussian distribution, while for values less than 2 the GGD tails decay more slowly than in the standard Gaussian case, resulting in heavier-tailed distributions. Of particular interest is the first-order exponential decay case, which yields the double exponential, or Laplacian, distribution. The Gaussian and Laplacian GGD special cases warrant particular attention due to their theoretical underpinnings, widespread use, and resulting classes of operators when deployed in an M-estimation framework. Specifically, it is shown here that M-estimation of Gaussian distributed
samples leads to traditional linear filtering, while the same framework leads to median filtering for Laplacian distributed samples. Thus, just as linear filters are optimal for Gaussian processes, the median filter and its weighted generalizations are optimal for Laplacian processes. Median-type filters are more robust than linear filters and operate more efficiently in impulsive environments, characteristics that result directly from the heavy tails of the Laplacian distribution. Although the Laplacian distribution has a lower rate of tail decay than the Gaussian distribution,
extremely impulsive processes are not well modeled as Laplacian. The GGD family, in fact, is limited in its ability to appropriately model extremely impulsive sequences because, while it incorporates freedom in the tail decay rate, that rate is nonetheless restricted to be exponential. Appropriate modeling of such sequences is of critical importance, as a wide variety of extremely impulsive processes are observed in practice. Many such sequences arise as the superposition of numerous independent effects. Examples include radar clutter, formed as the sum of many signal reflections from irregular surfaces; the received sum of multiuser transmitted signals observed at a detector in a communications problem; the many impulses caused by the contact of rotating machinery parts in electromechanical systems; and atmospheric noise resulting from the superposition of lightning-based electrical discharges around the globe.
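The contrast drawn above between linear (mean) and median filtering can be sketched in a few lines of Python. This is an illustrative one-dimensional example only, not taken from the chapter: the window size, signal values, and the helper name sliding_filter are assumptions chosen for demonstration. The sample mean is the M-estimate of location under a Gaussian model, while the sample median is the M-estimate under a Laplacian model, and their behavior on impulses differs sharply.

```python
# Sketch: sliding-window mean vs. median filtering of a 1-D signal corrupted
# by salt-and-pepper-like impulses. The mean is the location M-estimate under
# a Gaussian model; the median is the M-estimate under a Laplacian model.
from statistics import mean, median

def sliding_filter(x, window, estimator):
    """Apply a location estimator over a sliding window (edges replicated)."""
    half = window // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [estimator(padded[i:i + window]) for i in range(len(x))]

# Constant signal of level 10 with two large-magnitude impulses (outliers).
signal = [10.0] * 9
signal[3] = 255.0   # "salt" impulse
signal[6] = 0.0     # "pepper" impulse

mean_out = sliding_filter(signal, 3, mean)
median_out = sliding_filter(signal, 3, median)

# The linear (mean) filter smears each impulse across the whole window,
# while the median filter restores the constant level exactly.
print(mean_out)    # outlier values leak into neighboring outputs
print(median_out)  # every output equals 10.0
```

The window-3 median rejects any single outlier per window because the impulse can never be the middle order statistic of the sorted window, which is the mechanism behind the robustness claimed for median-type filters in heavy-tailed environments.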