ABSTRACT

Removal of Artifacts

Noise is omnipresent. Biomedical images are often affected and corrupted by various types of noise and artifact. Any image pattern or signal other than that of interest could be termed as interference, artifact, or simply noise. The sources of noise could be physiological, the instrumentation used, or the environment of the experiment. The problems caused by artifacts in biomedical images are vast in scope and variety; their potential for degrading the performance of the most sophisticated image processing algorithms is high. The removal of artifacts without causing any distortion or loss of the desired information in the image of interest is often a significant challenge. The enormity of the problem of noise removal and its importance are reflected by the placement of this chapter as the first chapter on image processing techniques in this book.

This chapter starts with an introduction to the nature of the artifacts that are commonly encountered in biomedical images. Several illustrations of images corrupted by various types of artifacts are provided. Details of the design of filters, spanning a broad range of approaches — from linear space-domain and frequency-domain fixed filters, to the optimal Wiener filter, and further on to nonlinear and adaptive filters — are then described. The chapter concludes with demonstrations of application of the filters described to a few biomedical images.

Note: A good background in signal and system analysis, as well as probability, random variables, and stochastic processes, is required in order to follow the procedures and analyses described in this chapter.

Characterization of Artifacts

Random noise

The term random noise refers to an interference that arises from a random process, such as thermal noise in electronic devices and the counting of photons. A random process is characterized by the PDF representing the probabilities of occurrence of all possible values of a random variable. See Papoulis or Bendat and Piersol for background material on probability, random variables, and stochastic processes.

Consider a random process \eta that is characterized by the PDF p_\eta(\eta). The process could be a function of time, as \eta(t), or of space in 1D, 2D, or 3D, as \eta(x), \eta(x, y), or \eta(x, y, z); it could also be a spatiotemporal function, as \eta(x, y, z, t). The argument of the PDF represents the value that the random process can assume, which could be a voltage in the case of a function of time, or a gray level in the case of a 2D or 3D image. The use of the same symbol for the function and the value it can assume when dealing with PDFs is useful when dealing with several random processes.

The mean \mu_\eta of the random process is given by the first-order moment of the PDF, defined as

\mu_\eta = E[\eta] = \int_{-\infty}^{\infty} \eta \, p_\eta(\eta) \, d\eta,

where E[\,] represents the statistical expectation operator. It is common to assume the mean of a random noise process to be zero.

The mean-squared (MS) value of the random process is given by the second-order moment of the PDF, defined as

E[\eta^2] = \int_{-\infty}^{\infty} \eta^2 \, p_\eta(\eta) \, d\eta.

The variance \sigma_\eta^2 of the process is defined as the second central moment:

\sigma_\eta^2 = E[(\eta - \mu_\eta)^2] = \int_{-\infty}^{\infty} (\eta - \mu_\eta)^2 \, p_\eta(\eta) \, d\eta.

The square root of the variance gives the standard deviation (SD) \sigma_\eta of the process. Note that \sigma_\eta^2 = E[\eta^2] - \mu_\eta^2. If the mean is zero, it follows that \sigma_\eta^2 = E[\eta^2]; that is, the variance and the MS values are the same.

Observe the use of the same symbol to represent the random variable, the random process, and the random signal as a function of time or space. The subscript of the PDF or the statistical parameter derived indicates the random process of concern. The context of the discussion or expression should make the meaning of the symbol clear.

When the values of a random process form a time series or a function of time, we have a random signal or a stochastic process \eta(t) (see Figure). When one such time series is observed, it is important to note that the entity represents but one single realization of the random process. An example of a random function of time is the current generated by a CCD detector element due to thermal noise when no light is falling on the detector, known as the dark current. The statistical measures described above then have physical meaning: the mean represents the DC component; the MS value represents the average power; and the square root of the mean-squared value (the root mean-squared or RMS value) gives the average noise magnitude. These measures are useful in calculating the SNR, which is commonly defined as the ratio of the peak-to-peak amplitude range of the signal to the RMS value of the noise, or as the ratio of the average power of the desired signal to that of the noise. Special-purpose CCD detectors are cooled by circulating cold air, water, or liquid nitrogen to reduce thermal noise and improve the SNR.
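The two SNR definitions above can be checked numerically. The following is a minimal sketch (assuming a sinusoidal test signal and zero-mean Gaussian noise; the signal, noise level, and sample count are arbitrary choices, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed test case: a sinusoidal "signal" plus zero-mean Gaussian "noise".
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
signal = np.sin(2.0 * np.pi * 5.0 * t)           # peak-to-peak range = 2
noise = rng.normal(0.0, 0.1, size=t.shape)       # RMS value ~ 0.1

# Definition 1: peak-to-peak amplitude of the signal over the RMS noise value.
rms_noise = np.sqrt(np.mean(noise ** 2))
snr_pp = (signal.max() - signal.min()) / rms_noise

# Definition 2: ratio of average signal power to average noise power.
snr_power = np.mean(signal ** 2) / np.mean(noise ** 2)

print(f"SNR (peak-to-peak / RMS): {snr_pp:.1f}")
print(f"SNR (power ratio):        {snr_power:.1f}")
```

Note that the two definitions give different numbers for the same data; they are alternative conventions, not interchangeable values.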

FIGURE: A time series composed of random noise samples with a Gaussian PDF (mean, variance, MS, and RMS values as indicated). See also the related figures.

When the values of a random process form a 2D function of space, we have a noise image \eta(x, y) (see Figures). Several possibilities arise in this situation. We may have a single random process that generates random gray levels that are then placed at various locations in the (x, y) plane in some structured or random sequence. We may have an array of detectors, with one detector per pixel of a digital image; the gray level generated by each detector may then be viewed as a distinct random process that is independent of those of the other detectors. A TV image generated by such a camera in the presence of no input image could be considered to be a noise process in (x, y, t) that is a function of space and time.

A biomedical image of interest f(x, y) may also, for the sake of generality, be considered to be a realization of a random process f. Such a representation

FIGURE: An image composed of random noise samples with a Gaussian PDF; the normalized pixel values were linearly mapped to the display range. See also the related figures.

FIGURE: Normalized histogram of the image in the preceding figure; the samples were generated using a Gaussian process. See also the related figures.

allows for the statistical characterization of sample-to-sample or person-to-person variations in a collection of images of the same organ, system, or type. For example, although almost all CT images of the brain show the familiar cerebral structure, variations do exist from one person to another. A brain CT image may be represented as a random process that exhibits certain characteristics on the average. Statistical averages representing populations of images of a certain type are useful in designing filters, data compression techniques, and pattern classification procedures that are optimal for the specific type of images. However, it should be borne in mind that, in diagnostic applications, it is the deviation from the normal or the average that is present in the image on hand that is of critical importance.

When an image f(x, y) is observed in the presence of random noise, the detected image g(x, y) may be treated as a realization of another random process g. In most cases, the noise is additive, and the observed image is expressed as

g(x, y) = f(x, y) + \eta(x, y).

Each of the random processes f, \eta, and g is characterized by its own PDF: p_f(f), p_\eta(\eta), and p_g(g), respectively.

In most practical applications, the random processes representing an image of interest and the noise affecting the image may be assumed to be statistically independent processes. Two random processes f and \eta are said to be statistically independent if their joint PDF p_{f\eta}(f, \eta) is equal to the product of their individual PDFs, given as p_f(f)\, p_\eta(\eta). It then follows that the first-order moment and second-order central moment of the processes in the equation above are related as

E[g] = \mu_g = \mu_f + \mu_\eta = \mu_f = E[f],

E[(g - \mu_g)^2] = \sigma_g^2 = \sigma_f^2 + \sigma_\eta^2,

where \mu represents the mean and \sigma^2 represents the variance of the random process indicated by the subscript, and it is assumed that \mu_\eta = 0.
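The moment relations above can be verified empirically. A minimal sketch, assuming synthetic values for the image process and the independent zero-mean noise (the particular means and variances below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic "image" process f and independent zero-mean noise eta.
f = rng.normal(100.0, 20.0, size=100_000)   # mu_f = 100, sigma_f = 20
eta = rng.normal(0.0, 5.0, size=f.shape)    # mu_eta = 0, sigma_eta = 5
g = f + eta                                  # additive observation model

# For independent processes: mu_g = mu_f, sigma_g^2 = sigma_f^2 + sigma_eta^2.
print(f"mean(g) = {g.mean():8.2f}  (expected ~100)")
print(f"var(g)  = {g.var():8.2f}  (expected ~{20**2 + 5**2})")
```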

Ensemble averages: When the PDFs of the random processes of concern are not known, it is common to approximate the statistical expectation operation by averages computed using a collection, or ensemble, of sample observations of the random process. Such averages are known as ensemble averages. Suppose we have M observations of the random process f as functions of (x, y): f_1(x, y), f_2(x, y), \ldots, f_M(x, y) (see Figure). We may estimate the mean of the process at a particular spatial location (x_1, y_1) as

\mu_f(x_1, y_1) = \lim_{M \to \infty} \frac{1}{M} \sum_{k=1}^{M} f_k(x_1, y_1).

The autocorrelation function (ACF) \phi_f(x_1, x_1 + \alpha; y_1, y_1 + \beta) of the random process f is defined as

\phi_f(x_1, x_1 + \alpha; y_1, y_1 + \beta) = E[f(x_1, y_1)\, f(x_1 + \alpha, y_1 + \beta)],

which may be estimated as

\phi_f(x_1, x_1 + \alpha; y_1, y_1 + \beta) = \lim_{M \to \infty} \frac{1}{M} \sum_{k=1}^{M} f_k(x_1, y_1)\, f_k(x_1 + \alpha, y_1 + \beta),

where \alpha and \beta are spatial shift parameters. If the image f(x, y) is complex, one of the versions of f(x, y) in the products above should be conjugated; most biomedical images that are encountered in practice are real-valued functions, and this distinction is often ignored. The ACF indicates how the values of an image at a particular spatial location are statistically related to (or have characteristics in common with) the values of the same image at another, shifted, location. If the process is stationary, the ACF depends only upon the shift parameters, and may be expressed as \phi_f(\alpha, \beta).

FIGURE: Ensemble and spatial averaging of images.

The three equations above may be applied to signals that are functions of time by replacing the spatial variables (x, y) with the temporal variable t, replacing the shift parameters (\alpha, \beta) with the temporal delay \tau, and making a few other related changes.

When \mu_f(x_1, y_1) is computed for every spatial location or pixel, we get an average image that could be expressed as \mu_f(x, y). The image \mu_f may be used to represent the random process f as a prototype. For practical use, such an average should be computed using sample observations that are of the same size, scale, orientation, etc. Similarly, the ACF may also be computed for all possible values of its indices to obtain an image.
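The pixel-wise ensemble mean described above can be sketched as follows. This example assumes a synthetic "prototype" image and M noisy sample observations of it (the pattern, noise level, and M are arbitrary, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed underlying prototype image and M noisy sample observations of it.
true_image = np.outer(np.hanning(32), np.hanning(32))  # stand-in pattern
M = 200
samples = true_image + rng.normal(0.0, 0.3, size=(M, 32, 32))

# Ensemble average at every pixel: mu_f(x, y) ~ (1/M) sum_k f_k(x, y).
mu_f = samples.mean(axis=0)

err_single = np.abs(samples[0] - true_image).mean()
err_ensemble = np.abs(mu_f - true_image).mean()
print(f"mean abs error, one sample:      {err_single:.3f}")
print(f"mean abs error, ensemble of {M}: {err_ensemble:.3f}")
```

The ensemble mean is far closer to the prototype than any single observation, illustrating why the average image can serve as a representative of the process.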

Temporal and spatial averages: When we have a sample observation of a random process f_k(t) as a function of time, it is possible to compute time averages (or temporal statistics) by integrating along the time axis:

\mu_f(k) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f_k(t) \, dt.

The integral would be replaced by a summation in the case of sampled or discrete-time signals. The time-averaged ACF \phi_f(k; \tau) is given by

\phi_f(k; \tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f_k(t) \, f_k(t + \tau) \, dt.

Similarly, given an observation of a random process as an image f_k(x, y), we may compute averages by integrating over the spatial domain to obtain spatial averages or spatial statistics (see Figure). The spatial mean of the image f_k(x, y) is given by

\mu_f(k) = \frac{1}{A} \int \int f_k(x, y) \, dx \, dy,

where A is a normalization factor, such as the actual area of the image. Observe that the spatial mean above is a single-valued entity (a scalar). For a stationary process, the spatial ACF is given by

\phi_f(k; \alpha, \beta) = \int \int f_k(x, y) \, f_k(x + \alpha, y + \beta) \, dx \, dy.

A suitable normalization factor, such as the total energy of the image, which is equal to \phi_f(k; 0, 0), may be included if necessary. The sample index k becomes irrelevant if only one observation is available. In practice, the integrals change to summations over the space of the digital image available.

When we have a 2D image as a function of time, such as TV, video, fluoroscopy, and cine-angiography signals, we have a spatiotemporal signal that may be expressed as f(x, y, t) (see Figure). We may then compute statistics over a single frame f(x, y, t_1) at the instant of time t_1, which are known as intraframe statistics. We could also compute parameters through multiple frames over a certain period of time, which are called interframe statistics; the signal over a specific period of time may then be treated as a 3D dataset.

Random functions of time may thus be characterized in terms of ensemble and/or temporal statistics. Random functions of space may be represented by their ensemble and/or spatial statistics. One figure shows the distinction between ensemble and spatial averaging; another illustrates the combined use of spatial and temporal statistics to analyze a video signal in (x, y, t).

FIGURE: Spatial and temporal statistics of a video signal.

The mean does not play an important role in 1D signal analysis; it is usually assumed to be zero, and is often subtracted out if it is not zero. However, the mean of an image represents its average intensity or density; removal of the mean leads to an image with only the edges and the fluctuations about the mean being depicted.

The ACF plays an important role in the characterization of random processes. The Fourier transform of the ACF is the power spectral density (PSD) function, which is useful in frequency-domain analysis. Statistical functions as above are useful in the analysis of the behavior of random processes, and in modeling, spectrum analysis, filter design, data compression, and data communication.
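The ACF-to-PSD relationship (the Wiener–Khinchin theorem) can be demonstrated numerically in 1D. A sketch, assuming a synthetic stationary process (white noise smoothed by a short moving average) and using a circular, biased ACF estimate so that the identity holds exactly for the discrete transforms:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed 1D stationary process: white noise smoothed by a moving average.
N = 4096
x = np.convolve(rng.normal(size=N), np.ones(5) / 5.0, mode="same")
x -= x.mean()

# Biased circular ACF estimate for all lags, computed via the FFT.
X = np.fft.fft(x)
acf = np.real(np.fft.ifft(np.abs(X) ** 2)) / N

# Wiener-Khinchin: the Fourier transform of the ACF equals the PSD,
# here estimated by the periodogram |X|^2 / N.
psd_from_acf = np.real(np.fft.fft(acf))
periodogram = np.abs(X) ** 2 / N

max_err = np.max(np.abs(psd_from_acf - periodogram))
print(f"max |FT(ACF) - periodogram| = {max_err:.2e}")
```

With this circular estimator the two spectra agree to floating-point precision; with finite-lag (windowed) ACF estimates the agreement is approximate.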

Examples of noise PDFs

As we have already seen, several types of noise sources are encountered in biomedical imaging. Depending upon the characteristics of the noise source and the phenomena involved in the generation of the signal and noise values, we encounter a few different types of PDFs, some of which are described in the following paragraphs.

Gaussian: The most commonly encountered and used noise PDF is the Gaussian or normal PDF, expressed as

p_x(x) = \frac{1}{\sqrt{2\pi}\, \sigma_x} \exp\left[ -\frac{(x - \mu_x)^2}{2 \sigma_x^2} \right].

A Gaussian PDF is completely specified by its mean \mu_x and variance \sigma_x^2. Figure shows three Gaussian PDFs; see also the related figures.

When we have two jointly normal random processes x and y, the bivariate normal PDF is given by

p_{xy}(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_x)^2}{\sigma_x^2} - \frac{2\rho (x - \mu_x)(y - \mu_y)}{\sigma_x \sigma_y} + \frac{(y - \mu_y)^2}{\sigma_y^2} \right] \right\},

where \rho is the correlation coefficient, given by

\rho = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x \sigma_y}.

If the two processes are uncorrelated, \rho = 0. The bivariate normal PDF then reduces to a product of two univariate Gaussians, which implies that the two processes are statistically independent.

FIGURE: Three Gaussian PDFs (solid, dashed, and dotted lines).

The importance of the Gaussian PDF in practice arises from a phenomenon that is expressed as the central limit theorem. The PDF of a random process that is the sum of several statistically independent random processes is equal to the cascaded convolution of their individual PDFs. When a large number of functions are convolved in cascade, the result tends toward a Gaussian-shaped function, regardless of the forms of the individual functions. In practice, an image is typically affected by a series of independent sources of additive noise; the net noise PDF may then be assumed to be a Gaussian.
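The convolution view of the central limit theorem can be sketched directly: convolve a discretized uniform PDF with itself several times and compare the result to a Gaussian of matching mean and variance. The grid step and the number of convolutions below are arbitrary choices:

```python
import numpy as np

# Discretized uniform PDF on [0, 1), normalized so that sum(p) * dx = 1.
dx = 0.01
support = np.arange(0.0, 1.0, dx)
p = np.ones_like(support)
p /= p.sum() * dx

# PDF of the sum of 8 independent uniform processes = 8-fold convolution.
n_terms = 8
result = p.copy()
for _ in range(n_terms - 1):
    result = np.convolve(result, p) * dx

axis = np.arange(result.size) * dx
mean = np.sum(axis * result) * dx
var = np.sum((axis - mean) ** 2 * result) * dx

# Gaussian with the same mean and variance, for comparison.
gaussian = np.exp(-((axis - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
max_diff = np.max(np.abs(result - gaussian))
print(f"mean={mean:.3f} var={var:.4f} max|p_sum - gaussian|={max_diff:.4f}")
```

Even though each factor is a flat rectangle, eight convolutions already produce a close approximation to a Gaussian.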

Uniform: All possible values of a uniformly distributed random process have equal probability of occurrence. The PDF of such a random process over the range (a, b) is a rectangle of height \frac{1}{b - a} over the range (a, b). The mean of the process is \frac{a + b}{2}, and the variance is \frac{(b - a)^2}{12}. Figure shows two uniform PDFs corresponding to random processes with values spread over two different ranges. The quantization of gray levels in an image to a finite number of integers leads to an error or noise that is uniformly distributed.
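The uniformity of quantization noise can be checked with a short sketch: quantize a continuous-valued test image (an assumed synthetic one here) to integer gray levels and examine the error, which should be uniform over one quantization step, with zero mean and variance \Delta^2 / 12:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed continuous-valued "image" quantized to integer gray levels.
image = rng.uniform(0.0, 255.0, size=100_000)
quantized = np.round(image)        # quantization step Delta = 1
error = image - quantized          # quantization error

# For Delta = 1: error ~ uniform on [-1/2, 1/2), mean 0, variance 1/12.
print(f"error mean = {error.mean():+.4f}, error variance = {error.var():.4f}")
```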

FIGURE: Two uniform PDFs over two different ranges (solid and dashed lines).

Poisson: The counting of discrete random events, such as the number of photons emitted by a source or detected by a sensor in a given interval of time, leads to a random variable with a Poisson PDF. The discrete nature of the packets of energy (that is, photons) and the statistical randomness in their emission and detection contribute to uncertainty, which is reflected as quantum noise, photon noise, mottle, or Poisson noise in images. Shot noise in electronic devices may also be modeled as Poisson noise.

One of the formulations of the Poisson PDF is as follows. The probability that k photons are detected in a certain interval is given by

P(k) = \exp(-\lambda)\, \frac{\lambda^k}{k!}.

Here, \lambda is the mean of the process, which represents the average number of photons counted in the specified interval over many trials. The set of values of P(k) for all integer k is the Poisson PDF. The variance of the Poisson PDF is equal to its mean.

The Poisson PDF tends toward the Gaussian PDF for large mean values. Figure shows two Poisson PDFs along with the Gaussians for the same parameters; it is seen that the Poisson and Gaussian PDFs for the larger mean match each other well.
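The convergence of the Poisson PMF to the Gaussian can be quantified numerically. A sketch comparing the two at a small and a large mean (the two \lambda values are arbitrary choices; the PMF is computed in the log domain to avoid overflow at large k):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    # exp(-lam) * lam**k / k!, evaluated in the log domain for stability.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def gaussian_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Compare Poisson(lam) with a Gaussian of the same mean and variance (= lam).
diffs = {}
for lam in (3.0, 100.0):
    diffs[lam] = max(abs(poisson_pmf(k, lam) - gaussian_pdf(k, lam, lam))
                     for k in range(int(4 * lam) + 1))
    print(f"lambda = {lam:6.1f}: max |Poisson - Gaussian| = {diffs[lam]:.5f}")
```

The discrepancy shrinks markedly as the mean grows, consistent with the statement above.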

FIGURE: Two Poisson PDFs with the corresponding Gaussian PDFs superimposed (bars with dashed and solid envelopes, for the smaller and larger means, respectively).

Laplacian: The Laplacian PDF is given by the function

p_x(x) = \frac{1}{\sqrt{2}\, \sigma_x} \exp\left( -\sqrt{2}\, \frac{|x - \mu_x|}{\sigma_x} \right),

where \mu_x and \sigma_x^2 are the mean and variance, respectively, of the process. Figure shows two Laplacian PDFs. Error values in linear prediction have been observed to display Laplacian PDFs.

FIGURE: Two Laplacian PDFs (solid and dashed lines).

Rayleigh: The Rayleigh PDF is given by the function

p_x(x) = \frac{2}{b} (x - a) \exp\left[ -\frac{(x - a)^2}{b} \right] u(x - a),

where u(x) is the unit step function, such that u(x) = 1 if x \geq 0, and u(x) = 0 otherwise. The mean and variance of the Rayleigh PDF are determined by the parameters a and b as

\mu_x = a + \sqrt{\frac{\pi b}{4}}, \qquad \sigma_x^2 = \frac{b\,(4 - \pi)}{4}.

Figure shows a Rayleigh PDF. The Rayleigh PDF has been used to model speckle noise.

FIGURE: A Rayleigh PDF.

Structured noise

Power-line interference at 50 Hz or 60 Hz is a common artifact in many biomedical signals. The typical waveform of the interference is known in advance, and hence it could be considered to be a form of structured noise. It should, however, be noted that the phase of the interfering waveform will not usually be known. Furthermore, the interfering waveform may not be an exact sinusoid; this is indicated by the presence of harmonics of the fundamental 50 Hz or 60 Hz component. Notch and comb filters may be used to remove power-line artifact. It is not common to encounter power-line interference in biomedical images.

A common form of structured noise in biomedical images is the grid artifact. On rare occasions, the grid used in X-ray imaging may not translate or oscillate as designed; the stationary grid then casts its shadow, which is superimposed upon the image of the subject. Stationary grids with parallel strips create an artifact that is a set of parallel lines, which is also periodic. Depending upon the strip width, strip orientation, and sampling resolution, the artifactual lines may appear as thin straight lines, or as relatively thick rectangular stripes with some internal gray-scale variation (see Figure and Section).

In some imaging applications, grids and frames may be used to position the object or subject being imaged. The image of the grid or frame could serve a useful purpose in image registration and calibration, as well as in the modeling and removal of geometric distortion. The same patterns may be considered to be artifacts in other applications of image processing.

Labels indicating patient identification, patient positioning, imaging parameters, and other details are routinely used in X-ray imaging. Ideally, the image cast by such labels should be well removed from the image of the patient on the acquired image, although occasionally they may get connected or even overlap. Such items interfere in image analysis when the procedure is applied to the entire image. Preprocessing techniques are then required to recognize and remove such artifacts (see Section for examples).

Surgical implants such as staples, pins, and screws create difficulties and artifacts in X-ray, MR, CT, and ultrasound imaging. The advantage with such artifacts is that the precise composition and geometry of the implants are known, by design and from the manufacturer's specifications. Methods may then be designed to remove each specific artifact.

Physiological interference

As we have already noted, the human body is a complex conglomeration of several systems and processes. Several physiological processes could be active at a given instant of time, each one affecting the system or process of interest in diverse ways. A patient or experimental subject may not be able to exercise control on all of his or her physiological processes and systems. The effect of systems or processes other than those of interest on the image being acquired may be termed as physiological interference; several examples are listed below:

- Effect of breathing on a chest X-ray image.
- Effect of breathing, peristalsis, and movement of material through the gastrointestinal system on CT images of the abdomen.
- Effect of cardiovascular activity on CT images of the chest.
- Effect of pulsatile movement of arteries in subtraction angiography.

Physiological interference may not be characterized by any specific waveform, pattern, or spectral content, and is typically dynamic and nonstationary, varying with the level of the activity of relevance, and hence with time (see Section for a discussion on stationarity). Thus, simple linear bandpass filters will usually not be effective in removing physiological interference.

Normal anatomical details, such as the ribs in chest X-ray images and the skull in brain imaging, may also be considered to be artifacts when other details in such images are of primary interest. Methods may need to be developed to remove their effects before the details of interest may be analyzed.

Other types of noise and artifact

Systematic errors are caused by several factors, such as geometric distortion, miscalibration, nonlinear response of detectors, sampling, and quantization. Such errors may be modeled from a knowledge of the corresponding parameters, which may be determined from specifications, measured experimentally, or derived mathematically.

A few other types of artifact that cannot be easily categorized into the groups discussed above are the following:

- Punctate or shot noise due to dust on the screen, film, or examination table.
- Scratches on film, which could appear as intense line segments.
- Shot noise due to inactive elements in a detector array.
- Salt-and-pepper noise due to impulsive noise, leading to black or white pixels at the extreme ends of the pixel-value range.
- Film-grain noise due to scanning of films with high resolution.
- Punctate noise in chest X-ray or mammographic images caused by cosmetic powder or deodorant, which could masquerade as microcalcifications.
- Superimposed images of clothing accessories, such as pins, hooks, buttons, and jewelry.

Stationary versus nonstationary processes

Random processes may be characterized in terms of their temporal/spatial and/or ensemble statistics. A random process is said to be stationary in the strict sense, or strongly stationary, if its statistics are not affected by a shift in the origin of time or space. In most practical applications, only the first-order and second-order averages are used. A random process is said to be weakly stationary, or stationary in the wide sense, if its mean is a constant and its ACF depends only upon the difference (or shift) in time or space. Then, we have

\mu_f(x_1, y_1) = \mu_f \quad \text{and} \quad \phi_f(x_1, x_1 + \alpha; y_1, y_1 + \beta) = \phi_f(\alpha, \beta).

The ACF is now a function of the shift parameters \alpha and \beta only; the PSD of the process does not vary with space.

A stationary process is said to be ergodic if the temporal statistics computed are independent of the sample observed; that is, the same results are obtained with any sample observation f_k(t). The time averages of the process are then independent of k: \mu_f(k) = \mu_f and \phi_f(k; \tau) = \phi_f(\tau). All ensemble statistics may be replaced by temporal statistics when analyzing ergodic processes. Ergodic processes are an important type of stationary random processes, because their statistics may be computed from a single observation as a function of time. The same concept may be extended to functions of space as well, although the term ergodic is not commonly applied to spatial functions.

Signals or processes that do not meet the conditions described above may, in general, be called nonstationary processes. A nonstationary process possesses statistics that vary with time or space. The statistics of most images vary over space; indeed, such variations are the source of pictorial information. Most biomedical systems are dynamic systems and produce nonstationary signals and images. However, a physical or physiological system has limitations in the rate at which it can change its characteristics. This limitation facilitates the breaking of a signal into segments of short duration (typically a few tens of milliseconds), over which the statistics of interest may be assumed to remain constant. The signal is then referred to as a quasistationary process. Techniques designed for stationary signals may then be extended and applied to nonstationary signals; analysis of signals by this approach is known as short-time analysis. By the same token, the characteristics of the features in an image vary over relatively large scales of space; statistical parameters within small regions of space, within an object, or within an organ of a given type may be assumed to remain constant. The image may then be assumed to be blockwise stationary, which permits sectioned (block-by-block) processing or moving-window processing using techniques designed for stationary processes. Figure illustrates the notion of computing statistics within a moving window.
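The moving-window idea can be sketched as follows: a synthetic image whose two halves have deliberately different statistics (an assumption made for illustration), with the local mean and variance estimated from a small sliding window around each pixel:

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed synthetic image: left and right halves with different statistics.
image = np.hstack([
    rng.normal(50.0, 5.0, size=(64, 32)),    # left half: mu=50, sigma=5
    rng.normal(150.0, 20.0, size=(64, 32)),  # right half: mu=150, sigma=20
])

def local_stats(img, half=3):
    """Mean and variance over a (2*half+1)^2 window around each pixel."""
    mean = np.empty_like(img)
    var = np.empty_like(img)
    rows, cols = img.shape
    for i in range(rows):
        for j in range(cols):
            block = img[max(0, i - half):i + half + 1,
                        max(0, j - half):j + half + 1]
            mean[i, j] = block.mean()
            var[i, j] = block.var()
    return mean, var

mean_img, var_img = local_stats(image)
print(f"local mean, left  half ~ {mean_img[:, :16].mean():6.1f}")
print(f"local mean, right half ~ {mean_img[:, 48:].mean():6.1f}")
```

The window statistics recover the two local regimes, which is the premise behind block-by-block and moving-window (adaptive) processing.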

Certain systems, such as the cardiac system, normally perform rhythmic operations. Considering the dynamics of the cardiac system, it is obvious that the system is nonstationary. However, various phases of the cardiac cycle, as well as the related components of the associated electrocardiogram (ECG), phonocardiogram (PCG), and carotid pulse signals, repeat over time in an almost-periodic manner. A given phase of the process or signal possesses statistics that vary from those of the other phases; however, the statistics of a specific phase repeat cyclically. For example, the statistics of the PCG signal vary within the duration of a cardiac cycle, especially when murmurs are present, but repeat themselves at regular intervals over successive cardiac cycles. Such signals are referred to as cyclostationary signals. The cyclical repetition of the process facilitates synchronized ensemble averaging (see Sections), using epochs or events extracted from an observation of the signal over many cycles.

FIGURE: Block-by-block processing of an image. Statistics computed by using the pixels within the window shown with solid lines are applicable to the pixel marked with one symbol; statistics for use when processing the pixel marked with the other symbol are computed by using the pixels within the window shown with dashed lines.

The cyclical nature of cardiac activity may be exploited for synchronized averaging to reduce noise and improve the SNR of the ECG and PCG. The same technique may also be extended to imaging the heart. In gated blood-pool imaging, nuclear medicine images of the heart are acquired in several parts over short intervals of time. Images acquired at the same phases of the cardiac cycle, determined by using the ECG signal as a reference (trigger or gating signal), are accumulated over several cardiac cycles. A sequence of such gated and averaged frames over a full cardiac cycle may then be played as a video or a movie, to visualize the time-varying size and contents of the left ventricle. See Section for illustration of gated blood-pool imaging.
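The benefit of accumulating frames gated to the same phase can be sketched numerically. This example assumes a synthetic phase image and independent Gaussian frame noise (pattern, noise level, and cycle count are arbitrary); the noise SD is expected to fall roughly as 1/\sqrt{M} for M averaged frames:

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed truth: the image at one phase of the cycle; frames at that phase
# over successive cycles share the truth but carry independent noise.
phase_image = np.outer(np.hanning(16), np.hanning(16)) * 100.0
n_cycles = 64
frames = phase_image + rng.normal(0.0, 25.0, size=(n_cycles, 16, 16))

# Synchronized (gated) averaging: accumulate frames at the same phase.
averaged = frames.mean(axis=0)

noise_before = (frames[0] - phase_image).std()
noise_after = (averaged - phase_image).std()
print(f"noise SD before averaging: {noise_before:5.2f}")
print(f"noise SD after  averaging: {noise_after:5.2f} "
      f"(expected ~{25 / np.sqrt(n_cycles):.2f})")
```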

Covariance and cross-correlation

When two random processes f and g need to be compared, we could compute the covariance between them as

\sigma_{fg} = E[(f - \mu_f)(g - \mu_g)] = \int \int (f - \mu_f)(g - \mu_g)\, p_{fg}(f, g) \, df \, dg,

where p_{fg}(f, g) is the joint PDF of the two processes; the image coordinates have been omitted for the sake of compact notation. The covariance parameter may be normalized to get the correlation coefficient, defined as

\rho_{fg} = \frac{\sigma_{fg}}{\sigma_f \sigma_g},

with -1 \leq \rho_{fg} \leq 1. A high covariance indicates that the two processes have similar statistical variability or behavior. The processes f and g are said to be uncorrelated if \rho_{fg} = 0. Two processes that are statistically independent are also uncorrelated; the converse of this property is, in general, not true.

When dealing with random processes f and g that are functions of space, the cross-correlation function (CCF) between them is defined as

\phi_{fg}(\alpha, \beta) = E[f(x, y)\, g(x + \alpha, y + \beta)].

In situations where the PDF is not available, the expressions above may be approximated by the corresponding ensemble or spatial statistics. Correlation functions are useful in analyzing the nature of variability and spectral bandwidth of images, as well as for the detection of objects by template matching.
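Template matching by correlation can be sketched as follows. This example uses the (zero-mean) normalized correlation coefficient rather than the raw CCF, so that the peak is bounded by 1; the scene, template, and object position are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed template: a small bright square on a dark background.
template = np.zeros((8, 8))
template[2:6, 2:6] = 1.0

# Assumed scene: noisy background with the object placed at (30, 20).
scene = rng.normal(0.0, 0.2, size=(64, 64))
scene[30:38, 20:28] += template

def match(scene, tmpl):
    """Exhaustive normalized cross-correlation; returns peak position and value."""
    th, tw = tmpl.shape
    t = tmpl - tmpl.mean()
    best, best_pos = -2.0, (0, 0)
    for i in range(scene.shape[0] - th + 1):
        for j in range(scene.shape[1] - tw + 1):
            w = scene[i:i + th, j:j + tw] - scene[i:i + th, j:j + tw].mean()
            denom = np.sqrt((w * w).sum() * (t * t).sum())
            rho = (w * t).sum() / denom if denom > 0 else 0.0
            if rho > best:
                best, best_pos = rho, (i, j)
    return best_pos, best

pos, rho = match(scene, template)
print(f"detected object at {pos} with correlation {rho:.2f}")
```

In practice the same search is usually carried out in the frequency domain for speed; the brute-force loop above is only meant to make the definition concrete.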

Signal-dependent noise

Noise may be categorized as being independent of the signal of interest if no statistical parameter of any order of the noise process is a function of the signal. Although it is common to assume that the noise present in a signal or image is statistically independent of the true signal or image of interest, several cases exist in biomedical imaging where this assumption is not valid, and the noise is functionally related to, or dependent upon, the signal. The following paragraphs provide brief notes on a few types of signal-dependent noise encountered in biomedical imaging.

Poisson noise: Imaging systems that operate in low-light conditions or in low-dose radiation conditions, such as nuclear medicine imaging, are often affected by photon noise that can be modeled as a Poisson process (see Section). The probabilistic description of an observed image pixel value g_o(m, n) under the conditions of a Poisson process is given by

P[g_o(m, n) \mid f(m, n)] = \frac{[\lambda f(m, n)]^{g_o(m, n)} \exp[-\lambda f(m, n)]}{g_o(m, n)!},

where f(m, n) is the undegraded pixel value (the observation in the absence of any noise), and \lambda is a proportionality factor. Because the mean of the degraded image g_o is given by

E[g_o(m, n)] = \lambda \, E[f(m, n)],

images corrupted with Poisson noise are usually normalized as

g(m, n) = \frac{g_o(m, n)}{\lambda}.

It has been shown that, in this case, Poisson noise may be modeled as stationary noise, uncorrelated with the signal, and added to the signal as in the additive model above, with zero mean and variance given by

\sigma_\eta^2(m, n) = \frac{E[f(m, n)]}{\lambda} = \frac{E[g(m, n)]}{\lambda}.
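The normalized Poisson observation model can be simulated directly. A sketch, assuming arbitrary values for the undegraded pixel value f and the proportionality factor \lambda, checking that after normalization the mean equals f and the variance equals f / \lambda:

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed values: undegraded pixel value f and proportionality factor lambda.
f_true = 10.0
lam = 4.0
n_trials = 200_000

# Observation model: g_o ~ Poisson(lambda * f), then normalized by lambda.
g_o = rng.poisson(lam * f_true, size=n_trials)
g = g_o / lam

# After normalization: E[g] = f and var(g) = f / lambda (signal-dependent).
print(f"E[g]   = {g.mean():.3f}  (expected {f_true})")
print(f"var(g) = {g.var():.3f}  (expected {f_true / lam})")
```

Note that the noise variance grows with the signal value itself, which is what makes Poisson noise signal-dependent.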

Film-grain noise: The granular structure of film, due to the silver-halide grains used, contributes noise to the recorded image, which is known as film-grain noise. When images recorded on photographic film are digitized in order to be processed by a digital computer, film-grain noise is a significant source of degradation of the information. According to Froehlich et al., the model for an image corrupted by film-grain noise is given by

g(m, n) = f(m, n) + \gamma \, F[f(m, n)] \, \eta_1(m, n) + \eta_2(m, n),

where \gamma is a proportionality factor, F is a mathematical function, and \eta_1(m, n) and \eta_2(m, n) are samples from two random processes independent of the signal. This model may be taken to represent a general imaging situation that includes signal-independent noise as well as signal-dependent noise, and the noise could be additive or multiplicative. Observe that the model reduces to the simple signal-independent additive noise model in the equation above if \gamma = 0.

Froehlich et al. modeled film-grain noise with F[f(m, n)] = [f(m, n)]^p, using p = 1/2. The two noise processes \eta_1 and \eta_2 were assumed to be Gaussian-distributed, uncorrelated, zero-mean random processes. According to this model, the noise that corrupts the image has two components: one that is signal-dependent, through the factor \sqrt{f(m, n)}\, \eta_1(m, n), and another that is signal-independent, given by \eta_2(m, n). Film-grain noise may be modeled as additive noise, as in the additive model above, with

\eta(m, n) = \gamma \sqrt{f(m, n)}\, \eta_1(m, n) + \eta_2(m, n).

It can be shown that \eta(m, n) as above is stationary, has zero mean, and has its variance given by

\sigma_\eta^2(m, n) = \gamma^2 \sigma_1^2 \, E[g(m, n)] + \sigma_2^2.

The fact that the mean of the corrupted image equals the mean of the noise-free image has been used in arriving at the relationship above.

Speckle noise: Speckle noise corrupts images that are obtained by coherent radiation, such as synthetic-aperture radar (SAR), ultrasound, laser, and sonar imaging. When an object being scanned by a laser beam has random surface roughness with details of the order of the wavelength of the laser, the imaging system will be unable to resolve the details of the object's roughness; this results in speckle noise. The most widely used model for speckle noise is a multiplicative model, given as

g(m, n) = f(m, n) η(m, n),

where η(m, n) is a stationary noise process that is assumed to be uncorrelated with the image. If the mean μ_η of the noise process η is not equal to one, the noisy image may be normalized by dividing by μ_η, such that, in the normalized image, the multiplicative noise has its mean equal to one. Depending upon the specific application, the distribution of the noise may be assumed to be exponential, Gaussian, or Rayleigh.

The multiplicative model above may be converted to the additive model, with η(m, n) being zero-mean additive noise having a space-variant, signal-dependent variance that may be expressed in terms of σ_g²(m, n) and μ_g(m, n), the variance and the mean of the noisy image at the point (m, n), respectively.

Transformation of signal-dependent noise to signal-independent noise: In the model used by Naderi and Sawchuk and by Arsenault et al., the signal-independent component of the noise, η₂(m, n), is assumed to be zero. In this case, it has been shown that, by applying an appropriate transformation to the whole image, the noise can be made signal-independent. One of the transformations proposed is

T[g(m, n)] = c √g(m, n),

where c is an appropriate normalizing constant. It has been shown that the noise in the transformed image is additive, has a Gaussian distribution, is unbiased, and has a standard deviation that no longer depends on the signal but is given by c γ σ₁ / 2.
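The stabilizing effect of the square-root transformation can be verified numerically. The sketch below assumes the model g = f + γ √f η₁ (with η₂ = 0) and arbitrary illustrative values of γ, σ₁, and c; it shows that the standard deviation of the transformed image is nearly the same at two very different signal levels.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, sigma1 = 0.2, 1.0
c = 2.0 / gamma                        # normalizing constant, so that c*gamma*sigma1/2 = sigma1

def noise_std_after_transform(level, n=200_000):
    """Std deviation of T[g] = c*sqrt(g) for a constant signal at the given level."""
    f = np.full(n, float(level))
    g = f + gamma * np.sqrt(f) * rng.normal(0.0, sigma1, n)
    return np.std(c * np.sqrt(g))

# nearly the same std at both signal levels, close to sigma1
print(noise_std_after_transform(50.0), noise_std_after_transform(200.0))
```

With c = 2/γ, the transformed noise standard deviation cγσ₁/2 reduces to σ₁, independent of the signal level.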

Synchronized or Multiframe Averaging

In certain applications of imaging, if the object being imaged can remain free from motion or change of any kind, internal or external, over a period that is long compared to the time required to record an image, it becomes possible to acquire several frames of images of the object in precisely the same state or condition. Then, the frames may be averaged to reduce noise; this is known as multiframe averaging. The method may be extended to the imaging of dynamic systems whose movements follow a rhythm or cycle with phases that can be determined by another signal, such as the cardiac system, whose phases of contraction are indicated by the ECG signal. Then, several image frames may be acquired at the same phase of the rhythmic movement over successive cycles and averaged to reduce noise. Such a process is known as synchronized averaging. The process may be repeated or triggered at every phase of interest. [Note: A process as above in nuclear medicine imaging may be viewed simply as counting the photons emitted over a long period of time in total, albeit in a succession of short intervals gated to a particular phase of contraction of the heart. Ignoring the last step of division by the number of frames to obtain the average, the process simply accumulates the photon counts over the frames acquired.]

Synchronized averaging is a useful technique in the acquisition of several biomedical signals. Observe that averaging as above is a form of ensemble averaging.

Let us represent a single image frame in a situation as above as

g_i(x, y) = f(x, y) + η_i(x, y),

where g_i(x, y) is the i-th observed frame of the image f(x, y), and η_i(x, y) is the noise in the same frame. Let us assume that the noise process is independent of the signal source. Observe that the desired original image f(x, y) is invariant from one frame to another. It follows that σ²_{g_i}(x, y) = σ²_{η_i}(x, y); that is, the variance at every pixel in the observed noisy image is equal to the corresponding variance of the noise process.

If M frames of the image are acquired and averaged, the averaged image is given by

ḡ(x, y) = (1/M) Σ_{i=1}^{M} g_i(x, y).

If the mean of the noise process is zero, we have (1/M) Σ_{i=1}^{M} η_i(x, y) → 0 as M → ∞, or, in practice, as the number of frames averaged increases to a large number. Then, it follows that

E[ḡ(x, y)] = f(x, y)

and

σ²_{ḡ}(x, y) = (1/M) σ²_{η}(x, y).

Thus, the variance at every pixel in the averaged image is reduced by a factor of M from that in a single frame; the SNR is improved by the factor √M.

The most important requirement in this procedure is that the frames being averaged be mutually synchronized, aligned, or registered. Any motion, change, or displacement between the frames will lead to smearing and distortion.
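The √M improvement can be demonstrated with a small simulation; the test image and noise level below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.uniform(50, 200, (64, 64))   # a "true" image, free of noise
sigma = 20.0                         # per-frame noise standard deviation

def averaged_noise_std(M):
    """Residual noise std after averaging M synchronized noisy frames."""
    frames = f + rng.normal(0.0, sigma, (M,) + f.shape)
    return np.std(frames.mean(axis=0) - f)

s1, s16 = averaged_noise_std(1), averaged_noise_std(16)
print(s1 / s16)   # close to sqrt(16) = 4
```

Averaging 16 frames reduces the residual noise standard deviation by a factor close to √16 = 4, in agreement with the derivation above.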

Example: Figure (a) shows a test image with several geometrical objects placed at random. Images (b) and (c) show two examples of eight noisy frames of the test image that were obtained by adding Gaussian-distributed random noise samples. The results of averaging two, four, and eight noisy frames (including the two in (b) and (c)) are shown in parts (d), (e), and (f), respectively. It is seen that averaging using increasing numbers of frames of the noisy image leads to a reduction of the noise; the decreasing trend in the RMS values of the processed images, given in the caption of the figure, confirms the expected effect of averaging.

See the section on confocal microscopy for an illustration of the application of multiframe averaging; see also the section on gated blood-pool imaging in nuclear medicine for further details.

FIGURE: (a) Shapes: a test image with various geometrical objects placed at random. (b) Image in (a) with Gaussian noise added. (c) Second version of the noisy image. Result of multiframe averaging using (d) the two frames in (b) and (c), (e) four frames, and (f) eight frames. The normalized RMS error decreases as the number of averaged frames increases.

Spacedomain Localstatisticsbased Filters

Consider the practical situation when we are given a single noisy observation of an image of finite size. We do not have access to an ensemble of images to perform multiframe synchronized averaging, and spatial statistics computed over the entire image frame will lead to scalar values that do not assist in removing the noise and obtaining a cleaner image. Furthermore, we should also accommodate the nonstationarity of the image. In such situations, moving-window filtering, using windows of small size such as 3 × 3 or 5 × 5 pixels, becomes a valuable option; rectangular windows, as well as windows of other shapes, may also be considered where appropriate. Various statistical parameters of the pixels within such a moving window may be computed, with the result being applied to the pixel in the output image at the same location where the window is placed (centered) on the input image (see Figure). Observe that only the pixel values in the given input image are used in the filtering process; the output is stored in a separate array. Figure illustrates a few different neighborhood shapes that are commonly used in moving-window image filtering.

FIGURE: Moving-window filtering of an image. Statistics computed by using the pixels within the window are applied to the pixel at the same location in the output image. The moving window is shown for two pixel locations.

FIGURE: A few commonly used moving-window neighborhood shapes for image filtering. The result computed by using the pixels within a window is applied to the pixel at the location of its center (shown shaded) in the output image.

The mean filter

If we were to select the pixels in a small neighborhood around the pixel to be processed, the following assumptions may be made:

1. the image component is relatively constant, that is, the image is quasistationary; and

2. the only variations in the neighborhood are due to noise.

Further assumptions regarding the noise process that are typically made are that it is additive, is independent of the image, and has zero mean. Then, if we were to take the mean of the pixels in the neighborhood, the result will tend toward the true pixel value in the original, uncorrupted image. In essence, a spatial collection of pixels around the pixel being processed is substituted for an ensemble of pixels at the same location from multiple frames in the averaging process; that is, the image-generating process is assumed to be ergodic.

It is common to use a 3 × 3 neighborhood (the pixel and its 8-connected neighbors), as in Figure (a), for mean filtering. Then, the output of the filter, g(m, n), is given by

g(m, n) = (1/9) Σ_{p=−1}^{1} Σ_{q=−1}^{1} f(m + p, n + q),

where f(m, n) is the input image. The summation above may be expanded as

g(m, n) = (1/9) [ f(m−1, n−1) + f(m−1, n) + f(m−1, n+1)
+ f(m, n−1) + f(m, n) + f(m, n+1)
+ f(m+1, n−1) + f(m+1, n) + f(m+1, n+1) ].

The same result is also achieved via convolution of the image f(m, n) with the 3 × 3 array or mask with all of its coefficients equal to 1/9.

Note that the operation above cannot be directly applied at the edges of the input image array; it is common to extend the input array with a border of zero-valued pixels to permit filtering of the pixels at the edges. One may also elect not to process the pixels at the edges, or to replace them with the average of the available neighbors.

The mean filter can suppress Gaussian and uniformly distributed noise effectively in relatively homogeneous areas of an image. However, the operation leads to blurring at the edges of the objects in the image, and also to the loss of fine details and texture. Regardless, mean filtering is commonly employed to remove noise and smooth images. The blurring of edges may be prevented to some extent by not applying the mean filter if the difference between the pixel being processed and the mean of its neighbors is greater than a certain threshold; this condition, however, makes the filter nonlinear.
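The 3 × 3 mean filter with zero padding at the borders, as described above, can be sketched as follows (a direct implementation, not optimized):

```python
import numpy as np

def mean_filter_3x3(image):
    """3x3 mean filter; the input array is extended with a border of zeros."""
    padded = np.pad(image.astype(float), 1, mode="constant")
    out = np.zeros_like(image, dtype=float)
    for dm in (-1, 0, 1):             # sum the nine shifted copies of the image
        for dn in (-1, 0, 1):
            out += padded[1 + dm : 1 + dm + image.shape[0],
                          1 + dn : 1 + dn + image.shape[1]]
    return out / 9.0                  # equivalent to convolution with a 3x3 mask of 1/9

img = np.zeros((5, 5)); img[2, 2] = 9.0
print(mean_filter_3x3(img)[1:4, 1:4])   # the impulse spreads into a 3x3 block of ones
```

The example with an isolated impulse makes the blurring effect explicit: a single bright pixel is spread over the whole 3 × 3 neighborhood.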

The median filter

The median of a collection of samples is the value that splits the population in half: half the number of pixels in the collection will have values less than the median, and half will have values greater than the median. (In small populations of pixels, under the constraint that the result be an integer, approximations will have to be made.) The most common procedure rank-orders the pixels in a neighborhood containing an odd number of pixels, and the pixel value at the middle of the list is selected as the median. The procedure also permits the application of order-statistic filters: the i-th element in a rank-ordered list of values is known as the i-th order statistic. The median filter is an order-statistic filter of order (N + 1)/2, where N is the size of the filter, that is, the number of values used to derive the output.

The median filter is a nonlinear filter. Its success in filtering depends upon the number of the samples used to derive the output, as well as the spatial configuration of the neighborhood used to select the samples.

The median filter provides better noise removal than the mean filter, without blurring, especially when the noise has a long-tailed PDF (resulting in outliers) and in the case of salt-and-pepper noise. However, the median filter could result in the clipping of corners and distortion of the shape of sharp-edged objects; median filtering with large neighborhoods could also result in the complete elimination of small objects. Neighborhoods that are not square in shape are often used for median filtering in order to limit the clipping of corners and other types of distortion of shape (see Figure).
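The behavior of the median filter on salt-and-pepper noise can be sketched as below; the window size and test values are illustrative, and border pixels are simply left unfiltered here.

```python
import numpy as np

def median_filter_3x3(image):
    """3x3 moving-window median filter; border pixels are left unfiltered."""
    out = image.astype(float).copy()
    M, N = image.shape
    for m in range(1, M - 1):
        for n in range(1, N - 1):
            out[m, n] = np.median(image[m - 1 : m + 2, n - 1 : n + 2])
    return out

img = np.full((7, 7), 100.0)
img[2, 3] = 255.0   # an isolated "salt" pixel
img[4, 1] = 0.0     # an isolated "pepper" pixel
print(np.unique(median_filter_3x3(img)))   # -> [100.]
```

Because each outlier is a single extreme value among nine, it never reaches the middle of the rank-ordered list, and both impulses are removed without any blurring of the background.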

Examples: Figure (a) shows a 1D test signal with a rectangular pulse; part (b) of the same figure shows the test signal degraded with impulse (shot) noise. The results of filtering the noisy signal using the mean and the median, with filter length N, are shown in plots (c) and (d), respectively. The mean filter has blurred the edges of the pulse; it has also created artifacts in the form of small hills and valleys. The median filter has removed the noise without distorting the signal.

Figure (a) shows a 1D test signal with two rectangular pulses; part (b) of the same figure shows the test signal degraded with uniformly distributed noise. The results of filtering the noisy signal using the mean and the median, with filter length N, are shown in plots (c) and (d), respectively. The mean filter has reduced the noise level, but has also blurred the edges of the pulses; in addition, the strength of the first, short pulse has been reduced. The median filter has removed the noise to some extent without distorting the edges of the long pulse; however, the short pulse has been obliterated.

FIGURE: (a) A 1D test signal with a rectangular pulse. (b) Degraded signal with impulse (shot) noise. Result of filtering the degraded signal using (c) the mean and (d) the median operation, with a sliding window of N samples.

FIGURE: (a) A 1D test signal with two rectangular pulses. (b) Degraded signal with uniformly distributed noise. Result of filtering the degraded signal using (c) the mean and (d) the median operation, with a sliding window of N samples.

Figure shows the original Shapes test image, the test image degraded by the addition of Gaussian-distributed random noise, and the results of filtering the noisy image with the 3 × 3 and 5 × 5 mean and median filters. The RMS errors of the noisy and filtered images, with respect to the test image, are given in the figure caption. All of the filters, except the 3 × 3 median, have led to an increase in the RMS error. The blurring effect of the mean filter is readily seen in the results. Close observation of the result of median filtering (Figure (d)) shows that the filter has resulted in distortion of the shapes, in particular, clipping of the corners of the objects. The 5 × 5 median filter has led to the complete removal of small objects (see Figure (f)). Observe that the results of the 5 × 5 mean and median filters have similar RMS error values; however, the blurring effect in the former case, and the distortion of shape information as well as the loss of small objects in the latter case, need to be considered carefully.

Figure gives a similar set of images with the noise being Poisson-distributed; a comparable set with speckle noise is shown in the following figure. Although the filters have reduced the noise to some extent, the distortions introduced have led to increased RMS errors for all of the results.

Two further figures show cases with salt-and-pepper noise, with different densities of pixels affected by the noise. The 3 × 3 median filter has given good results in both cases, with the lowest RMS error and the least distortion; the 5 × 5 median filter has led to significant shape distortion and the loss of a few small features.

Figure shows the normalized histograms of the Shapes test image and of its degraded versions with Gaussian, Poisson, and speckle noise. It is evident that the signal-dependent Poisson noise and speckle noise have affected the histogram in a different manner compared to the signal-independent Gaussian noise.

Figure shows the results of filtering the Peppers test image affected by Gaussian noise. Although the RMS errors of the filtered images are low compared to that of the noisy image, the filters have introduced a mottled appearance and fine texture in the smooth regions of the original image. Figure shows the case with Poisson noise, where the filters have provided visually good results, regardless of the RMS errors.

All of the filters have performed reasonably well in the presence of speckle noise, in terms of the reduction of the RMS error; however, the visual quality of the images is poor.

Further figures show that the median filter has provided good results in filtering salt-and-pepper noise. Although the RMS values of the results of the mean filters are lower than those of the noisy images, visual inspection of the results indicates the undesirable effects of blurring and a mottled appearance.

The RMS error (or the MSE) is commonly used to compare the results of various image processing operations; however, the examples presented above illustrate the limitations of the RMS error in comparing images with different types of artifact and distortion. In some of the results shown, an image with a higher RMS error may present better visual quality than another image with a lower RMS error. Visual inspection and analysis of the results by qualified users or experts in the domain of application is important. It is also important to test the proposed methods with phantoms or test images that demonstrate the characteristics that are relevant to the specific application being considered. Assessment of the advantages provided by the filtered results in further processing and analysis, such as the effects on diagnostic accuracy in the case of medical images, is another approach to evaluate the results of filtering.

Order-statistic filters

The class of order-statistic filters is large, and includes several nonlinear filters that are useful in filtering different types of noise in images. The first step in order-statistic filtering is to rank-order, from the minimum to the maximum, the pixel values in an appropriate neighborhood of the pixel being processed. The i-th entry in the list is the output of the i-th order-statistic filter. A few order-statistic filters of particular interest are the following:

Min filter: the first entry in the rank-ordered list; useful in removing high-valued impulse noise (isolated bright spots, or "salt" noise).

Max filter: the last entry in the rank-ordered list; useful in removing low-valued impulse noise (isolated dark spots, or "pepper" noise).

Min/Max filter: sequential application of the Min and Max filters; useful in removing salt-and-pepper noise.

Median filter: the entry in the middle of the list. The median filter is the most popular and commonly used filter among the order-statistic filters; see the preceding section for detailed discussion and illustration of the median filter.

α-trimmed mean filter: the mean of a reduced list, where the first α and the last α fraction of the rank-ordered list are rejected, with 0 ≤ α < 0.5. Outliers, that is, pixels with values very different from the rest of the pixels in the list, are rejected by the trimming process. A value of α close to 0.5 rejects the entire list except the median (or a few values close to it), and the output is close to or equal to that of the median filter. The mean of the trimmed list provides a compromise between the generic mean and median filters.

L-filters: a weighted combination of all of the elements in the rank-ordered list. The use of appropriate weights can provide outputs equal

FIGURE: (a) Shapes test image. (b) Image in (a) with Gaussian noise added. Result of filtering the noisy image in (b) using (c) the 3 × 3 mean, (d) the 3 × 3 median, (e) the 5 × 5 mean, and (f) the 5 × 5 median filter.

FIGURE: (a) Shapes test image. (b) Image in (a) with Poisson noise. Result of filtering the noisy image in (b) using (c) the 3 × 3 mean, (d) the 3 × 3 median, (e) the 5 × 5 mean, and (f) the 5 × 5 median filter.

FIGURE: (a) Shapes test image. (b) Image in (a) with speckle noise. Result of filtering the noisy image in (b) using (c) the 3 × 3 mean, (d) the 3 × 3 median, (e) the 5 × 5 mean, and (f) the 5 × 5 median filter.

FIGURE: (a) Shapes test image. (b) Image in (a) with salt-and-pepper noise added. Result of filtering the noisy image in (b) using (c) the 3 × 3 mean, (d) the 3 × 3 median, (e) the 5 × 5 mean, and (f) the 5 × 5 median filter.

FIGURE: (a) Shapes test image. (b) Image in (a) with salt-and-pepper noise added at a higher density. Result of filtering the noisy image in (b) using (c) the 3 × 3 mean, (d) the 3 × 3 median, (e) the 5 × 5 mean, and (f) the 5 × 5 median filter.

FIGURE: Normalized histograms of (a) the Shapes test image, and of the image with (b) Gaussian noise, (c) Poisson noise, and (d) speckle noise. The histograms have been scaled to display a limited range of probability in order to show the important details; the probability values of some gray levels have been clipped in some of the histograms. Each histogram represents the full gray-level range.

FIGURE: (a) Peppers: a test image. (b) Image in (a) with Gaussian noise added. Result of filtering the noisy image in (b) using (c) the 3 × 3 mean, (d) the 3 × 3 median, (e) the 5 × 5 mean, and (f) the 5 × 5 median filter.

FIGURE: (a) Peppers test image. (b) Image in (a) with Poisson noise. Result of filtering the noisy image in (b) using (c) the 3 × 3 mean, (d) the 3 × 3 median, (e) the 5 × 5 mean, and (f) the 5 × 5 median filter.

FIGURE: (a) Peppers test image. (b) Image in (a) with speckle noise. Result of filtering the noisy image in (b) using (c) the 3 × 3 mean, (d) the 3 × 3 median, (e) the 5 × 5 mean, and (f) the 5 × 5 median filter.

FIGURE: (a) Peppers test image. (b) Image in (a) with salt-and-pepper noise added. Result of filtering the noisy image in (b) using (c) the 3 × 3 mean, (d) the 3 × 3 median, (e) the 5 × 5 mean, and (f) the 5 × 5 median filter.

FIGURE: (a) Peppers test image. (b) Image in (a) with salt-and-pepper noise added at a higher density. Result of filtering the noisy image in (b) using (c) the 3 × 3 mean, (d) the 3 × 3 median, (e) the 5 × 5 mean, and (f) the 5 × 5 median filter.

to those of all of the filters listed above, and facilitate the design of several order-statistic-based nonlinear filters.

Order-statistic filters represent a family of nonlinear filters that have gained popularity in image processing due to their ability to remove several types of noise without blurring edges, and due to their simple implementation.
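The outputs of a few of the filters listed above, for a single window of pixel values, can be sketched as follows; the window values and the choice of α are illustrative only.

```python
import numpy as np

def order_statistic(values, alpha=0.0):
    """Outputs of several order-statistic filters for one window of values."""
    v = np.sort(np.asarray(values, dtype=float))   # rank-order, min to max
    k = int(alpha * v.size)                        # number trimmed at each end
    trimmed = v[k : v.size - k] if k > 0 else v
    return {
        "min": v[0],                               # first order statistic
        "max": v[-1],                              # last order statistic
        "median": np.median(v),                    # middle of the list
        "alpha_trimmed_mean": trimmed.mean(),      # mean of the reduced list
    }

window = [10, 12, 11, 9, 255, 10, 11, 12, 10]      # one "salt" outlier
print(order_statistic(window, alpha=2/9))
```

For this window, the Max filter output is dominated by the outlier (255), while the median (11) and the α-trimmed mean (10.8, rejecting two values at each end) both suppress it, illustrating the compromise that trimming provides between the mean and the median.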

Frequency-domain Filters

Transforming an image from the space domain to the frequency domain, using a transform such as the Fourier transform, provides advantages in filtering and noise removal. Most images of natural beings, entities, and scenes vary slowly and smoothly across space, and are usually devoid of step-like changes. As a consequence, such images have most of their energy concentrated in small regions around (u, v) = (0, 0) in their spectra. On the other hand, uncorrelated random noise fields have a uniform (flat, or "white") spectrum, with an almost-constant energy level across the entire frequency space. This leads to the common observation that the SNR of a noisy natural image is higher in low-frequency regions than in high-frequency regions. It becomes evident, then, that such images may be improved in appearance by suppressing or removing their high-frequency components beyond a certain cutoff frequency. It should be recognized, however, that in removing high-frequency components along with the noise components, some desired image components will also be sacrificed. Furthermore, noise components in the low-frequency passband will continue to remain in the image.

The procedure for Fourier-domain filtering of an image f(m, n) involves the following steps:

1. Compute the 2D Fourier transform F(k, l) of the image. This may require padding the image with zeros to increase its size to an N × N array, with N being an integral power of 2, if an FFT algorithm is to be used.

2. Design or select an appropriate 2D filter transfer function H(k, l).

3. Obtain the filtered image in the Fourier domain as G(k, l) = H(k, l) F(k, l). It is common to define H(k, l) as a real function, thereby affecting only the magnitude of the input image spectrum; the phase remains unchanged. Depending upon the definition or computation of H(k, l), the spectrum F(k, l) may have to be centered or folded.

4. Compute the inverse Fourier transform of G(k, l). If F(k, l) was folded prior to filtering, it must be unfolded prior to the inverse transformation. If the input image was zero-padded, trim the resulting image g(m, n).
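The steps above can be sketched with numpy's FFT routines; zero padding is omitted here for brevity, and H is assumed to be a real array of the same size as the image, with DC at its center.

```python
import numpy as np

def fourier_filter(image, H):
    """Frequency-domain filtering: G = H * F, with a centered (folded) spectrum."""
    F = np.fft.fftshift(np.fft.fft2(image))   # transform and center the spectrum
    G = H * F                                  # apply the transfer function
    g = np.fft.ifft2(np.fft.ifftshift(G))      # unfold, then invert
    return np.real(g)

# identity check: an all-pass H returns the image (to numerical precision)
img = np.arange(64.0).reshape(8, 8)
out = fourier_filter(img, np.ones((8, 8)))
print(np.allclose(out, img))   # True
```

The `fftshift`/`ifftshift` pair performs the centering (folding) and unfolding mentioned in steps 3 and 4; taking the real part discards the negligible imaginary residue that arises from finite-precision arithmetic.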

Although the discussion above concentrates on the removal of noise, it should be noted that frequency-domain filtering permits the removal of several types of artifacts, as well as the selection of the desired frequency components in the image in various manners. The transfer function H(k, l) may be specified in several ways in the frequency domain; phase corrections may be introduced in a separate step. It should be recognized that, while dealing with real-valued images, H(k, l) should maintain conjugate symmetry in the frequency plane; it is common to use isotropic filter functions.

Removal of high-frequency noise

Lowpass filters are useful in removing high-frequency noise, under the assumption that the noise is additive and that the Fourier components of the original image past a certain frequency cutoff are negligible. The so-called ideal lowpass filter is defined in the 2D Fourier space as

H(u, v) = 1 if D(u, v) ≤ D₀; H(u, v) = 0 otherwise,

where D(u, v) = √(u² + v²) is the distance of the frequency component at (u, v) from the DC point (u, v) = (0, 0), with the spectrum being positioned such that the DC is at its center. [Note: The coordinates (u, v) and (k, l) are used interchangeably to represent the 2D frequency space.] D₀ is the cutoff frequency, beyond which all components of the Fourier transform of the given image are set to zero. Figure (a) shows the ideal lowpass filter function; Figure shows profiles of the ideal and Butterworth lowpass filters.

Example: Figures (a) and (b) show the Shapes test image and a noisy version with Gaussian noise added. Figure shows the Fourier magnitude spectra of the original and noisy versions of the Shapes image. The sharp edges in the image have resulted in significant high-frequency energy in the spectrum of the image, shown in Figure (a); in addition, the presence of strong features at certain angles in the image has resulted in the concentration of energy in the corresponding angular bands in the spectrum. The spectrum of the noisy image, shown in Figure (b), demonstrates the presence of additional and increased high-frequency components.

Figure (c) shows the noisy image filtered using an ideal lowpass filter with a normalized cutoff D₀, specified as a fraction of the highest frequency along the u or v axis. A glaring artifact is readily seen in the result of the filter: while the noise has been reduced, faint echoes of the edges present in the image have appeared in the result. This is due to the

FIGURE: (a) The magnitude transfer function of an ideal lowpass filter with cutoff frequency D₀. (b) The magnitude transfer function of a Butterworth lowpass filter with normalized cutoff D₀ and order n. The (u, v) = (0, 0) point is at the center. The gain is proportional to the brightness: white represents 1 and black represents 0.

FIGURE: Profiles of the magnitude transfer functions of an ideal lowpass filter (solid line) and a Butterworth lowpass filter (dashed line) with normalized cutoff D₀ and order n.

fact that the inverse Fourier transform of the circular ideal filter defined above is a Bessel function (see the circ ↔ Bessel Fourier-transform pair). Multiplication of the Fourier transform of the image with the circle function is equivalent to convolution of the image in the space domain with the corresponding Bessel function. The ripples (or lobes) of the Bessel function lead to echoes of strong edges: an artifact known as the ringing artifact. The example illustrates that the ideal filter's abrupt transition from the passband to the stopband is, after all, not a desirable characteristic.

The Butterworth lowpass filter: Prevention of the ringing artifacts encountered with the ideal lowpass filter requires that the transition from the passband to the stopband (and vice versa, in the case of highpass filters) be smooth. The Butterworth filter is a commonly used frequency-domain filter, due to its simplicity of design and the property of a maximally flat magnitude response in the passband. For a 1D Butterworth lowpass filter of order n, the first 2n − 1 derivatives of the squared magnitude response are zero at Ω = 0, where Ω represents the radian frequency. The Butterworth filter response is monotonic in the passband as well as in the stopband. See Rangayyan for details and illustrations of the 1D Butterworth filter.

In 2D, the Butterworth lowpass filter is defined as

H(u, v) = 1 / { 1 + [√2 − 1] [D(u, v)/D₀]^(2n) },

where n is the order of the filter, D(u, v) = √(u² + v²), and D₀ is the half-power 2D radial cutoff frequency; the scale factor in the denominator leads to the gain of the filter being 1/√2 at D(u, v) = D₀. The filter's transition from the passband to the stopband becomes steeper as the order n is increased. Figures (b) and following illustrate the magnitude gain of the Butterworth lowpass filter with a normalized cutoff D₀ and order n.
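The two transfer functions above can be constructed on a discrete frequency grid as sketched below; the grid size, cutoff, and order are illustrative choices, and frequencies are normalized so that the maximum frequency along an axis is 1.

```python
import numpy as np

def radial_distance(N):
    """Distance D(u, v) from the center (DC) of an N x N frequency grid,
    normalized so that the maximum frequency along an axis is 1."""
    u = np.arange(N) - N // 2
    U, V = np.meshgrid(u, u, indexing="ij")
    return np.sqrt(U**2 + V**2) / (N // 2)

def ideal_lowpass(N, D0):
    return (radial_distance(N) <= D0).astype(float)   # 1 inside the cutoff, 0 outside

def butterworth_lowpass(N, D0, n):
    D = radial_distance(N)
    return 1.0 / (1.0 + (np.sqrt(2.0) - 1.0) * (D / D0) ** (2 * n))

H = butterworth_lowpass(256, 0.5, 2)
print(H[128, 128], round(H[128, 128 + 64], 3))   # gain 1 at DC, 1/sqrt(2) at D0
```

Either mask can be passed directly to a frequency-domain filtering routine that works with a centered spectrum; note how the (√2 − 1) scale factor places the half-power point of the Butterworth filter exactly at D₀.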

Example: The result of filtering the noisy Shapes image with the Butterworth lowpass filter as above is shown in Figure (d). It is seen that noise has been suppressed in the filtered image without causing the ringing artifact; however, the filter has caused some blurring of the edges of the objects in the image.

Example: Figure (a) shows a test image of a clock. The image was contaminated by a significant amount of noise, suspected to be due to poor shielding of the video-signal cable between the camera and the digitizing frame buffer. Part (b) of the figure shows the log-magnitude spectrum of the image. The sinc components due to the horizontal and vertical lines in the image are readily seen on the axes of the spectrum. Radial concentrations of energy are also seen in the spectrum, related to the directional (or oriented) components of the image. Part (c) of the figure shows the result of application of the ideal lowpass filter with cutoff D₀. The strong ringing artifacts caused by the filter render the image useless, although some noise reduction has

FIGURE: (a) The Shapes test image. (b) The test image with Gaussian noise added. (c) The result of ideal lowpass filtering the noisy image with normalized cutoff D₀. (d) The result of filtering with a Butterworth lowpass filter having the same cutoff D₀ and order n.

FIGURE: The centered (folded) Fourier log-magnitude spectrum of (a) the Shapes test image, and (b) the noisy Shapes image.

been achieved. Part (d) of the figure shows the result of filtering using a Butterworth lowpass filter with cutoff D₀ and order n. The noise in the image has been suppressed well, without causing any artifact except blurring (or smoothing).

See the section on confocal microscopy for an illustration of the application of frequency-domain lowpass filters to an image obtained with a confocal microscope.

Removal of periodic artifacts

Periodic components in images give rise to impulse-like and periodic concentrations of energy in their Fourier spectra. This characteristic facilitates the removal of periodic artifacts through selective band-reject, notch, or comb filtering.

Example: Figure (a) shows a part of an image of a mammographic phantom acquired with the grid (bucky) remaining fixed. The white objects simulate calcifications found in mammograms. The projections of the grid strips have suppressed the details in the phantom. Figure (b) shows the log-magnitude Fourier spectrum of the image, where the periodic concentrations of energy along the v axis, as well as along the corresponding horizontal strips, are related to the grid lines. Figure (d) shows the spectrum with selected regions, corresponding to the artifactual components, set to zero. Figure (c) shows the corresponding filtered image, where the grid lines have been almost completely removed.
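The principle of notch filtering can be sketched on a synthetic example. The sketch below is illustrative only: a pure sinusoid stands in for the grid-line artifact, and the notch positions and sizes are chosen by hand to match its known frequency.

```python
import numpy as np

# A sinusoidal "grid line" artifact produces two impulses in the spectrum;
# zeroing small notch regions around them removes the artifact.
N = 128
m, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
f = 100.0 + 10.0 * np.sin(2 * np.pi * 16 * n / N)   # image with a periodic artifact

F = np.fft.fftshift(np.fft.fft2(f))                 # centered spectrum
c = N // 2
F[c - 1 : c + 2, c + 16 - 1 : c + 16 + 2] = 0.0     # notch at +16 cycles
F[c - 1 : c + 2, c - 16 - 1 : c - 16 + 2] = 0.0     # notch at -16 cycles
g = np.real(np.fft.ifft2(np.fft.ifftshift(F)))

print(round(f.std(), 2), round(g.std(), 2))         # artifact energy removed
```

Because the DC component is left untouched, the mean intensity of the image is preserved; only the periodic component is removed. In a real image, as in the phantom example above, desired image energy in the notched regions is lost as well, which can introduce some distortion.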

FIGURE: (a) Part of an image of a mammographic phantom with grid artifact. (b) Log-magnitude Fourier spectrum of the image in (a). (c) Filtered image. (d) Filtered version of the spectrum in (b). Phantom image courtesy of L.J. Hahn, Foothills Hospital, Calgary.

Figure (a) shows a corresponding image of the phantom acquired with the bucky moving in the recommended manner; the image is free of the grid-line artifact. Figure (b) shows the corresponding log-magnitude Fourier spectrum, which is also free of the artifactual components seen in the spectrum of the fixed-grid image. It should be noted that removing the artifactual components leads to the loss of the frequency-domain components of the desired image in the same regions, which could lead to some distortion in the filtered image.

FIGURE: (a) Part of an image of a mammographic phantom with no grid artifact; compare with the fixed-grid image. (b) Log-magnitude Fourier spectrum of the image in (a). Phantom image courtesy of L.J. Hahn, Foothills Hospital, Calgary.

Matrix Representation of Image Processing

We have, thus far, expressed images as 2D arrays, and transform and image processing procedures with operations such as addition, multiplication, integration, and convolution of arrays. Such expressions define the images and the related entities on a point-by-point (pixel-by-pixel, or single-element) basis. The design of optimal filters and statistical estimation procedures requires the application of operators such as differentiation and statistical expectation to expressions involving images and image processing operations. The array or single-element form of representation of images and operations does not lend itself easily to such procedures. It would be convenient if we could represent a whole image as a single algebraic entity, and if image processing operations could be expressed using basic algebraic operations. In this section, we shall see that matrix representation of images, convolution, filters, and transforms facilitates efficient and compact expression of image processing, optimization, and estimation procedures.

Matrix representation of images

A sampled image may be represented by a matrix as

f = { f(m, n), m = 1, 2, ..., M; n = 1, 2, ..., N }.

The matrix has M rows, each with N elements; the matrix has N columns. Matrix methods may then be used in the analysis of images and in the derivation of optimal filters. See Section for a discussion on array and matrix representation of images.

In treating images as matrices and mathematical entities, it should be recognized always that images are not merely arrays of numbers: certain constraints are imposed on the image matrix due to the physical properties of the image. Some of the constraints to be noted are:

Nonnegativity and upper bound: f_min <= f(m, n) <= f_max, where f_min and f_max are the minimum and maximum values, respectively, imposed by the characteristics of the object being represented by the image, as well as by the image quantization process. Usually, f_min = 0, and the pixel values are constrained to be nonnegative.

Finite energy: E_f = Σ_{m=1}^{M} Σ_{n=1}^{N} f²(m, n) <= E_max, where E_max is a finite limit on the total energy of the image.

Smoothness: Most images that represent real-life entities cannot change their characteristics abruptly. The difference between a pixel and the average of its immediate neighbors is usually bounded by a limit S, as

| f(m, n) − (1/8) [ f(m−1, n−1) + f(m−1, n) + f(m−1, n+1)
                  + f(m, n−1) + f(m, n+1)
                  + f(m+1, n−1) + f(m+1, n) + f(m+1, n+1) ] | <= S.

An M × N matrix may be converted to an MN × 1 vector by row ordering as

f = [ f_1^T  f_2^T  ...  f_M^T ]^T,

where f_m = [ f(m, 1)  f(m, 2)  ...  f(m, N) ]^T is the m-th row vector. Column ordering may also be performed. Figure illustrates the conversion of an image to a vector in schematic form.

FIGURE
(a) Matrix representation of an image. Vector representation of the image in (a) by (b) row ordering and (c) column ordering or stacking.
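The row and column orderings described above map directly onto array flattening; a minimal NumPy sketch (the 2 × 3 image values are illustrative only):

```python
import numpy as np

# Hypothetical 2x3 image used only for illustration.
f = np.array([[1, 2, 3],
              [4, 5, 6]])

# Row ordering: concatenate the rows into a single MN-element vector.
f_row = f.flatten(order='C')      # row-major ordering

# Column ordering (stacking): concatenate the columns instead.
f_col = f.flatten(order='F')      # column-major ordering
```

Here `order='C'` yields the row-ordered vector [1, 2, 3, 4, 5, 6], whereas `order='F'` yields the column-stacked vector [1, 4, 2, 5, 3, 6].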

Using the vector notation, we get the energy of the image as

E = f^T f = Σ_{i=1}^{MN} f²(i),

which is the inner product or dot product of the vector with itself. The energy of the image may also be computed using the outer product, as

E = Tr [ f f^T ],

where Tr represents the trace: the sum of the main diagonal elements of the resulting MN × MN matrix.
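The equivalence of the inner-product and outer-product expressions for the energy can be checked numerically (the 2 × 2 image values are illustrative):

```python
import numpy as np

f = np.array([[1.0, 2.0], [3.0, 4.0]]).flatten()  # row-ordered image vector

E_inner = f @ f                     # inner (dot) product f^T f
E_outer = np.trace(np.outer(f, f))  # trace of the MN x MN outer product
# Both evaluate to 1 + 4 + 9 + 16 = 30.
```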

If the image elements are considered to be random variables, images may be treated as samples of stochastic processes, and characterized by their statistical properties as follows:

Mean: μ_f = E[f], which is an MN × 1 matrix or vector.

Covariance: Σ_f = E[(f − μ_f)(f − μ_f)^T] = E[f f^T] − μ_f μ_f^T, which is an MN × MN matrix given by

Σ_f = [ σ_11  σ_12  ...  σ_1P
        σ_21  σ_22  ...  σ_2P
        ...
        σ_P1  σ_P2  ...  σ_PP ],

where P = MN, and the matrix elements are

σ_pq = E[ {f(p) − μ_f(p)} {f(q) − μ_f(q)} ],  p, q = 1, 2, ..., P,

representing the covariance between the p-th and q-th elements of the image vector. Σ_f is symmetric: σ_pq = σ_qp. The diagonal terms σ_pp are the variances of the elements of the image vector.

Autocorrelation (or scatter) matrix: Φ_f = E[f f^T], which is an MN × MN matrix. The normalized autocorrelation coefficients are defined as

ρ_pq = σ_pq / sqrt(σ_pp σ_qq).

Then, ρ_pp = 1. The normalized autocorrelation matrix is given by

ρ_f = [ 1     ρ_12  ...  ρ_1P
        ρ_21  1     ...  ρ_2P
        ...
        ρ_P1  ρ_P2  ...  1 ].

The absolute scale of variation is retained in the diagonal standard-deviation matrix

D = diag( sqrt(σ_11), sqrt(σ_22), ..., sqrt(σ_PP) ).

Then, Σ_f = D ρ_f D.

Two image vectors f and g are:

Uncorrelated, if E[f g^T] = E[f] E[g]^T; the cross-covariance matrix Σ_fg is then a null matrix.

Orthogonal, if E[f g^T] = 0.

Statistically independent, if p(f, g) = p(f) p(g); then, f and g are also uncorrelated (the converse is not necessarily true).

The representation of image and noise processes as above facilitates the application of optimization techniques and the design of optimal filters.
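The statistical entities above can be estimated from sample realizations; a hedged sketch in NumPy (the 4-element vectors and the mixing matrix are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 sample realizations of a hypothetical 4-element image vector,
# mixed so that the elements are mutually correlated.
mix = np.array([[2.0, 1, 0, 0],
                [0,   1, 1, 0],
                [0,   0, 1, 1],
                [0,   0, 0, 1]])
F = rng.normal(size=(500, 4)) @ mix

mu = F.mean(axis=0)                        # sample mean vector
Sigma = np.cov(F, rowvar=False)            # sample covariance matrix (4 x 4)
D = np.diag(np.sqrt(np.diag(Sigma)))       # diagonal standard-deviation matrix
D_inv = np.linalg.inv(D)
rho = D_inv @ Sigma @ D_inv                # normalized correlation matrix
# Sigma is symmetric, rho has a unit diagonal, and Sigma = D rho D.
```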

Matrix representation of transforms

1D transforms: In signal analysis, it is often useful to represent a signal f(t) over the interval t = t_1 to t_1 + T by an expansion of the form

f(t) = Σ_k a_k φ_k(t),

where the functions φ_k(t) are mutually orthogonal; that is,

∫_{t_1}^{t_1+T} φ_k(t) φ_l(t) dt = C if k = l, and 0 if k ≠ l.

The functions are said to be orthonormal if C = 1. The coefficients a_k may be obtained as

a_k = (1/C) ∫_{t_1}^{t_1+T} f(t) φ_k(t) dt, for all k;

that is, a_k is the projection of f(t) on to φ_k(t).

The set of functions {φ_k(t)} is said to be complete or closed if there exists no square-integrable function f(t) for which

∫_{t_1}^{t_1+T} f(t) φ_k(t) dt = 0, for all k;

that is, a function f(t) that is orthogonal to all of the members of the set {φ_k(t)}. If such a function exists, it should be a member of the set in order for the set to be closed or complete.

When the set {φ_k(t)} is complete, it is said to be an orthogonal basis, and may be used for accurate representation of signals. For example, the Fourier series representation of periodic signals is based upon the use of an infinite set of sine and cosine functions of frequencies that are integral multiples (harmonics) of the fundamental frequency of the given signal. In order for such a transformation to exist, the functions f(t) and φ_k(t) must be square-integrable.

With a 1D sampled signal expressed as an N × 1 vector or column matrix, we may represent transforms using an N × N matrix L as

F = L f and f = L*^T F,

with L L*^T = I. The matrix operations are equivalent to

F(k) = Σ_{n=0}^{N−1} L(k, n) f(n) and f(n) = Σ_{k=0}^{N−1} L*(n, k) F(k),

for k or n going over 0, 1, ..., N − 1, respectively. For the DFT, we need to define the matrix L with its elements given by L(k, n) = (1/N) exp[−j (2π/N) kn]. With the notation W_N = exp(−j 2π/N), we have L(k, n) = (1/N) W_N^{kn}. Then, the following representations of the DFT in array (or series) format and matrix multiplication format are equivalent:

F(k) = (1/N) Σ_{n=0}^{N−1} f(n) exp[−j (2π/N) kn], k = 0, 1, ..., N − 1;

[ F(0)   ]         [ W_N^0  W_N^0      W_N^0        ...  W_N^0            ] [ f(0)   ]
[ F(1)   ]         [ W_N^0  W_N^1      W_N^2        ...  W_N^{N−1}        ] [ f(1)   ]
[ F(2)   ] = (1/N) [ W_N^0  W_N^2      W_N^4        ...  W_N^{2(N−1)}     ] [ f(2)   ]
[ ...    ]         [ ...                                                  ] [ ...    ]
[ F(N−1) ]         [ W_N^0  W_N^{N−1}  W_N^{2(N−1)} ...  W_N^{(N−1)(N−1)} ] [ f(N−1) ].

However, because of the periodicity of the exponential function, we have W_N^{k+mN} = W_N^k for any integer m; note that W_N^N = W_N^0 = 1. Thus, for a given N, there are only N distinct functions W_N^k, k = 0, 1, ..., N − 1. Figure illustrates the vectors or phasors representing W_N^k, k = 0, 1, ..., N − 1. This property of the exponential function reduces the W matrix to one with only N distinct values, leading to the relationship

[ F(0)   ]         [ W_N^0  W_N^0      W_N^0      ...  W_N^0     ] [ f(0)   ]
[ F(1)   ]         [ W_N^0  W_N^1      W_N^2      ...  W_N^{N−1} ] [ f(1)   ]
[ F(2)   ] = (1/N) [ W_N^0  W_N^2      W_N^4      ...  W_N^{N−2} ] [ f(2)   ]
[ ...    ]         [ ...                                         ] [ ...    ]
[ F(N−1) ]         [ W_N^0  W_N^{N−1}  W_N^{N−2}  ...  W_N^1     ] [ f(N−1) ],

with all of the exponents of W_N evaluated modulo N.

For N = 8, we get

[ F(0) ]         [ W^0  W^0  W^0  W^0  W^0  W^0  W^0  W^0 ] [ f(0) ]
[ F(1) ]         [ W^0  W^1  W^2  W^3  W^4  W^5  W^6  W^7 ] [ f(1) ]
[ F(2) ]         [ W^0  W^2  W^4  W^6  W^0  W^2  W^4  W^6 ] [ f(2) ]
[ F(3) ] = (1/8) [ W^0  W^3  W^6  W^1  W^4  W^7  W^2  W^5 ] [ f(3) ]
[ F(4) ]         [ W^0  W^4  W^0  W^4  W^0  W^4  W^0  W^4 ] [ f(4) ]
[ F(5) ]         [ W^0  W^5  W^2  W^7  W^4  W^1  W^6  W^3 ] [ f(5) ]
[ F(6) ]         [ W^0  W^6  W^4  W^2  W^0  W^6  W^4  W^2 ] [ f(6) ]
[ F(7) ]         [ W^0  W^7  W^6  W^5  W^4  W^3  W^2  W^1 ] [ f(7) ],

where the subscript N to W has been suppressed in order to show the structure of the matrix clearly.
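The DFT matrix above is straightforward to construct and verify numerically; a minimal sketch (note that NumPy's FFT omits the 1/N forward scaling used in this chapter):

```python
import numpy as np

N = 8
n = np.arange(N)
# Matrix of W_N^{kn} with W_N = exp(-j 2 pi / N); exponents reduce modulo N
# automatically through the periodicity of the complex exponential.
W = np.exp(-2j * np.pi * np.outer(n, n) / N)

f = np.arange(N, dtype=float)
F = (W @ f) / N              # matrix form of the forward DFT, with 1/N scaling

# np.fft.fft uses no forward scale factor, so the two agree up to 1/N.
```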

FIGURE
Vectors or phasors representing the N roots of unity, or W^k, k = 0, 1, ..., N − 1, where W = exp(−j 2π/N). Based upon a similar figure by Hall.

2D transforms: The transformation of an N × N image f(m, n), m = 0, 1, ..., N − 1; n = 0, 1, ..., N − 1, may be expressed in a generic representation as

F(k, l) = (1/N) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} f(m, n) φ(m, n, k, l),

f(m, n) = (1/N) Σ_{k=0}^{N−1} Σ_{l=0}^{N−1} F(k, l) ψ(m, n, k, l),

where φ(m, n, k, l) is the forward transform kernel, and ψ(m, n, k, l) is the inverse transform kernel. The kernel is said to be separable if φ(m, n, k, l) = φ_1(m, k) φ_2(n, l), and symmetric, in addition, if φ_1 and φ_2 are functionally equal. Then, the 2D transform may be computed in two simpler steps of 1D row transforms followed by 1D column transforms (or vice versa), as follows:

F_1(m, l) = Σ_{n=0}^{N−1} f(m, n) φ_2(n, l), m, l = 0, 1, ..., N − 1;

F(k, l) = Σ_{m=0}^{N−1} F_1(m, l) φ_1(m, k), k, l = 0, 1, ..., N − 1.

In the case of the 2D Fourier transform, we have the kernel

φ(m, n, k, l) = exp[−j (2π/N)(mk + nl)] = exp[−j (2π/N) mk] exp[−j (2π/N) nl].

The kernel is separable and symmetric. The 2D DFT may be expressed as

F = W f W,

where f is the N × N image matrix, and W is a symmetric N × N matrix with W(k, m) = (1/N) exp[−j (2π/N) km]; however, due to the periodicity of the W_N function, the matrix W has only N distinct values, as shown in Equations and .

The DFT matrix W is symmetric, with its rows and columns being mutually orthogonal:

Σ_{m=0}^{N−1} W_N^{mk} W_N^{−ml} = N if k = l, and 0 if k ≠ l.

Then, W^{−1} = N W*, which leads to

f = N² W* F W*.

A number of transforms, such as the Fourier, Walsh–Hadamard, and discrete cosine, may be expressed as F = A f A, with the matrix A constructed using the relevant basis functions. The transform matrices may be decomposed into products of matrices with fewer nonzero elements, reducing redundancy and computational requirements. The DFT matrix may be factored into a product of log₂ N sparse and diagonal matrices, leading to the FFT algorithm.
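The separable matrix form F = W f W of the 2D DFT, and the inverse relation f = N² W* F W*, can be sketched and checked against NumPy's FFT (N = 4 and the ramp image are illustrative):

```python
import numpy as np

N = 4
n = np.arange(N)
# Symmetric DFT matrix with elements (1/N) exp(-j 2 pi k m / N).
W = np.exp(-2j * np.pi * np.outer(n, n) / N) / N

f = np.arange(N * N, dtype=float).reshape(N, N)
F = W @ f @ W                # separable 2D DFT as two matrix products

# np.fft.fft2 applies no scaling, so F differs from it by 1/N^2.
# Inverse: f = N^2 W* F W*, using W^{-1} = N W*.
f_back = (N ** 2) * (W.conj() @ F @ W.conj())
```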

The Walsh–Hadamard Transform: The orthonormal, complete set of 1D Walsh functions defined over the interval (0, 1) is given by the iterative relationships

φ_0(x) = 1, 0 <= x < 1;

φ_n(x) = φ_{[n/2]}(2x) − φ_{[n/2]}(2x − 1), n odd;

φ_n(x) = φ_{[n/2]}(2x) + φ_{[n/2]}(2x − 1), n even;

where [n/2] is the integral part of n/2, and φ_n(x) = 0 outside the interval (0, 1). The n-th function φ_n is generated by compression of the function φ_{[n/2]} into its first half, and ±φ_{[n/2]} into its second half. Figure shows the first eight Walsh functions, sampled over the interval (0, 1), obtained using the definition in Equation . (Note: The ordering of the Walsh functions varies with the formulation of the functions.)

FIGURE
The first eight Walsh functions, sampled over the interval (0, 1).

Walsh functions are ordered by the number of zero-crossings in the interval, called sequency. If the Walsh functions are sampled with N uniformly spaced points, we get a square matrix representation that is orthogonal and has its rows ordered with increasing number of zero-crossings; for N = 8, we have

A = [ +1 +1 +1 +1 +1 +1 +1 +1
      +1 +1 +1 +1 −1 −1 −1 −1
      +1 +1 −1 −1 −1 −1 +1 +1
      +1 +1 −1 −1 +1 +1 −1 −1
      +1 −1 −1 +1 +1 −1 −1 +1
      +1 −1 −1 +1 −1 +1 +1 −1
      +1 −1 +1 −1 −1 +1 −1 +1
      +1 −1 +1 −1 +1 −1 +1 −1 ].

Considering sequency ordering as above, the functions for n = 2 and n = 3 in Figure need to be swapped. The Walsh transform of a 1D signal f may then be expressed as F = A f.

Another formulation of the Walsh transform defines the forward and inverse kernel functions as

φ(n, k) = (1/N) Π_{p=0}^{P−1} (−1)^{b_p(n) b_{P−1−p}(k)}

for 1D signals, and

φ(m, n, k, l) = (1/N) Π_{p=0}^{P−1} (−1)^{[b_p(m) b_{P−1−p}(k) + b_p(n) b_{P−1−p}(l)]}

for 2D signals, where b_p(m) is the p-th bit in the P-bit binary representation of m.

The major advantage of the Walsh transform is that the kernel has integral values of +1 and −1 only. As a result, the transform involves only addition and subtraction of the input image pixels. Furthermore, the operations involved in the forward and inverse transformation are identical.

Except for the ordering of rows, the discrete Walsh matrices are equivalent to the Hadamard matrices of rank 2^n, which are constructed as follows:

A_2 = [ 1  1
        1 −1 ];

A_{2N} = [ A_N  A_N
           A_N −A_N ].

Then, by defining A_N^{−1} = (1/N) A_N, the Walsh–Hadamard transform (WHT) of a 2D function may be expressed as

F = A f A and f = A^{−1} F A^{−1},

where all matrices are of size N × N.

Figure shows the 2D Walsh–Hadamard basis functions (matrices). The functions were generated by using the 2D equivalents of the recursive definition given for the 1D Walsh functions in Equation . Note that the ordering of the functions could vary from one definition to another.

FIGURE
The first Walsh–Hadamard 2D basis functions. Black represents a pixel value of +1, and white −1. Each function was computed as a matrix.

The computational and representational simplicity of the WHT is evident from the expressions above. The WHT has applications in image coding, image sequency filtering, and feature extraction for pattern recognition.
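The recursive Hadamard construction A_{2N} = [[A_N, A_N], [A_N, −A_N]] can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def hadamard(N):
    """Hadamard matrix of order N (N a power of 2), built by the
    recursion A_2N = [[A_N, A_N], [A_N, -A_N]] starting from A_1 = [1]."""
    A = np.array([[1]])
    while A.shape[0] < N:
        A = np.block([[A, A], [A, -A]])
    return A

A = hadamard(8)
# Rows are mutually orthogonal: A A^T = N I; entries are +1/-1 only,
# so the transform needs only additions and subtractions.
```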

Matrix representation of convolution

With images represented as vectors, linear system operations (convolution and filtering) may be represented as matrix-vector multiplications.

For the sake of simplicity, let us first consider the 1D LSI system. Let us assume the system to be causal, and to have an infinite impulse response (IIR). Then, we have the input-output relationship given by the linear convolution operation

g(n) = Σ_{m=0}^{n} f(m) h(n − m).

If the input is given over N samples, we could represent the output over the interval 0, 1, ..., N − 1 as

g = h f,

or, in expanded form,

[ g(0)   ]   [ h(0)    0       0      ...  0    ] [ f(0)   ]
[ g(1)   ]   [ h(1)    h(0)    0      ...  0    ] [ f(1)   ]
[ g(2)   ] = [ h(2)    h(1)    h(0)   ...  0    ] [ f(2)   ]
[ ...    ]   [ ...                              ] [ ...    ]
[ g(N−1) ]   [ h(N−1)  h(N−2)  h(N−3) ... h(0)  ] [ f(N−1) ].

The matrix h is a Toeplitz-like matrix, which leads to computational advantages. (Note: A Toeplitz matrix is a square matrix whose elements are equal along every diagonal.) There will be zeros in the lower-left portion of h if h(n) has fewer samples than f(n) and g(n); h is then said to be banded.
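The Toeplitz matrix form of causal linear convolution can be sketched and checked against direct convolution (the signal and impulse-response values are illustrative):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])   # input over N = 4 samples
h = np.array([1.0, -1.0])            # short causal FIR impulse response

N = len(f)
# Lower-triangular, banded Toeplitz matrix: row n holds h(n-m) at column m.
H = np.zeros((N, N))
for n in range(N):
    for m in range(N):
        if 0 <= n - m < len(h):
            H[n, m] = h[n - m]

g = H @ f                            # g = h f in matrix form
# The first N samples of np.convolve(f, h) give the same result.
```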

If the impulse response of the filter is of a finite duration of 2M + 1 samples, that is, the filter is of the finite impulse response (FIR) type, it is common to use the noncausal moving-window representation of convolution, as

g(n) = Σ_{m=n−M}^{n+M} f(m) h(n − m),

or

g(n) = Σ_{m=−M}^{M} f(n − m) h(m).

This relationship may also be expressed as g = h f, with the matrix and vectors constructed as in Figure . The matrix h is now banded and Toeplitz-like. Furthermore, each row of h, except the first, is a right-shifted version of the preceding row. It has been assumed that 2M + 1 < N.

Another representation of convolution is the periodic or circular form, where all of the signals are assumed to be of finite duration and periodic, with the period being equal to N samples. The shifting required in convolution is interpreted as periodic shifting: samples that go out of the frame of N samples at one end will reappear at the other end. The periodic convolution operation is expressed as

g_p(n) = Σ_{m=0}^{N−1} f_p(m) h_p[(n − m) mod N],

FIGURE
Construction of the matrix and vectors for convolution in the case of an FIR filter:

[ g(0)   ]   [ h(M)  h(M−1) ... h(−M)   0      ...  0     ] [ f(−M)    ]
[ g(1)   ]   [ 0     h(M)   ... h(−M+1) h(−M)  ...  0     ] [ f(−M+1)  ]
[ ...    ] = [ ...                                        ] [ ...      ]
[ g(N−1) ]   [ 0     ...        h(M)    ...        h(−M)  ] [ f(N−1+M) ].

where the subscript p indicates the periodic version or interpretation of the signals. Note that, whereas the result of periodic convolution of two periodic signals of period N samples each is another periodic signal of the same period of N samples, the result of their linear convolution would be of duration 2N − 1 samples. However, periodic convolution may be used to achieve the same result as linear convolution by padding the signals with zeros so as to increase their duration to at least 2N − 1 samples, and then considering the extended signals to be periodic with the period of 2N − 1 samples. It should be observed that implementing convolution by taking the inverse DFT of the product of the DFTs of the two signals will result in periodic convolution of the signals; zero padding as above will be necessary in such a case if linear convolution is desired. The operation performed by a causal, stable LSI system is linear convolution.
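The distinction between circular and linear convolution via the DFT, and the zero-padding remedy, can be sketched as follows (signal values are illustrative):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 1.0, 1.0])

# Multiplying N-point DFTs gives periodic (circular) convolution,
# with the tail of the linear result wrapped around.
N = len(f)
g_circ = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(h, N)))

# Padding both signals to at least N1 + N2 - 1 samples before taking
# the DFTs recovers the linear convolution result.
L = len(f) + len(h) - 1
g_lin = np.real(np.fft.ifft(np.fft.fft(f, L) * np.fft.fft(h, L)))
```

For these values, the linear result is [1, 3, 6, 9, 7, 4], while the 4-point circular result wraps the last two samples around to give [8, 7, 6, 9].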

Periodic convolution may also be expressed as g = h f, with the matrix and vectors constructed as

[ g(0)   ]   [ h(0)    h(N−1)  h(N−2)  ...  h(1)   ] [ f(0)   ]
[ g(1)   ]   [ h(1)    h(0)    h(N−1)  ...  h(2)   ] [ f(1)   ]
[ ...    ] = [ ...                                 ] [ ...    ]
[ g(N−2) ]   [ h(N−2)  h(N−3)  ...          h(N−1) ] [ f(N−2) ]
[ g(N−1) ]   [ h(N−1)  h(N−2)  ...          h(0)   ] [ f(N−1) ].

The subscript p has been dropped for the sake of concise notation. Each row of the matrix h is a right-circular shift of the previous row, and the matrix is square; such a matrix is known as a circulant matrix.

Illustrations of convolution

Convolution is an important operation in the processing of signals and images. We encounter convolution when filtering a signal or an image with an LSI system: the output is the convolution of the input with the impulse response of the system.

Convolution in 1D: In 1D, for causal signals and systems, we have the linear convolution operation given by

g(n) = Σ_{k=0}^{n} f(k) h(n − k).

As expressed above, the signal h needs to be reversed with respect to the index of summation k, and shifted by the interval n at which the output sample is to be computed. The index n may be run over a certain range of interest, or over all time for which the result g(n) exists. For each value of n, the reversed and shifted version of h is multiplied with the signal f on a point-by-point basis, and the products are summed. The multiplication and summation operations together are comparable to the dot product operation performed over the nonzero overlapping parts of the two signals.

Example: Consider a signal f(n) defined over a few samples, n >= 0. Let the signal be processed by an LSI system with an impulse response h(n) of finite duration. The signal and the system are assumed to be causal; that is, the values of f(n) and h(n) are zero for n < 0. Furthermore, it is assumed that f and h are zero beyond the last sample provided.

The operation of convolution is illustrated in Figure . Observe the reversal and shifting of h. Observe also that the result g has more samples (or is of longer duration) than either f or h: the result of linear convolution of two signals with N_1 and N_2 samples will have N_1 + N_2 − 1 samples, not including trailing zero-valued samples.

In the matrix notation of Equation , the convolution example above may be expressed in expanded form; the result is identical to that shown in Figure .

Convolution in 2D: The output of an LSI imaging or image processing system is given as the convolution of the input image with the PSF:

g(m, n) = Σ_α Σ_β f(α, β) h(m − α, n − β).

In this expression, for the sake of generality, the range of summation is allowed to span the full spatial range of the resulting output image. When the filter PSF is of a much smaller spatial extent than the input image, it becomes convenient to locate the origin at the center of the PSF, and use positive and negative indices to represent the omnidirectional and noncausal nature of the PSF. Then, 2D convolution may be expressed as

g(m, n) = Σ_{α=−M}^{M} Σ_{β=−M}^{M} f(m − α, n − β) h(α, β),

where the size of the PSF is assumed to be odd, given by (2M + 1) × (2M + 1).

In this format, 2D convolution may be interpreted as a mask operation performed on the input image: the PSF is reversed (flipped, or reflected) about both of its axes, and placed on top of the input image at the coordinate where the output value is to be computed; a point-by-point multiplication is performed over the overlapping areas of the two functions; and the resulting products are added. The operation needs to be performed at every spatial location for which the output exists, by dragging and placing the reversed PSF at every

FIGURE
Illustration of the linear convolution of two 1D signals. Observe the reversal of h(n), shown as h(−k), and the shifting of the reversed signal, shown as h(1 − k), h(2 − k), etc.

pixel of the input image. Figure illustrates linear 2D convolution performed as described above. Note that, in the case of a PSF having symmetry about both of its axes, the reversal step has no effect, and is not required. Matrix representation of 2D convolution is described in Section .
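The flip-and-slide mask interpretation above can be sketched directly (a minimal illustration; the helper name `conv2d_full` and the test image are assumptions, not the book's code):

```python
import numpy as np

def conv2d_full(f, h):
    """Linear 2D convolution: flip the PSF about both axes and slide it
    over the zero-padded input image as a mask."""
    M1, N1 = f.shape
    M2, N2 = h.shape
    h_flipped = h[::-1, ::-1]                # reverse the PSF about both axes
    g = np.zeros((M1 + M2 - 1, N1 + N2 - 1)) # full result: larger than f
    fp = np.zeros((M1 + 2 * (M2 - 1), N1 + 2 * (N2 - 1)))
    fp[M2 - 1:M2 - 1 + M1, N2 - 1:N2 - 1 + N1] = f  # zero-pad the input
    for m in range(g.shape[0]):
        for n in range(g.shape[1]):
            # Point-by-point multiply over the overlap, then sum.
            g[m, n] = np.sum(fp[m:m + M2, n:n + N2] * h_flipped)
    return g

f = np.ones((3, 3))
h = np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])  # Laplacian PSF
g = conv2d_full(f, h)   # 5x5: two rows and two columns more than f
```

Because the Laplacian PSF is symmetric about both axes, the flip has no effect here, as noted in the text.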

Diagonalization of a circulant matrix

An important property of a circulant matrix is that it is diagonalized by the DFT. Consider the general circulant matrix

C = [ C(0)    C(1)    C(2)  ...  C(N−2)  C(N−1)
      C(N−1)  C(0)    C(1)  ...  C(N−3)  C(N−2)
      ...
      C(1)    C(2)    C(3)  ...  C(N−1)  C(0)   ].

Let W = exp[j (2π/N)], which leads to W^{kN} = 1 for any integer k. Then, W^k, k = 0, 1, ..., N − 1, are the N distinct roots of unity. Now, consider

λ(k) = C(0) + C(1) W^k + C(2) W^{2k} + ... + C(N−1) W^{(N−1)k}.

It follows that

λ(k) W^k = C(N−1) + C(0) W^k + C(1) W^{2k} + ... + C(N−2) W^{(N−1)k};

λ(k) W^{2k} = C(N−2) + C(N−1) W^k + C(0) W^{2k} + ... + C(N−3) W^{(N−1)k};

and so on, to

λ(k) W^{(N−1)k} = C(1) + C(2) W^k + C(3) W^{2k} + ... + C(0) W^{(N−1)k}.

This series of relationships may be expressed in compact form as

λ(k) W(k) = C W(k),

where

W(k) = [ 1, W^k, W^{2k}, ..., W^{(N−1)k} ]^T.

Therefore, λ(k) is an eigenvalue, and W(k) is an eigenvector, of the circulant matrix C. Because there are N values W^k, k = 0, 1, ..., N − 1, that are distinct, there are N distinct eigenvectors W(k), which may be written as the N × N matrix

W = [ W(0)  W(1)  ...  W(N−1) ],

which is related to the DFT. The (n, k)-th element of W is exp[j (2π/N) nk]. Due to the orthogonality of the complex exponential functions, the (n, k)-th element

FIGURE
Illustration of the linear convolution of two 2D functions. Observe the reversal of the PSF in parts (a) – (c), and the shifting of the reversed PSF as a mask placed on the image to be filtered in part (d). The shifted mask is shown for two pixel locations. Observe that the result needs to be written in a different array. The result, shown in part (e), has two rows and two columns more than the input image.

of W^{−1} is (1/N) exp[−j (2π/N) nk]. We then have W W^{−1} = W^{−1} W = I, where I is the N × N identity matrix. The columns of W are linearly independent.

The eigenvalue relationship may be written as

W^{−1} C W = Λ,

where all the terms are N × N matrices, and Λ is a diagonal matrix whose elements are equal to λ(k), k = 0, 1, ..., N − 1. The expression above may be modified to

C = W Λ W^{−1}.

Thus, we see that a circulant matrix is diagonalized by the DFT operator W.
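The eigenvalue–eigenvector relationship derived above can be verified numerically for a random circulant matrix (the size and values are arbitrary):

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)
c = rng.normal(size=N)

# Circulant matrix: each row is a right-circular shift of the previous one;
# the first row is [C(0), C(1), ..., C(N-1)].
C = np.array([np.roll(c, k) for k in range(N)])

# Eigenvector W(k) = [1, W^k, W^{2k}, ...]^T with W = exp(+j 2 pi / N),
# and eigenvalue lambda(k) = sum_n C(n) W^{nk}.
n = np.arange(N)
for k in range(N):
    wk = np.exp(2j * np.pi * n * k / N)
    lam = np.sum(c * wk)
    # C W(k) = lambda(k) W(k) holds for every k.
    assert np.allclose(C @ wk, lam * wk)
```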

Returning to the relationships of periodic convolution: because h is circulant, we have

h = W D_h W^{−1},

where D_h is a diagonal matrix corresponding to Λ in the preceding discussion. The elements of D_h are given by multiplying the first row of the matrix h in Equation with W^{nk}, n = 0, 1, ..., N − 1, as in Equation :

H(k) = h(0) + h(N−1) W^k + h(N−2) W^{2k} + ... + h(1) W^{(N−1)k},

which is a DFT relationship; that is, H(k) is the DFT of h(n). (Note: The series of h above represents h(N − n), n = 0, 1, ..., N − 1, which is equal to h(−n) due to periodicity. The series of W values represents exp[j (2π/N) nk], n = 0, 1, ..., N − 1. The expression may be converted to the usual forward DFT form by substituting n = N − m.)

It follows that the result of the convolution operation is given by

g = W D_h W^{−1} f.

The interpretation of the matrix relationships above is as follows: W^{−1} f is the forward DFT of f, with a scale factor of 1/N. The multiplication of this expression by D_h corresponds to point-by-point transform-domain filtering with the DFT of h. The multiplication by W corresponds to the inverse DFT, except for the scale factor 1/N. We now have the following equivalent relationships that represent convolution:

g(n) = h(n) * f(n);

G(k) = H(k) F(k);

g = h f;

g = W D_h W^{−1} f.

(Note: The representation of the Fourier transform operator above is different from that in Equation .)
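The equivalence between the circulant-matrix form g = h f and the transform-domain form G(k) = H(k) F(k) can be checked numerically (random signals used for illustration):

```python
import numpy as np

N = 8
rng = np.random.default_rng(2)
f = rng.normal(size=N)
h = rng.normal(size=N)

# Circulant matrix for periodic convolution: column m is h shifted
# circularly by m, so that (H f)(n) = sum_m f(m) h((n - m) mod N).
H = np.array([np.roll(h, m) for m in range(N)]).T
g_mat = H @ f

# The same result via the DFT: G(k) = H(k) F(k).
g_dft = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(f)))
```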

Block-circulant matrix representation of a 2D filter

In the preceding sections, we saw how a 1D LSI filter may be represented by the matrix relationship g = h f, with the special case of periodic convolution being represented as above, with h being a circulant matrix. Now, let us consider the matrix representation of 2D filters. Let f represent an original image or the input image to a 2D filter, and let g represent the corresponding filtered image, with the 2D arrays having been converted to vectors or column matrices by row ordering (see Figure ). Let us assume that all of the images have been padded with zeros and extended to M × N arrays, with M and N being large, such that circular convolution yields a result equivalent to that of linear convolution. The images may be considered to be periodic, with the period M × N. The matrices f and g are of size MN × 1.

The 2D periodic convolution expression in array form is given by

g(m, n) = Σ_{α=0}^{M−1} Σ_{β=0}^{N−1} f(α, β) h[(m − α) mod M, (n − β) mod N],

for m = 0, 1, ..., M − 1, and n = 0, 1, ..., N − 1. The result is also periodic, with the period M × N.

In order for the expression g = h f to represent 2D convolution, we need to construct the matrix h as follows:

h = [ h_0      h_{M−1}  h_{M−2}  ...  h_1
      h_1      h_0      h_{M−1}  ...  h_2
      h_2      h_1      h_0      ...  h_3
      ...
      h_{M−1}  h_{M−2}  h_{M−3}  ...  h_0 ],

where the submatrices are given by

h_m = [ h(m, 0)    h(m, N−1)  h(m, N−2)  ...  h(m, 1)
        h(m, 1)    h(m, 0)    h(m, N−1)  ...  h(m, 2)
        h(m, 2)    h(m, 1)    h(m, 0)    ...  h(m, 3)
        ...
        h(m, N−1)  h(m, N−2)  h(m, N−3)  ...  h(m, 0) ].

The matrix h is of size MN × MN. Each N × N submatrix h_m is a circulant matrix. The submatrices of h are subscripted in a circular manner; h is known as a block-circulant matrix.

Example: Let us consider filtering an image f(m, n) that has nonzero pixels over only a small region, and has been padded with zeros to a larger array, to allow for the result of convolution to be larger without wraparound errors due to periodic convolution.

The subtracting Laplacian operator, also extended to an array of the same size, is given by the 3 × 3 mask

h(m, n) = [  0  −1   0
            −1   5  −1
             0  −1   0 ],

surrounded by zeros. However, this form of the operator has its origin at the center of the array, whereas the origin of the image f(m, n), as above, would be at the top-left corner (in matrix-indexing order). Therefore, we need to rewrite h(m, n) with its origin at the top-left corner. Observe that, due to the assumption of periodicity, the values of h(m, n) corresponding to negative indices then appear on the opposite ends of the matrix.

The matrices corresponding to the relationship g = h f are given in Figure , along with the resulting image g(m, n) in array format.
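The wrap-around repositioning of the PSF and the 2D periodic convolution can be sketched with the 2D DFT (the 8 × 8 image and block placement are illustrative, not the book's exact arrays; the subtracting-Laplacian values follow the mask above):

```python
import numpy as np

# Hypothetical 8x8 test image: a bright 3x3 block, zero-padded as in the text.
f = np.zeros((8, 8))
f[2:5, 2:5] = 1.0

# Subtracting-Laplacian PSF with its origin moved to (0, 0); the values
# belonging to negative indices wrap around to the opposite ends.
h = np.zeros((8, 8))
h[0, 0] = 5.0
h[0, 1] = h[1, 0] = h[0, -1] = h[-1, 0] = -1.0

# 2D periodic convolution via the 2D DFT: G(k, l) = H(k, l) F(k, l).
g = np.real(np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(f)))
# Interior pixels of the block give 5 - 4 = 1; edge pixels are enhanced.
```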

Diagonalization of a block-circulant matrix: Let us define the following functions that are related to the 2D DFT:

w_M(k, m) = exp[j (2π/M) km] and w_N(l, n) = exp[j (2π/N) ln].

Let us define a matrix W of size MN × MN, containing M² partitions, each of size N × N. The (k, m)-th partition of W is

W(k, m) = w_M(k, m) W_N,

for k, m = 0, 1, ..., M − 1, where W_N is an N × N matrix with its elements given by w_N(l, n), for l, n = 0, 1, ..., N − 1.

FIGURE
Matrices and vectors related to the application of the Laplacian operator to an image.

Now, W^{−1} is also a matrix of size MN × MN, with M² partitions of size N × N. The (k, m)-th partition of W^{−1} is

W^{−1}(k, m) = (1/M) w_M^{−1}(k, m) W_N^{−1},

where w_M^{−1}(k, m) = exp[−j (2π/M) km], for k, m = 0, 1, ..., M − 1. The matrix W_N^{−1} has its elements given by (1/N) w_N^{−1}(l, n), where w_N^{−1}(l, n) = exp[−j (2π/N) ln], for l, n = 0, 1, ..., N − 1. The definitions above lead to W W^{−1} = W^{−1} W = I, where I is the MN × MN identity matrix. If h is a block-circulant matrix, it can be shown that h = W D_h W^{−1}, or D_h = W^{−1} h W, where D_h is a diagonal matrix whose elements are related to the DFT of h(m, n), that is, to H(k, l).

Similar to the 1D case expressed by the relationships in Equation , we have the following equivalent relationships that represent 2D convolution:

g(m, n) = h(m, n) * f(m, n);

G(k, l) = H(k, l) F(k, l);

g = h f;

g = W D_h W^{−1} f.

(Note: Considering an N × N image f(m, n), the N² × N² DFT matrix W above is different from the N × N DFT matrix W in Equation . The image matrix f in Equation is of size N × N, whereas the image is represented as an N² × 1 vector in Equation .)

Differentiation of functions of matrices: The major advantage of expressing images and image processing operations in matrix form, as above, is that mathematical procedures for optimization and estimation may be applied with ease. For example, we have the following derivatives, given the vectors f and g, and a symmetric matrix W:

∂/∂f (f^T g) = ∂/∂f (g^T f) = g, and ∂/∂f (f^T W f) = 2 W f.

Several derivations in Sections and , as well as in Chapters and , demonstrate how optimization of filters may be performed using matrix representation of images and image processing operations as above.
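The vector-derivative identities above can be checked against central-difference numerical gradients (dimensions and values are arbitrary; `num_grad` is an illustrative helper):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
f = rng.normal(size=n)
g = rng.normal(size=n)
A = rng.normal(size=(n, n))
W = A + A.T                      # a symmetric matrix

def num_grad(func, x, eps=1e-6):
    """Central-difference gradient of a scalar function of a vector."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        grad[i] = (func(x + d) - func(x - d)) / (2 * eps)
    return grad

# d/df (f^T g) = g,  and  d/df (f^T W f) = 2 W f  for symmetric W.
grad_linear = num_grad(lambda x: x @ g, f)
grad_quad = num_grad(lambda x: x @ W @ x, f)
```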

Optimal Filtering

The field of image processing includes several procedures that may be characterized as ad hoc methods: procedures that have been designed to address a particular problem, and observed to yield good results in certain specific applications or scenarios. Often, the conditions under which such methods perform well are not known or understood, and the application of the methods to other images or situations may not lead to useful results. Certain mathematical models and procedures permit the design of optimal filters: filters that are derived through an optimization procedure that minimizes a cost function under specific conditions. Such procedures state explicitly the necessary conditions, and the behavior of the filter is predictable. However, as we shall see in the following paragraphs, the application of optimization methods and the resultant optimal filters requires specific knowledge of the image and noise processes.

The Wiener filter

The Wiener filter is a linear filter designed to minimize the MSE between the output of the filter and the undegraded (unknown) original image. The filter output is an optimal estimate of the original undegraded image in the MSE sense, and hence is known as the linear minimum mean-squared-error (LMMSE) or the least-mean-square (LMS) estimate.

Considering the degradation of an image f by additive noise η that is independent of the image process, we have the degraded image given by

g = f + η.

Degradation including the PSF matrix h is considered in Chapter .

The Wiener estimation problem may be stated as follows: determine a linear estimate f̂ = L g of f from the given image g, where L is the linear filter or transform operator to be designed. Recall that f and g are N² × 1 matrices formed by row or column ordering of the corresponding N × N images, and that L is an N² × N² matrix.

The optimization criterion used to design the Wiener filter is to minimize the MSE, given by

ε² = E[ ‖ f − f̂ ‖² ].

Let us express the MSE as the trace of the outer-product matrix of the error vector:

ε² = E[ Tr { (f − f̂)(f − f̂)^T } ].

We have the following expressions that result from, or are related to, the above:

(f − f̂)(f − f̂)^T = f f^T − f f̂^T − f̂ f^T + f̂ f̂^T;

f̂^T = g^T L^T = (f + η)^T L^T;

f f̂^T = f f^T L^T + f η^T L^T;

f̂ f^T = L f f^T + L η f^T;

f̂ f̂^T = L (f + η)(f^T + η^T) L^T = L ( f f^T + f η^T + η f^T + η η^T ) L^T.

Because the trace of a sum of matrices is equal to the sum of their traces, the E and Tr operators may be interchanged in order. Applying the E operator to the expressions above, we get the following:

E[f f^T] = Φ_f, which is the autocorrelation matrix of the image;

E[f f̂^T] = Φ_f L^T, because E[f η^T] = 0, with f and η assumed to be statistically independent (and the noise zero-mean);

E[f̂ f^T] = L Φ_f;

E[f̂ f̂^T] = L (Φ_f + Φ_η) L^T;

E[η η^T] = Φ_η, which is the noise autocorrelation matrix.

Now, the MSE may be written as

ε² = Tr [ Φ_f − Φ_f L^T − L Φ_f + L Φ_f L^T + L Φ_η L^T ]
   = Tr [ Φ_f − 2 L Φ_f + L (Φ_f + Φ_η) L^T ].

(Note: Tr[Φ_f L^T] = Tr[L Φ_f], because Φ_f is symmetric.) At this point, the MSE is no longer a function of the images f, g, or η, but depends only on the statistical characteristics of f and η, and on L.

To obtain the optimal filter operator L, we may now differentiate the expression above with respect to L, equate it to zero, and solve the resulting expression, as follows:

∂ε²/∂L = −2 Φ_f + 2 L (Φ_f + Φ_η) = 0.

The optimal filter is given by

L_Wiener = Φ_f (Φ_f + Φ_η)^{−1}.

The filtered image is given by

f̂ = Φ_f (Φ_f + Φ_η)^{−1} g.
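The matrix form of the Wiener estimate can be sketched on a small synthetic problem (the AR(1)-style signal correlation model and noise level are illustrative assumptions, not the book's data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 16

# Assumed signal model: neighboring samples are strongly correlated.
Phi_f = np.array([[0.95 ** abs(p - q) for q in range(n)] for p in range(n)])
sigma_noise = 0.5
Phi_eta = sigma_noise ** 2 * np.eye(n)   # white, uncorrelated noise

# Wiener operator: L = Phi_f (Phi_f + Phi_eta)^{-1}.
L = Phi_f @ np.linalg.inv(Phi_f + Phi_eta)

# Apply to one noisy realization drawn from the assumed model.
chol = np.linalg.cholesky(Phi_f)
f = chol @ rng.normal(size=n)
g = f + sigma_noise * rng.normal(size=n)
f_hat = L @ g
# Averaged over realizations, f_hat has lower MSE than g.
```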

Implementation of the Wiener filter: Consider the matrix (Φ_f + Φ_η) that needs to be inverted in Equation . The matrix would be of size N² × N² for N × N images; hence, inversion of the matrix as such would be impractical when N is large. Inversion becomes easier if the matrix can be written as the product of a diagonal matrix and a unitary matrix.

Now, Φ_η is a diagonal matrix if η is an uncorrelated (white) random noise process. In most real images, correlation between pixels reduces as the spatial shift or distance between the pixel positions considered increases; Φ_f is then banded, with several zeros, and may be approximated by a block-circulant matrix. Then, we can write

Φ_f = W Λ_f W^{−1} and Φ_η = W Λ_η W^{−1},

where Λ represents the diagonal matrix corresponding to Φ, resulting from the Fourier transform operation by W (see Section ). This step leads to the Wiener filter output being expressed as

f̂ = W Λ_f (Λ_f + Λ_η)^{−1} W^{−1} g.

The following interpretation of the entities involved in the Wiener filter reduces the expression above to a more familiar form:

W^{−1} g is equivalent to G(k, l), the Fourier transform of g(m, n);

W^{−1} Φ_f W is equivalent to S_f(k, l), the PSD of f(m, n);

W^{−1} Φ_η W is equivalent to S_η(k, l), the noise PSD.

Then, we get the Wiener estimate in the Fourier domain as

F̂(k, l) = [ S_f(k, l) / { S_f(k, l) + S_η(k, l) } ] G(k, l)
         = [ 1 / { 1 + S_η(k, l) / S_f(k, l) } ] G(k, l).

For other derivations of the Wiener filter, see Wiener, Lim, and Rangayyan.

Observe that the Wiener filter transfer function depends upon the PSDs of the original signal and noise processes: the dependence is upon the second-order statistics of the processes, rather than upon single realizations or observations of the processes. The design of the Wiener filter as above requires the estimation of the PSDs, or the development of models thereof. Although the original signal itself is unknown, it is often possible in practice to estimate its PSD from the average spectra of several artifact-free observations of images of the same class or type. Noise statistics, such as the variance, may be estimated from signal-free parts of noisy observations of the image.

The gain of the Wiener filter varies from one frequency sample to another, in accordance with the SNR expressed as a function of frequency in the 2D (u, v) or (k, l) space: the gain is high wherever the signal component S_f(k, l) is strong as compared to the noise component S_η(k, l), that is, wherever the SNR is high; the gain is low wherever the SNR is low. The gain is equal to unity if the noise PSD is zero. However, it should be noted that the Wiener filter, as above, is not spatially adaptive: its characteristics remain the same for the entire image. For this reason, while suppressing noise, the Wiener filter is likely to blur sharp features and edges that may exist in the image and share the high-frequency spectral regions with noise.

Frequency-domain implementation of the Wiener filter, as in Equation , obviates the need for the inversion of large correlation matrices, as in Equation . The use of the FFT algorithm facilitates fast computation of the Fourier transforms of the image data.
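The frequency-domain form of the filter can be sketched as follows (a hedged illustration: the image, noise level, and PSD estimates are assumptions; here the signal PSD is taken from the clean image's periodogram, whereas in practice it would come from a model or from averages over artifact-free images):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 64

f = np.zeros((N, N))
f[16:48, 16:48] = 1.0                       # simple synthetic test image
sigma = 0.3
g = f + sigma * rng.normal(size=(N, N))     # noisy observation

# PSD estimates: signal periodogram, and a flat PSD for white noise
# (E[|FFT(noise)|^2] = N^2 sigma^2, so sigma^2 after dividing by N^2).
S_f = np.abs(np.fft.fft2(f)) ** 2 / (N * N)
S_eta = sigma ** 2 * np.ones((N, N))

# Wiener gain: S_f / (S_f + S_eta), applied point-by-point in the
# Fourier domain; high gain where the SNR is high, low gain elsewhere.
H = S_f / (S_f + S_eta)
f_hat = np.real(np.fft.ifft2(H * np.fft.fft2(g)))
```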

Example: Figure (a) shows the original Shapes test image; part (b) shows the test image with Gaussian-distributed noise added. The log-magnitude spectrum of the noisy image is shown in part (c) of the figure. In order to implement the Wiener filter as in Equation , the true image PSD was modeled using a Laplacian function in the Fourier domain; the noise PSD was modeled by a uniform function, with its total energy (the area under the function) set in proportion to that of the Laplacian PSD model. The Wiener filter transfer function is illustrated in Figure (d). The output of the Wiener filter is shown in part (e) of the figure: while the noise in the uniform areas of the image has been suppressed, the edges of the objects in the image have been severely blurred. The blurring of edges has resulted in an increase of the RMS error with respect to that of the noisy image. It is clear that the assumption of stationarity is not appropriate for the test image; the use of a fixed filter, albeit optimal in the MSE sense, has led to a result that is not desirable. Furthermore, the design of appropriate signal and noise PSD models is difficult in practice; inappropriate models could lead to poor performance, as illustrated by the present example.

The implicit assumption made in deriving the Wiener filter is that the noise and image processes are second-order stationary processes; that is, their mean and variance do not vary from one image region to another. This also leads to the assumption that the entire image may be characterized by a single frequency spectrum or PSD. Most real-life images do not satisfy these assumptions to the fullest extent, which calls for the design of spatially or locally adaptive filters, described in the sections that follow.

Adaptive Filters

The local LMMSE filter

Lee developed a class of adaptive, local-statistics-based filters to obtain the LMMSE estimate of the original image from a degraded version. The degradation model represents an image $f(m, n)$ corrupted by additive noise $\eta(m, n)$ as

\[
g(m, n) = f(m, n) + \eta(m, n).
\]

FIGURE: (a) Shapes test image. (b) Image in (a) with Gaussian-distributed noise added. (c) Log-magnitude spectrum of the image in (b). (d) Gain of the Wiener filter in the frequency domain, that is, the magnitude transfer function. Result of filtering the noisy image in (b) using: (e) the Wiener filter as in (d); (f) the local LMMSE filter with a moving window.

This model is equivalent to that used earlier for the Wiener filter, but has been shown in the 2D array form, with the indices $(m, n)$ indicating the pixel location, in order to demonstrate the locally adaptive nature of the filter to be derived. The original image $f$ is considered to be a realization of a nonstationary random field, characterized by spatially varying moments (mean, standard deviation, etc.). The noise process may be either signal-independent or signal-dependent, and could be nonstationary as well.

The LMMSE approach computes, at every spatial location $(m, n)$, an estimate $\hat{f}(m, n)$ of the original image value $f(m, n)$ by applying a linear operator to the available corrupted image value $g(m, n)$. Scalars $a(m, n)$ and $b(m, n)$ are sought such that the value $\hat{f}(m, n)$, computed as

\[
\hat{f}(m, n) = a(m, n)\, g(m, n) + b(m, n),
\]

minimizes the local MSE

\[
\varepsilon^2(m, n) = \overline{\left[ \hat{f}(m, n) - f(m, n) \right]^2},
\]

where the bar above the expression indicates some form of averaging (statistical expectation, ensemble averaging, or spatial averaging). We have the local MSE in expanded form as

\[
\varepsilon^2(m, n) = \overline{\left[ a(m, n)\, g(m, n) + b(m, n) - f(m, n) \right]^2}.
\]

The values $a(m, n)$ and $b(m, n)$ that minimize $\varepsilon^2(m, n)$ are computed by taking the partial derivatives of $\varepsilon^2(m, n)$ with respect to $a(m, n)$ and $b(m, n)$, setting them to zero, and solving the resulting equations, as follows:

\[
\frac{\partial \varepsilon^2(m, n)}{\partial b(m, n)} = 2\, \overline{\left[ a(m, n)\, g(m, n) + b(m, n) - f(m, n) \right]} = 0,
\]

which leads to

\[
b(m, n) = \overline{f}(m, n) - a(m, n)\, \overline{g}(m, n).
\]

Now, replacing $b(m, n)$ in the expanded MSE with the value given above, we get the local MSE as

\[
\varepsilon^2(m, n) = \overline{\left[ a(m, n) \{ g(m, n) - \overline{g}(m, n) \} - \{ f(m, n) - \overline{f}(m, n) \} \right]^2}.
\]

Differentiating this expression with respect to $a(m, n)$ and setting the result to zero, we get

\[
a(m, n) = \frac{\overline{ \{ g(m, n) - \overline{g}(m, n) \} \{ f(m, n) - \overline{f}(m, n) \} }}{\overline{ \{ g(m, n) - \overline{g}(m, n) \}^2 }}.
\]

Now, we may allow $\overline{ \{ g(m, n) - \overline{g}(m, n) \}^2 } = \sigma_g^2(m, n)$ to represent the local variance of $g$, and $\overline{ \{ g(m, n) - \overline{g}(m, n) \} \{ f(m, n) - \overline{f}(m, n) \} } = \sigma_{fg}(m, n)$ the local covariance between $f$ and $g$. This leads to

\[
a(m, n) = \frac{\sigma_{fg}(m, n)}{\sigma_g^2(m, n)}.
\]

With $a(m, n)$ and $b(m, n)$ as given above, the LMMSE estimate becomes

\[
\hat{f}(m, n) = \overline{f}(m, n) + \frac{\sigma_{fg}(m, n)}{\sigma_g^2(m, n)} \left[ g(m, n) - \overline{g}(m, n) \right].
\]

Because the true statistics of both the original and the corrupted image, as well as their joint statistics, are usually unknown in a practical situation, Lee proposed to estimate them locally, in a spatial neighborhood of the pixel $(m, n)$ being processed, leading to the local LMMSE, that is, the LLMMSE, estimate. Using a rectangular window of size $(2P + 1) \times (2Q + 1)$ centered at the pixel $(m, n)$ being processed, we get local estimates of the mean and variance of the noisy image $g$ as

\[
\overline{g}(m, n) = \frac{1}{(2P + 1)(2Q + 1)} \sum_{p = -P}^{P} \sum_{q = -Q}^{Q} g(m + p, n + q)
\]

and

\[
\sigma_g^2(m, n) = \frac{1}{(2P + 1)(2Q + 1)} \sum_{p = -P}^{P} \sum_{q = -Q}^{Q} \left[ g(m + p, n + q) - \overline{g}(m, n) \right]^2.
\]

It should be noted that the parameters above are expressed as functions of space, and are space-variant entities. The LLMMSE estimate is then approximated by the following pixel-by-pixel operation:

\[
\hat{f}(m, n) = \overline{g}(m, n) + \frac{\sigma_g^2(m, n) - \sigma_\eta^2(m, n)}{\sigma_g^2(m, n)} \left[ g(m, n) - \overline{g}(m, n) \right].
\]

For other derivations of the LLMMSE filter, also known as the Wiener filter, see Lim.

Comparing the two preceding estimates, observe that $\overline{f}(m, n)$ is approximated by $\overline{g}(m, n)$, and that $\sigma_{fg}(m, n)$ is estimated by the difference between the local variance of the degraded image and that of the noise process. The LLMMSE filter is rendered spatially adaptive and nonlinear by the space-variant estimation of the statistical parameters used.

In deriving the pixel-by-pixel operation above, the assumption that the noise is uncorrelated with the image is taken into account. The variance of the noise, $\sigma_\eta^2(m, n)$, is constant over the whole image if the noise is assumed to be signal-independent, but varies if the noise is signal-dependent; in the latter case, $\sigma_\eta^2(m, n)$ should be estimated locally, with a knowledge of the type of the noise that corrupts the image. Lee's filter was originally derived to deal with signal-independent additive noise and signal-dependent multiplicative noise, but may be adapted to other types of signal-dependent noise, such as Poisson or film-grain noise.

The interpretation of the LLMMSE estimate is as follows: if the processing window overlaps a uniform region, in which any variation is due mostly to the noise, the second term will be small; thus, the LLMMSE estimate is approximately equal to the local mean of the noisy image, and the noise is thereby reduced. If, on the contrary, there is an edge in the processing window, the variance of the noisy image is larger than the variance of the noise. The LLMMSE estimate in this case is closer to the actual noisy value $g(m, n)$, and the edge does not get blurred. The filter provides good noise attenuation over uniform areas, but poor noise filtering near edges.

Aghdasi et al. proposed detailed models of degradation of mammograms, including Poisson noise, film-grain noise, and blurring by several components along the chain of image acquisition systems. They applied the local LMMSE filter as above to remove noise, and compared its performance with a Bayesian filter. The parametric Wiener filter was then applied to deblur the noise-free mammograms.

Matrix representation: A matrix or vectorial version of the LLMMSE estimate may also be derived, as follows. The LMMSE estimate is expressed as

\[
\hat{\mathbf{f}} = \mathbf{A}\mathbf{g} + \mathbf{b}.
\]

The MSE between the estimate and the unknown original image may be expressed as

\[
\varepsilon^2 = E\left[ \mathrm{Tr} \left\{ \left( \mathbf{f} - \hat{\mathbf{f}} \right) \left( \mathbf{f} - \hat{\mathbf{f}} \right)^T \right\} \right].
\]

Substituting the expression for $\hat{\mathbf{f}}$, we get

\[
\varepsilon^2 = E\left[ \mathrm{Tr} \left\{ (\mathbf{f} - \mathbf{A}\mathbf{g} - \mathbf{b})(\mathbf{f} - \mathbf{A}\mathbf{g} - \mathbf{b})^T \right\} \right]
\]
\[
= E\left[ \mathrm{Tr} \left\{ \mathbf{f}\mathbf{f}^T - \mathbf{f}\mathbf{g}^T \mathbf{A}^T - \mathbf{f}\mathbf{b}^T - \mathbf{A}\mathbf{g}\mathbf{f}^T + \mathbf{A}\mathbf{g}\mathbf{g}^T \mathbf{A}^T + \mathbf{A}\mathbf{g}\mathbf{b}^T - \mathbf{b}\mathbf{f}^T + \mathbf{b}\mathbf{g}^T \mathbf{A}^T + \mathbf{b}\mathbf{b}^T \right\} \right].
\]

Differentiating the expression above with respect to $\mathbf{b}$ and setting it to zero, we get

\[
\frac{\partial \varepsilon^2}{\partial \mathbf{b}} = E\left[ -2\mathbf{f} + 2\mathbf{A}\mathbf{g} + 2\mathbf{b} \right] = \mathbf{0},
\]

solving which we get

\[
\mathbf{b} = \overline{\mathbf{f}} - \mathbf{A}\, \overline{\mathbf{g}},
\]

where the bar indicates some form of averaging.

Using the expression derived for $\mathbf{b}$ above, we get the following:

\[
\mathbf{f} - \hat{\mathbf{f}} = \mathbf{f} - \mathbf{A}\mathbf{g} - \mathbf{b}
= \mathbf{f} - \mathbf{A}\mathbf{g} - \overline{\mathbf{f}} + \mathbf{A}\overline{\mathbf{g}}
= (\mathbf{f} - \overline{\mathbf{f}}) - \mathbf{A}(\mathbf{g} - \overline{\mathbf{g}})
= \mathbf{f}_0 - \mathbf{A}\mathbf{g}_0,
\]

where $\mathbf{f}_0 = \mathbf{f} - \overline{\mathbf{f}}$ and $\mathbf{g}_0 = \mathbf{g} - \overline{\mathbf{g}}$, for the sake of compactness in further derivation. Now, the MSE becomes

\[
\varepsilon^2 = E\left[ \mathrm{Tr} \left\{ (\mathbf{f}_0 - \mathbf{A}\mathbf{g}_0)(\mathbf{f}_0 - \mathbf{A}\mathbf{g}_0)^T \right\} \right]
= E\left[ \mathrm{Tr} \left\{ \mathbf{f}_0 \mathbf{f}_0^T - \mathbf{f}_0 \mathbf{g}_0^T \mathbf{A}^T - \mathbf{A}\mathbf{g}_0 \mathbf{f}_0^T + \mathbf{A}\mathbf{g}_0 \mathbf{g}_0^T \mathbf{A}^T \right\} \right].
\]

Differentiating the expression above with respect to $\mathbf{A}$ and setting it to zero, we get

\[
E\left[ -2\, \mathbf{f}_0 \mathbf{g}_0^T + 2\, \mathbf{A}\, \mathbf{g}_0 \mathbf{g}_0^T \right] = \mathbf{0}.
\]

Now, $E\left[ \mathbf{g}_0 \mathbf{g}_0^T \right] = E\left[ (\mathbf{g} - \overline{\mathbf{g}})(\mathbf{g} - \overline{\mathbf{g}})^T \right] = \boldsymbol{\Sigma}_g$, the covariance matrix of $\mathbf{g}$. Similarly, $E\left[ \mathbf{f}_0 \mathbf{g}_0^T \right] = \boldsymbol{\Sigma}_{fg}$, the cross-covariance matrix of $\mathbf{f}$ and $\mathbf{g}$. Thus, we get $\mathbf{A} = \boldsymbol{\Sigma}_{fg}\, \boldsymbol{\Sigma}_g^{-1}$.

Finally, we obtain the LMMSE estimate as

\[
\hat{\mathbf{f}} = \overline{\mathbf{g}} + \boldsymbol{\Sigma}_{fg}\, \boldsymbol{\Sigma}_g^{-1} \left( \mathbf{g} - \overline{\mathbf{g}} \right).
\]

This expression reduces to the pixel-by-pixel LLMMSE operation when local statistics are substituted for the expectation-based statistical parameters.

Due to the similarity of the optimization criterion used, the LLMMSE filter as above is also referred to as the Wiener filter in some publications. The use of local statistics overcomes the limitations of the Wiener filter due to the assumption of stationarity; the procedure also removes the need to invert large matrices. Nonlinearity, if it is of concern, is the price paid in order to gain these advantages.
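The matrix result $\mathbf{A} = \boldsymbol{\Sigma}_{fg}\, \boldsymbol{\Sigma}_g^{-1}$ can be checked numerically on synthetic zero-mean vectors. The toy model below (independent Gaussian signal components observed in additive noise) is an assumption made purely for illustration, not an example from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
n, N = 4, 100_000
f = rng.normal(size=(n, N))              # zero-mean "signal" vectors
g = f + 0.5 * rng.normal(size=(n, N))    # noisy observations

Sigma_g = g @ g.T / N                    # sample covariance of g (zero mean)
Sigma_fg = f @ g.T / N                   # sample cross-covariance of f and g

A = Sigma_fg @ np.linalg.inv(Sigma_g)    # LMMSE gain matrix
f_hat = A @ g                            # estimate; b = 0 for zero-mean data

mse_lmmse = np.mean((f - f_hat) ** 2)
mse_raw = np.mean((f - g) ** 2)
```

For this model, $\boldsymbol{\Sigma}_g = 1.25\,\mathbf{I}$ and $\boldsymbol{\Sigma}_{fg} = \mathbf{I}$, so $\mathbf{A}$ should approach $0.8\,\mathbf{I}$, and the LMMSE estimate attains a lower MSE than the raw observation.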

Example: Figure (f) shows the result of application of the LLMMSE filter, using a moving window, to the noisy test image in part (b) of the same figure. The noise in the uniform background regions in the image, as well as within the geometric objects with uniform gray levels, has been suppressed well by the filter. However, the noise on and around the edges of the objects has not been removed. Although the filter has led to a reduction in the RMS error, the leftover noise around edges has led to a poor appearance of the result.

Refined LLMMSE filter: In a refined version of the LLMMSE filter, if the local signal variance $\sigma_g^2(m, n)$ is high, it is assumed that the processing window is overlapping an edge. With the further assumption that the edge is straight, which is reasonable for small windows, the direction of the edge is computed using a gradient operator, with eight possible directions for the edge. According to the direction of the edge detected, the processing window is split into two subareas, each of which is assumed to be uniform. Then, the statistics computed within the subarea that holds the pixel being processed are used in the LLMMSE estimate to compute the output for the current pixel. This step reduces the noise present in the neighborhood of edges without blurring the edges. Over uniform areas, where $\sigma_g^2(m, n)$ has reasonably small values, the statistics are computed over the whole area overlapped by the processing window.

Results of application of the refined LLMMSE filter are presented in a later section.

FIGURE: Splitting of a neighborhood for adaptive filtering based upon the direction of a local edge detected within the neighborhood, in a refined version of the local LMMSE filter. One of the eight cases shown is selected according to the direction of the gradient within the neighborhood. Pixels in the partition containing the pixel being processed are used to compute the local statistics and the output of the filter. Based upon a similar figure in J-S. Lee, "Refined filtering of image noise using local statistics", Computer Graphics and Image Processing.

The noise-updating repeated Wiener filter

The noise-updating repeated Wiener (NURW) filter was introduced by Jiang and Sawchuk to deal with signal-independent additive, signal-dependent Poisson, and multiplicative noise. The image is treated as a random field that is nonstationary in mean and variance. The NURW filter consists of an iterative application of the LLMMSE filter. After each iteration, the variance of the noise is updated for use in the LLMMSE estimate formula in the next iteration. Expressing the LLMMSE operation as a local linear combination with the equivalent weights

\[
z_{pq}(m, n) =
\begin{cases}
1 - \dfrac{\sigma_\eta^2(m, n)}{\sigma_g^2(m, n)} + \dfrac{\sigma_\eta^2(m, n)}{\sigma_g^2(m, n)\,(2P + 1)(2Q + 1)}, & (p, q) = (0, 0), \\[2ex]
\dfrac{\sigma_\eta^2(m, n)}{\sigma_g^2(m, n)\,(2P + 1)(2Q + 1)}, & \text{otherwise},
\end{cases}
\]

the updated noise variance is obtained as

\[
\sigma_{\eta,\mathrm{new}}^2(m, n) = \sum_{p = -P}^{P} \sum_{q = -Q}^{Q} z_{pq}^2(m, n)\, \sigma_\eta^2(m + p, n + q).
\]

By iterating the filter, noise is substantially reduced, even in areas near edges. In order to avoid the blurring of edges, a different (smaller) processing window size may be chosen for each iteration. Jiang and Sawchuk demonstrated the use of the NURW filter to improve the quality of images degraded with additive, multiplicative, and Poisson noise.

Results of application of the NURW filter are presented in a later section.
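A sketch of the NURW iteration follows, assuming additive, signal-independent noise and using the equivalent-weights variance update described above under a locally flat approximation (the noise variance is treated as constant within each window when the update is evaluated). The window radius and iteration count are illustrative choices.

```python
import numpy as np

def nurw_filter(g, noise_var, r=2, iterations=3):
    """Noise-updating repeated Wiener (NURW) filter sketch: iterate the
    local LMMSE filter, updating the per-pixel noise variance after each
    pass via the equivalent window weights."""
    out = np.asarray(g, dtype=float)
    W = (2 * r + 1) ** 2
    var_eta = np.full_like(out, float(noise_var))
    M, N = out.shape
    for _ in range(iterations):
        gp = np.pad(out, r, mode="edge")
        nxt = np.empty_like(out)
        new_var = np.empty_like(out)
        for m in range(M):
            for n in range(N):
                w = gp[m:m + 2 * r + 1, n:n + 2 * r + 1]
                mean, var = w.mean(), w.var()
                gain = max(var - var_eta[m, n], 0.0) / var if var > 0 else 0.0
                nxt[m, n] = mean + gain * (out[m, n] - mean)
                # Residual noise variance through the equivalent weights,
                # assuming locally constant noise variance.
                z0 = gain + (1.0 - gain) / W
                z = (1.0 - gain) / W
                new_var[m, n] = var_eta[m, n] * (z0 ** 2 + (W - 1) * z ** 2)
        out, var_eta = nxt, new_var
    return out
```

Over flat regions the gain is near zero, so the residual variance shrinks by roughly the window size on each pass; near edges the gain is near unity and the variance is carried over, so later (smaller-window) passes can still attack the remaining noise there.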

The adaptive 2D LMS filter

Noise reduction is usually accomplished by a filtering procedure optimized with respect to an error measure; the most widely used error measure is the MSE. The Wiener filter is a classical solution to this problem. However, the Wiener filter is designed under the assumption of stationary as well as statistically independent signal and noise PDF models. This premise is unlikely to hold true for images that contain large gray-level fluctuations, such as edges; furthermore, it does not hold true for signal-dependent noise. Recent methods of circumventing this problem have taken into account the nonstationarity of the given image. One example is the method developed by Chan and Lim, which takes into account the image nonstationarity by varying the filter parameters according to the changes in the characteristics or statistics of the image. Another example is the adaptive 2D LMS algorithm developed by Hadhoud and Thomas, which is described next.

The 2D LMS method is an example of a fixed-window Wiener filter in which the filter coefficients vary depending upon the image characteristics. The algorithm is based on the method of steepest descent, and tracks the variations in the local statistics of the given image, thereby adapting to different image features. The advantage of this algorithm is that it does not require any a priori information about the image, the noise statistics, or their correlation properties. Also, it does not require any averaging, differentiation, or matrix operations.

The 2D LMS algorithm is derived by defining a causal FIR filter $w_l(p, q)$, whose region of support (ROS) is $(P + 1) \times (P + 1)$, such that

\[
\hat{f}(m, n) = \sum_{p = 0}^{P} \sum_{q = 0}^{P} w_l(p, q)\, g(m - p, n - q),
\]

where $\hat{f}(m, n)$ is the estimate of the original pixel value $f(m, n)$; $g(m, n)$ is the noise-corrupted input image; and $l$ marks the current position of the filter in the image, given by a raster-scan index of the pixel position $(m, n)$ in an $M \times N$ image, taking values from $0$ to $MN - 1$.

The filter coefficients $w_{l+1}(p, q)$ for the next pixel position are determined by minimizing the MSE between the desired pixel value $f(m, n)$ and the estimated pixel value $\hat{f}(m, n)$ at the present pixel location $l$, using the method of steepest descent. The coefficients $w_{l+1}(p, q)$ are estimated as the present coefficients $w_l(p, q)$ plus a change proportional to the negative gradient of the error power (MSE), expressed as

\[
w_{l+1}(p, q) = w_l(p, q) - \mu\, \nabla\!\left[ e_l^2 \right],
\]

where $\mu$ is a scalar multiplier controlling the rate of convergence and filter stability; $e_l$ is the error signal, defined as the difference between the desired signal $f(m, n)$ and the estimate $\hat{f}(m, n)$; and $\nabla$ is a gradient operator with respect to $w_l(p, q)$, applied to the error power $e_l^2$ at $l$. Because the original image $f(m, n)$ is unknown, and the only image on hand is the noise-corrupted image $g(m, n)$, an approximation to the original image, $d(m, n)$, is used, and the error is estimated as

\[
e_l = d(m, n) - \hat{f}(m, n).
\]

The technique used by Hadhoud and Thomas to obtain $d(m, n)$ was to estimate it from the input image $g(m, n)$ by decorrelation, the decorrelation operator being a 2D delay operator. This allows the correlation between $d(m, n)$ and $g(m, n)$ to be similar to the correlation between $f(m, n)$ and $g(m, n)$, and, in turn, makes $d(m, n)$ correlated to $f(m, n)$ to some extent. Evaluation of the gradient in the weight-update rule, using the error estimate above, gives

\[
w_{l+1}(p, q) = w_l(p, q) + 2\mu\, e_l\, g(m - p, n - q),
\]

which is a recursive equation defining the filter coefficients at the pixel position $l + 1$ in terms of those at the position $l$.

Implementation of the 2D LMS filter: The two preceding equations give the 2D LMS filter and the filter-weight updating algorithm, respectively. Convergence of the algorithm does not depend upon the initial conditions: it converges for any arbitrary initial value, and hence provides good nonstationary performance. In comparing the 2D LMS filter with the nonadaptive LMS algorithm, the second term of the weight-update equation would not be included in the latter, and the filter coefficients would not change from pixel to pixel, under the assumption that the image is stationary. This would put a constraint on the initial coefficient values, because they would be the values used for the whole image; thus, different initial values would result in different filtered outputs. Although the initial conditions do not affect the convergence of the 2D LMS filter, the choice of the convergence factor $\mu$ depends on the particular application, and involves a tradeoff between the rate of convergence, the ability to track nonstationarity, and the steady-state MSE.

Example: Figures (a) and (b) show a test image and its noisy version, the latter obtained by adding zero-mean Gaussian noise to the former. Part (c) of the figure shows the result of application of the 2D LMS filter to the noisy image. In the implementation, the initial weights of the filter $w_0(p, q)$ were estimated by processing several lines of the given image, starting with zero weights; the weights so obtained were then used as the initial conditions, and processing was restarted at the first pixel of the given image. The convergence factor $\mu$ and the ROS size were determined by trial and error for different images, as suggested by Hadhoud and Thomas.

The 2D LMS algorithm applies a gradually changing filter that tends to suppress noise in a relatively uniform manner over the image. Although this typically results in lower values of the MSE, it also tends to smooth or blur edges and other structured features in the image, and to leave excessive noise in uniform regions of the image. One explanation of this effect could be the fact that the adaptive weights of the filter $w_l(p, q)$ depend on the model of the original image, which is approximated by the decorrelated image $d(m, n)$. Because $d(m, n)$ is not an accurate approximation of the original image $f(m, n)$, the updated filter weights are not optimal, and thus the algorithm does not give the optimal MSE value. The 2D LMS algorithm did not perform particularly well on the test image in Figure (b): the resultant image in Figure (c) appears significantly blurred, with remnant noise.

The adaptive rectangular window LMS filter

In order to overcome the limitations due to the assumption of stationarity, a Wiener filter approach using an adaptive-sized rectangular window (ARW) to estimate the filter coefficients was proposed by Song and Pearlman, and refined by Mahesh et al. Using the same image degradation model as before, but with the additional assumption that the image processes also have zero mean, the estimate used by Mahesh et al. is of the form

\[
\hat{f}(m, n) = \alpha(m, n)\, g(m, n).
\]

The problem reduces to that of finding the factor $\alpha(m, n)$ at each pixel location, using the same minimum-MSE criterion as that of the standard Wiener filter.

The error is given by

\[
e(m, n) = f(m, n) - \hat{f}(m, n) = f(m, n) - \alpha(m, n)\, g(m, n).
\]

Minimization of the MSE requires that the error signal $e$ be orthogonal to the image $g$; that is,

\[
E\left\{ \left[ f(m, n) - \alpha(m, n)\, g(m, n) \right] g(m, n) \right\} = 0.
\]

Solving for $\alpha$, we obtain

\[
\alpha(m, n) = \frac{\sigma_f^2(m, n)}{\sigma_f^2(m, n) + \sigma_\eta^2(m, n)}.
\]

If the original image is not a zero-mean process, the estimate above can still be used by first subtracting the mean from the images $f$ and $g$. Because the noise is of zero mean for all pixels, the a posteriori mean $\overline{g}(m, n)$ of the image $g$ at pixel position $(m, n)$ is equal to the a priori mean $\overline{f}(m, n)$ of the original image $f(m, n)$, and the estimate thus becomes

\[
\hat{f}(m, n) = \overline{g}(m, n) + \frac{\sigma_f^2(m, n)}{\sigma_f^2(m, n) + \sigma_\eta^2(m, n)} \left[ g(m, n) - \overline{g}(m, n) \right].
\]

FIGURE: (a) Shapes: a test image with various geometrical objects placed at random. (b) Image in (a) with Gaussian noise added. Result of filtering the image in (b) with: (c) the 2D LMS filter; (d) two passes of the ARW-LMS filter; (e) the ANNS filter; (f) two passes of the ANNS filter. Reproduced with permission from R.B. Paranjape, T.F. Rabie, and R.M. Rangayyan, "Image restoration by adaptive-neighborhood noise subtraction", Applied Optics. © Optical Society of America.

Implementation of the ARW-LMS filter: In implementing the estimate above, it is necessary to make the assumption that the image pixel values in the immediate neighborhood of a pixel $(m, n)$ are samples from the same ensemble as that of $f(m, n)$; that is, a globally nonstationary process can be considered to be locally stationary and ergodic over a small region. Thus, if we can accurately determine the size of a neighborhood in which the image values have the same statistical parameters, the sample statistics can approximate the a posteriori parameters needed for the estimate.

The view taken by Song and Pearlman was to identify the size of a stationary square region for each pixel in the image, and to calculate the local statistics of the image within that region. The size of the window changes according to a measure of signal activity; an effective algorithm was proposed to determine the window size that improved the performance of various point estimators. The effect of the improved performance was greater smoothing in relatively flat (signal-free) regions in the image, and less smoothing across edges.

In deriving the local sample statistics, denoted as $\hat{\overline{g}}(m, n)$ and $\hat{\sigma}_g^2(m, n)$, Mahesh et al. made use of ARWs of length $L_r$ in the row direction and $L_c$ in the column direction. Because the window is required to be centered about the pixel being processed, the ARW lengths need to be odd. Except near the borders of the image, the ARW dimensions can be expressed as $L_r = 2N_r + 1$ and $L_c = 2N_c + 1$, where $N_r$ and $N_c$ are the dimensions of the one-sided neighborhood. Within this window, the local mean and variance are calculated as

\[
\hat{\overline{g}}(m, n) = \frac{1}{L_r L_c} \sum_{p = -N_r}^{N_r} \sum_{q = -N_c}^{N_c} g(m + p, n + q)
\]

and

\[
\hat{\sigma}_g^2(m, n) = \frac{1}{L_r L_c} \sum_{p = -N_r}^{N_r} \sum_{q = -N_c}^{N_c} \left[ g(m + p, n + q) - \hat{\overline{g}}(m, n) \right]^2.
\]

The local variance of the original image, $\hat{\sigma}_f^2$, is estimated as

\[
\hat{\sigma}_f^2 =
\begin{cases}
\hat{\sigma}_g^2 - \sigma_\eta^2, & \text{if } \hat{\sigma}_g^2 > \sigma_\eta^2, \\
0, & \text{otherwise}.
\end{cases}
\]

Using the local sample statistics as above, the estimate becomes

\[
\hat{f}(m, n) = \hat{\overline{g}}(m, n) + \frac{\hat{\sigma}_f^2(m, n)}{\hat{\sigma}_f^2(m, n) + \sigma_\eta^2(m, n)} \left[ g(m, n) - \hat{\overline{g}}(m, n) \right].
\]

It should be noted that the parameters $L_r$, $L_c$, $N_r$, $N_c$, $\hat{\overline{g}}$, $\hat{\sigma}_g^2$, and $\hat{\sigma}_f^2$, as well as the other parameters that follow, are computed for each pixel $(m, n)$, and should be denoted as $L_r(m, n)$, etc.; this detail has been suppressed for convenience of notation. It should also be noted that, although the noise variance $\sigma_\eta^2$ is usually not known a priori, it can be easily estimated from a window in a flat (signal-free) area of the degraded image.
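Estimating $\sigma_\eta^2$ from a flat, signal-free window, as suggested above, can be done directly; the patch coordinates in the sketch below are assumptions supplied by the user.

```python
import numpy as np

def estimate_noise_variance(g, top, left, size):
    """Estimate sigma_eta^2 from a flat (signal-free) square patch of a
    noisy image: in such a patch, all variation is attributed to noise."""
    patch = np.asarray(g, dtype=float)[top:top + size, left:left + size]
    return patch.var(ddof=1)   # unbiased sample variance
```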

For accurate estimation of $\overline{g}$ and $\sigma_g^2$, the pixels in the ARW should belong to the same ensemble as that of the central pixel being filtered. If relatively large windows are used, the windows may cross over the boundaries of different regions and include pixels from other ensembles; in such a case, blurring could result across the edges present within the windows. On the other hand, if the windows are too small, the lack of samples would result in poor estimates of the mean and variance, and, consequently, insufficient noise suppression would occur over uniform regions. Thus, it is desirable to use small windows where the image intensity changes rapidly, and large windows where the image contrast is relatively low.

In the method of Mahesh et al., the ARW lengths $L_r$ and $L_c$ are varied depending upon a signal activity parameter $S_r$, defined as

\[
S_r(m, n) = \frac{1}{L_r L_c} \sum_{p = -N_r}^{N_r} \sum_{q = -N_c}^{N_c} \left[ g(m + p, n + q) - \hat{\mu}_r \right]^2,
\]

where $\hat{\mu}_r$ is the local mean evaluated in the row direction as

\[
\hat{\mu}_r = \frac{1}{L_r} \sum_{p = -N_r}^{N_r} g(m + p, n).
\]

$S_r$ is a measure of the local roughness of the image in the row direction, being equal to the variance of the image in that direction. A similar signal activity parameter is defined in the column direction. If the signal activity parameter $S_r$ in the row direction is large, indicating the presence of an edge or other information, the window size in the row direction, $N_r$, is decremented, so that points from other ensembles are not included. If $S_r$ is small, indicating that the current pixel lies in a low-contrast region, $N_r$ is incremented, so that a better estimate of the mean and variance may be obtained. In order to make this decision, the signal activity parameter in the row direction is compared to a threshold $T_r$, and $N_r$ is updated as follows:

\[
N_r \leftarrow N_r + 1 \;\; \text{if } S_r < T_r, \qquad \text{or} \qquad N_r \leftarrow N_r - 1 \;\; \text{if } S_r \geq T_r.
\]

A similar procedure is applied in the column direction to update $N_c$. Prespecified minimum and maximum values for $N_r$ and $N_c$ are used to limit the size of the ARW to reasonable dimensions, which could be related to the size of the details present in the image. The threshold $T_r$ is defined as

\[
T_r = \frac{\lambda\, \sigma_\eta^2}{L_r},
\]

where $\lambda$ is a weighting factor that controls the rate at which the window size changes. The threshold varies in direct proportion to the noise variance, and in inverse proportion to the ARW dimension. Thus, if the noise variance is high, the threshold will be high, and the window length is more likely to be incremented than decremented; this will lead to a large window and effective smoothing of the noise. If the window size is large, the threshold will be small; thus, the window size is likely to be decremented for the next pixel. A similar threshold is defined in the column direction. This helps the ARW length to converge to a certain range.
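A simplified sketch of the ARW-LMS procedure follows: the window half-lengths grow where a signal-activity measure falls below the threshold $\lambda \sigma_\eta^2 / L$ and shrink otherwise. The activity measures (per-direction variances averaged over the window), the size bounds, and the value of $\lambda$ are illustrative stand-ins for the exact quantities defined above.

```python
import numpy as np

def arw_lms_filter(g, noise_var, n_min=1, n_max=5, lam=10.0):
    """Adaptive rectangular window LMS filter (simplified sketch).

    The half-lengths n_r, n_c are carried from pixel to pixel and are
    incremented when directional activity is below lam*noise_var/L,
    decremented otherwise, within [n_min, n_max]."""
    g = np.asarray(g, dtype=float)
    M, N = g.shape
    out = np.empty_like(g)
    n_r = n_c = n_min
    for m in range(M):
        for n in range(N):
            w = g[max(m - n_r, 0):m + n_r + 1, max(n - n_c, 0):n + n_c + 1]
            mean = w.mean()
            var_f = max(w.var() - noise_var, 0.0)   # sigma_f^2 estimate
            denom = var_f + noise_var
            gain = var_f / denom if denom > 0 else 0.0
            out[m, n] = mean + gain * (g[m, n] - mean)
            # Simplified directional activity measures.
            s_r = w.var(axis=0).mean()   # variation along the row index
            s_c = w.var(axis=1).mean()   # variation along the column index
            n_r = (min(n_r + 1, n_max)
                   if s_r < lam * noise_var / (2 * n_r + 1)
                   else max(n_r - 1, n_min))
            n_c = (min(n_c + 1, n_max)
                   if s_c < lam * noise_var / (2 * n_c + 1)
                   else max(n_c - 1, n_min))
    return out
```

Large windows form over low-contrast regions, giving strong smoothing; near edges the activity exceeds the threshold and the windows contract, limiting blur.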

Example: The result of application of the ARW-LMS filter to the noisy test image in Figure (b) is shown in part (d) of the same figure. The ARW size was restricted to prespecified minimum and maximum dimensions, and the weighting factor $\lambda$ was fixed by experimentation. The ARW-LMS algorithm has resulted in much less smoothing at the edges of the objects in the image than in the uniform regions; as a result, a layer of noise remains in the filtered image surrounding each of the objects. Although this is clearly objectionable in the synthesized image, the effect was not as pronounced in the case of natural scenes, because ideal edges are not common in natural scenes.

The ARW-LMS output image appears to be clearer and sharper than the 2D LMS restored image (see Figure (c)), because the ARW-LMS algorithm tends to concentrate the error around the edges of objects, where the human visual system tends to ignore the artifact. On the other hand, the 2D LMS algorithm tends to reduce the error uniformly, which also leads to smoothing across the edges; the decreased sharpness of the edges makes the result less pleasing to the viewer.

The adaptive-neighborhood filter

An adaptive-neighborhood paradigm was proposed by Paranjape et al. to filter additive, signal-independent noise, and extended to multiplicative noise by Das and Rangayyan, to signal-dependent noise by Rangayyan et al., and to color images by Ciuc et al. Unlike the other methods, where statistics of the noise and signal are estimated locally within a fixed-size, fixed-shape (usually rectangular) neighborhood, the adaptive-neighborhood filtering approach consists of computing statistics within a variable-size, variable-shape neighborhood that is determined individually for every pixel in the image. It is desired that the adaptive neighborhood grown for the pixel being processed (which is called the seed) contains only those spatially connected pixels that are similar to the seed; that is, the neighborhood does not grow over edges, but overlaps a stationary area. If this condition is fulfilled, the statistics computed using the pixels inside the region are likely to be closer to the true statistics of the local signal and noise components than the statistics computed within fixed neighborhoods. Hence, adaptive-neighborhood filtering should yield more accurate results than fixed-neighborhood methods. The approach should also prevent the blurring or distortion of edges, because adaptive neighborhoods, if grown with an appropriate threshold, should not mix the pixels of an object with those belonging to its background.

There are two major steps in adaptive-neighborhood filtering: adaptive region growing, and estimation of the noise-free value for the seed pixel using statistics computed within the region.

Region growing for adaptive-neighborhood filtering: In adaptive-neighborhood filtering, a region needs to be grown for the pixel being processed (the seed), such that it contains only pixels belonging to the same object or image feature as the seed. Paranjape et al. used the following region-growing procedure for images corrupted by signal-independent, Gaussian, additive noise: the absolute difference between each of the connected neighbors $g(p, q)$ and the seed $g(m, n)$ is computed as

\[
d_{pq} = \left| g(p, q) - g(m, n) \right|.
\]

Pixels $g(p, q)$ having $d_{pq} \leq T$, where $T$ is a fixed, predefined threshold, are included in the region. The procedure continues by checking the neighbors of the newly included pixels in the same manner, and stops when the inclusion criterion is not fulfilled by any neighboring pixel. An adaptive neighborhood is grown for each pixel in the image.

In addition to the foreground region, an adaptive background region is also grown for each pixel. The background is obtained by expanding (dilating) the outermost boundary of the foreground region by a prespecified number of pixels; this provides a ribbon of a certain thickness that surrounds the foreground region. The foreground region may be further modified by combining isolated pixels into local regions. Observe that a foreground region could contain several disjoint regions that are not part of the foreground, due to differences in gray level that are larger than that permitted by the threshold applied; such regions could be considered to be part of the background, although they are enclosed by the foreground region.

Example: The accompanying figure illustrates the growth of an adaptive neighborhood. The foreground part of the adaptive neighborhood is shown in a light shade of gray. The black pixels within the foreground have the same gray level as that of the seed pixel from where the process was commenced; the use of a simple threshold for region growing, as above, will result in the same region being grown for all such pixels, and for this reason they could be called redundant seed pixels. The region-growing procedure need not be applied to such pixels, which results in computational savings. Furthermore, all redundant seed pixels within a region get the same output value as computed for the original seed pixel from where the process was commenced.

Adaptive-neighborhood mean and median filters: Paranjape et al. proposed the use of mean and median values computed using adaptive neighborhoods to filter noise. The basic premise was that context-dependent adaptive neighborhoods provide a larger population of pixels to compute local statistics than small, fixed neighborhoods, and that edge distortion is prevented because such neighborhoods are not expected to transgress the boundaries of the objects or features in the image. They also showed that the method could be iterated to improve the results.

FIGURE: Illustration of the growth of an adaptive neighborhood (left to right and top to bottom). The neighborhood being grown is shown in a light shade of gray. The black pixels within the neighborhood have the same gray level as that of the seed from where the process was commenced; they are called redundant seed pixels. The last figure shows a region in a darker shade of gray that surrounds the foreground region; this is known as the adaptive background region. Figure courtesy of W.M. Morrow.

Example: Figures (a) and (b) show a test image and its noisy version with additive, Gaussian-distributed noise. Parts (c) and (d) of the figure show the results of filtering the noisy image using the moving-window mean and median, respectively. The blurring of edges caused by the mean, and the distortion of shape caused by the median, are clearly seen in the results. Parts (e) and (f) of the figure illustrate the results of the adaptive-neighborhood mean and median filters, respectively. Both methods have been equally effective in removing the noise without causing any blurring or distortion of edges. However, in other experiments with high levels of noise, and when the process was iterated, it was observed that the adaptive-neighborhood filters could lead to the loss of objects that have small gray-level differences with respect to their surroundings. The adaptive neighborhood in such a case could mix an object and its surroundings if the threshold is low compared to the noise level and the contrast of the object.

Adaptive-neighborhood noise subtraction (ANNS): Paranjape et al. proposed the ANNS method to remove additive, signal-independent noise. The algorithm estimates the noise value at the seed pixel $g(m, n)$ by using an adaptive neighborhood, and then subtracts the noise value from the seed pixel to obtain an estimate $\hat{f}(m, n)$ of the original, undegraded value.

The strategy used in deriving the ANNS filter is based upon the same principles as those of the ARW-LMS algorithm: the image process $f$ is assumed to be a zero-mean process of variance $\sigma_f^2$ that is observed in the presence of additive, white, Gaussian noise, resulting in the image $g$. The noise process is assumed to have zero mean and variance $\sigma_\eta^2$, and is assumed to be uncorrelated to $f$. An estimate of the additive noise at the pixel $(m, n)$ is obtained from the corresponding adaptive neighborhood grown in the corrupted image $g$ as

\[
\hat{\eta}(m, n) = \beta\, g(m, n),
\]

where $\beta$ is a scale factor that depends on the characteristics of the adaptive neighborhood grown. Then, the estimate of $f(m, n)$ is

\[
\hat{f}(m, n) = g(m, n) - \hat{\eta}(m, n),
\]

which reduces to

\[
\hat{f}(m, n) = (1 - \beta)\, g(m, n).
\]

As described in the section on the ARW-LMS algorithm, if the images used are of nonzero mean, the estimate above can be used by first subtracting the mean of each image from both sides of the equation. Then, the estimate may be expressed as

\[
\hat{f}(m, n) = \overline{g}(m, n) + (1 - \beta) \left[ g(m, n) - \overline{g}(m, n) \right],
\]

FIGURE: (a) Shapes: a test image with various geometrical objects placed at random. (b) Image in (a) with Gaussian noise added. Result of filtering the image in (b) with: (c) the moving-window mean; (d) the moving-window median; (e) the adaptive-neighborhood mean; (f) the adaptive-neighborhood median. Images courtesy of R.B. Paranjape.

where $\overline{g}$ is the a posteriori mean of the degraded image $g(m, n)$, which is also equal to the a priori mean $\overline{f}$ of the original image $f(m, n)$ for zero-mean noise.

The problem now is to find the factor $\beta$, which is based upon the criterion that the estimated noise variance be equal to the original noise variance $\sigma_\eta^2$. The solution is obtained as follows:

\[
E\left[ \hat{\eta}^2 \right] = E\left[ \beta^2 \left\{ g(m, n) - \overline{g} \right\}^2 \right] = \beta^2 \left( \sigma_f^2 + \sigma_\eta^2 \right) = \sigma_\eta^2.
\]

The noise-estimation factor is then given by

\[
\beta = \sqrt{ \frac{\sigma_\eta^2}{\sigma_\eta^2 + \sigma_f^2} }.
\]

Thus, the estimate becomes

\[
\hat{f}(m, n) = \overline{g}(m, n) + \left[ 1 - \sqrt{ \frac{\sigma_\eta^2(m, n)}{\sigma_f^2(m, n) + \sigma_\eta^2(m, n)} } \right] \left[ g(m, n) - \overline{g}(m, n) \right].
\]
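The ANNS estimate for a single seed pixel can be sketched as below, assuming the adaptive neighborhood has already been grown by the region-growing procedure described earlier; the helper function and its interface are hypothetical.

```python
import numpy as np

def anns_estimate(region_values, seed_value, noise_var):
    """ANNS estimate for one seed pixel.

    region_values : pixel values of the adaptive neighborhood grown
                    around the seed in the corrupted image g.
    seed_value    : g(m, n), the noisy value at the seed pixel.
    noise_var     : sigma_eta^2, assumed known or pre-estimated.
    """
    vals = np.asarray(region_values, dtype=float)
    mean = vals.mean()                                # local g-bar
    var_f = max(vals.var() - noise_var, 0.0)          # sigma_f^2 estimate
    beta = np.sqrt(noise_var / (noise_var + var_f))   # noise-estimation factor
    return mean + (1.0 - beta) * (seed_value - mean)
```

Over a flat neighborhood, $\hat{\sigma}_f^2 \approx 0$ gives $\beta \approx 1$ and the output collapses to the local mean; over a high-contrast neighborhood, $\beta$ is small and the seed value is largely preserved.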