Image Coding and Data Compression
High spatial resolution and fine gray-scale quantization are often required in biomedical imaging. Digital mammograms are typically represented in large arrays of pixels with fine gray-level quantization, leading to raw-data files of many megabytes per image. Volumetric data obtained by CT and MRI comprise large arrays of voxels, and can occupy substantial amounts of storage per examination. Patients with undetermined or multiple complications may undergo several examinations via different modalities, such as X-ray imaging, ultrasound scanning, CT scanning, and nuclear medicine imaging, resulting in large collections of image files.
Most healthcare jurisdictions require medical records, including images, of adults to be stored for durations of the order of seven years from the date of acquisition. Children's records and images are required to be maintained until at least the time they reach adulthood.
With a view to improving the efficiency of storage and access, several imaging centers and hospitals have moved away from film-based storage toward electronic storage. Furthermore, most medical imaging systems have moved to direct digital image acquisition with adequate resolution, putting aside the debate on the quality of an original film-based image versus that of its scanned, digitized representation. An entire series of conferences has been dedicated to PACS (picture archival and communication systems); see the PACS volumes of the SPIE Medical Imaging conference series. Networks and systems for PACS are integrated into the infrastructure of most modern hospitals. The major advantages and disadvantages of digital and film-based archival systems are listed below.
- Films deteriorate with age and handling; digital images are unaffected by these factors.
- Despite elaborate indexing schemes, films tend to get lost or misplaced; digital image files are less likely to face these problems.
- Digital image files may be accessed simultaneously by several users. Although multiple copies of film-based images may be made, it would be an expensive option that adds storage and handling complexities.
- With the proliferation of computers, digital images may be viewed and manipulated at several convenient locations, including a surgical suite, a patient's bedside, and one's home or office. Viewing film-based images with detailed attention requires specialized viewing consoles under controlled lighting conditions.
- Digital PACS require significant initial capital outlay, as well as routine maintenance and upgrading of the computer, storage, and communication systems. However, these costs may be offset by the savings in the continuing costs of film, as well as the associated chemical processing systems and disposal. The environmental concerns related to film processing are also removed by digital PACS.
- Digital images may be compressed via image coding and data compression techniques so as to occupy less storage space.
The final point above forms the topic of the present chapter.
Although the discussion above has been in the context of image storage or archival, similar concerns regarding the size of image files and the desirability of compression arise in the communication of image data. In this chapter, we shall study the basic concepts of information theory that apply to image coding, compression, and communication. We shall investigate several techniques for encoding image data, including decorrelation procedures to modify the statistical characteristics of the data so as to permit efficient representation, coding, and compression.
The representation of the significant aspects of an image in terms of a small number of numerical features for the purpose of pattern classification may also be viewed as image coding or data compression; however, we shall treat this topic separately (see the chapter on pattern classification).
Considerations Based on Information Theory
Image data compression is possible due to the following basic characteristics:
- Code redundancy: all code words (pixel values) do not occur with equal probability.
- Spatial redundancy: the values of neighboring pixels tend to lie within a small dynamic range and exhibit a high level of correlation.
- Psychovisual redundancy: human analysts can recognize the essential nature and components of an image from severely reduced versions, such as caricatures, edges, and regions, and need not (or do not) pay attention to precise numerical values.
Information-theoretic considerations are based upon the notion of information as related to the statistical uncertainty of the occurrence of an event (such as a signal, an image, or a pixel value), rather than the structural, symbolic, pictorial, semantic, or diagnostic content of the entity. The measure of entropy is based upon the probabilities of occurrence of the various symbols involved in the representation of a message or image (see the section on entropy). Despite the mathematical and theoretical power of measures such as entropy, the standpoint of viewing an image as being composed of discrete and independent symbols (numerical values) removes the analyst from the real-world and physical properties of the image. The underlying assumptions also lead to severe limitations in entropy-based source coding, with the achievable lossless compression factors often being rather modest. Additional techniques, based upon decorrelation of the image data via the identification and modeling of the underlying image-generation phenomena, or the use of pattern recognition techniques, could assist in improving the performance of image compression procedures.
Noiseless coding theorem for binary transmission
Given a code with an alphabet of two symbols and a source A with an alphabet of two symbols, the average length of the code words per source symbol may be made arbitrarily close to the lower bound, the entropy H(A), by encoding sequences of source symbols instead of encoding individual symbols. The average length L_n of encoded n-symbol sequences is bounded as
H(A) \le L_n / n < H(A) + 1/n.
Difficulties exist in estimating the true entropy of a source due to the fact that pixels are statistically dependent (that is, correlated) from pixel to pixel, row to row, and frame to frame in real-life images. The computation of the true entropy requires that symbols be considered in blocks over which the statistical dependence is negligible; in practice, this would translate to estimating joint PDFs of excessively long vectors. Values of entropy estimated with single pixels or small blocks of pixels would result in overestimates of the source entropy. If blocks of pixels are chosen such that the sequence entropy estimates converge rapidly to the limit, then block-coding methods may provide results close to the minimum length given by the bound above. Run-length coding may be viewed as an adaptive block-coding technique (see the related section).
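The following minimal sketch (not part of the original text) illustrates the difference between a zeroth-order entropy estimate and a block-based (pair) estimate on a small synthetic image; the test data and function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def entropy_from_counts(counts):
    """Shannon entropy (bits) of a discrete distribution given by counts."""
    p = counts / counts.sum()
    p = p[p > 0]                      # ignore symbols that never occur
    return float(-np.sum(p * np.log2(p)))

def zeroth_order_entropy(img):
    """Entropy per pixel, treating each pixel as an independent symbol."""
    _, counts = np.unique(img, return_counts=True)
    return entropy_from_counts(counts)

def pair_entropy_per_pixel(img):
    """Entropy of horizontally adjacent pixel pairs, expressed per pixel."""
    pairs = np.stack([img[:, :-1].ravel(), img[:, 1:].ravel()], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    return entropy_from_counts(counts) / 2.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A small correlated test image: a smooth ramp plus mild noise (hypothetical data).
    ramp = np.add.outer(np.arange(64), np.arange(64)) // 16
    img = (ramp + rng.integers(0, 2, ramp.shape)).astype(np.uint8)
    print("zeroth-order entropy:", zeroth_order_entropy(img), "b/pixel")
    print("pair entropy        :", pair_entropy_per_pixel(img), "b/pixel")
```

For correlated data, the pair-based estimate per pixel is lower than the zeroth-order estimate, which illustrates why single-pixel estimates overestimate the true source entropy.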
Lossy versus lossless compression
A coding or compression method is considered to be lossless if the original image data can be recovered with no error from the coded and compressed data. Such a technique may also be referred to as a reversible, bit-preserving, or error-free compression technique.
A compression technique becomes lossy or irreversible if the original data cannot be recovered with complete pixel-by-pixel numerical accuracy from the compressed data. In the case of images, the human visual system can tolerate significant numerical differences or errors, in the sense that the degraded image recovered from the compressed data is perceived to be essentially the same as the original image. This arises from the fact that a human observer will typically not examine the numerical values of individual pixels, but will, instead, assess the semantic or pictorial information conveyed by the data. Furthermore, a human analyst may tolerate more error (noise or distortion) in the uniform areas of an image than around its edges, which attract visual attention. Data compression techniques may be designed to exploit these aspects to gain significant advantages in terms of highly compressed representation, with high levels of loss of numerical accuracy, while remaining perceptually lossless. By the same token, in medical imaging, if the numerical errors in the retrieved and reconstructed images do not cause any change in the diagnostic results obtained by using the degraded images, one could achieve high levels of numerically lossy compression while remaining diagnostically lossless.
In the quest to push the limits of numerically lossy compression techniques while remaining practically lossless under some criterion, the question arises as to the worth of such practice. Medical practice in the present, highly litigious society could face large financial penalties and losses due to errors. Radiological diagnosis is often based upon the detection of minor deviations from the normal or average patterns expected in medical images. If a lossy data compression technique were to cause such a faint deviation to be less perceptible in the compressed and reconstructed image than in the original image, and the diagnosis based upon the reconstructed image were to be in error, the financial compensation to be paid could cost several times the amount saved in data storage; the loss in professional standing and public confidence could be even more damaging. In addition, defining the fidelity of representation in terms of the closeness to the original image, or in terms of distortion measures, is a difficult and elusive task. Given the high levels of professional care and concern, as well as the fiscal and emotional investment, that are part of medical image acquisition procedures, it would be undesirable to use a subsequent procedure that could cause any degradation of the image. In this spirit, only lossless coding and compression techniques will be described in the present chapter. Regardless, it should be noted that any lossy compression technique may be made lossless by providing the numerical error between the original image and the degraded image reconstructed from the compressed data. Although this step will lead to additional storage or transmission requirements, the approach can facilitate the rapid retrieval or transmission of an initial low-quality image, followed by completely lossless recovery; such a procedure is known as progressive transmission, especially when performed over multiple stages of image quality or fidelity.
Distortion measures and fidelity criteria
Although we have stated our interest in lossless coding of biomedical images, other processes, such as the transmission of large quantities of data over noisy channels, may lead to some errors in the received images. Hence, it would be relevant to consider the characterization of the distortion so introduced, and to analyze the fidelity of the received image with respect to the original.
The binary symmetric channel is characterized by a single parameter, the bit-error probability p (see the figure below). The channel capacity is given by
C = 1 + p \log_2 p + q \log_2 q,
where q = 1 - p.
FIGURE
Transmission error probabilities in a binary symmetric channel.
The least-squares single-letter fidelity criterion is defined as
\rho_n(x, y) = (1/n) \sum_{l=1}^{n} (x_l - y_l)^2,
where x and y are the transmitted and received n-bit vectors (blocks or words), respectively.
The Hamming distance between the vectors x and y is defined as
D_H(x, y) = (1/n) \sum_{l=1}^{n} x_l \oplus y_l,
where \oplus represents modulo-2 addition (the exclusive-OR operation).
Measures of fidelity may also be defined based upon entire images, by defining an error image as
e(m, n) = g(m, n) - f(m, n),
where g(m, n) is the received (degraded) version of the original (transmitted) image f(m, n), and then defining the RMS value of the error as
e_{RMS} = \sqrt{ (1/N^2) \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} [g(m, n) - f(m, n)]^2 },
or the SNR as
SNR = \frac{ \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} g^2(m, n) }{ \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} e^2(m, n) },
assuming an N x N image. See the section on measures of SNR for more details.
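As a concrete illustration of the error-image measures defined above, the following sketch computes the RMS error and the SNR between an original and a degraded image. It is an illustrative example only (the test images are hypothetical); NumPy is assumed, and a dB conversion is added for convenience.

```python
import numpy as np

def rms_error(f, g):
    """RMS value of the error image e(m, n) = g(m, n) - f(m, n)."""
    e = g.astype(np.float64) - f.astype(np.float64)
    return float(np.sqrt(np.mean(e ** 2)))

def snr(f, g):
    """SNR as the ratio of the energy of the received image g to the
    energy of the error image e = g - f; also returned in dB."""
    e = g.astype(np.float64) - f.astype(np.float64)
    ratio = np.sum(g.astype(np.float64) ** 2) / np.sum(e ** 2)
    return ratio, 10.0 * np.log10(ratio)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = rng.integers(0, 256, (64, 64))        # hypothetical original image
    g = f + rng.integers(-2, 3, f.shape)      # received image with small errors
    print("RMS error:", rms_error(f, g))
    print("SNR (ratio, dB):", snr(f, g))
```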
Fundamental Concepts of Coding
In general, coding could be defined as the use of symbols to represent information. The following list provides the definitions of a few basic terms and concepts related to coding.
- An alphabet is a predefined set of symbols.
- A word is a finite sequence of symbols from an alphabet.
- A code is a mapping of words from a source alphabet into the words of a code alphabet.
- A code is said to be distinct if each code word is distinguishable from the other code words.
- A distinct code is uniquely decodable if every code word is identifiable when immersed in a sequence of code words with no separators between the words.
- A desirable property of a uniquely decodable code is that it be decodable on a word-to-word basis. This is ensured if no code word may be a prefix to another; the code is then instantaneously decodable.
- A code is said to be optimal if it is instantaneously decodable and has the minimum average length for a given source PDF.
Examples of symbols are {0, 1} in the binary alphabet; {0, 1, 2, ..., 7} in the octal system; {0, 1, 2, ..., 9} in the decimal system; {0, 1, 2, ..., 9, A, B, C, D, E, F} in the hexadecimal system; {I, V, X, L, C, D, M} in the Roman system, with the decimal equivalents of 1, 5, 10, 50, 100, 500, and 1,000, respectively; and {A-Z, a-z} in the English alphabet (not considering punctuation marks and special symbols). An example of a word in the context of image coding is a binary string standing for the gray level of a pixel given in the decimal system. The table given in this section lists the codes for integers in the range [0, 20] in the Roman, decimal, binary, Gray, octal, and hexadecimal codes. The Gray code has the advantageous feature that only one digit changes from one number to the next (see the sketch below). Observe that, in general, all of the codes described here, including the English language, fail the conditions defined above for an optimal code.
Direct Source Coding
Pixels generated by real-life sources of images bear limitations in dynamic range and variability within a small spatial neighborhood. Therefore, codes used to represent pixel data at the source may be expected to demonstrate certain patterns of limited variation and high correlation. Furthermore, real-life sources of images do not generate random, uncorrelated values that are equally likely; instead, it is common to encounter PDFs of gray levels that are nonuniform. Some of these characteristics may be exploited to achieve efficient representation of images by designing coding systems tuned to specific properties of the source. Because the coding method is applied directly to pixel values generated by the source, without processing them by an algorithm to generate a different series of values, such techniques are categorized as direct source coding procedures.
Huffman coding
Huffman proposed a coding system to exploit the occurrence of some pixel values with higher probabilities than others. The basic idea in Huffman coding is to use short code words for values with high probabilities of occurrence, and longer code words to represent values with lower probabilities of occurrence. This implies that the code words used will be of variable length; the method also presumes prior knowledge of the PDF of the source symbols (gray levels). It is required that the code words be uniquely decodable on a word-by-word basis, which implies that no code word may be a prefix to another. Huffman devised a coding scheme to meet these requirements and lead to average code-word lengths lower than those provided by fixed-length codes. Huffman coding provides an average code-word length L that is bounded by the zeroth-order entropy of the source H_0 (see the equation defining entropy) as
H_0 \le L \le H_0 + 1.
The procedure to generate the Huffman code is as follows.
TABLE
Integers in the Range [0, 20] in Several Alphabets or Codes

English     Portuguese   Roman   Decimal  Binary  Gray   Octal  Hex
Zero        Zero                 0        00000   00000  00     0
One         Um/Uma       I       1        00001   00001  01     1
Two         Dois/Duas    II      2        00010   00011  02     2
Three       Tres         III     3        00011   00010  03     3
Four        Quatro       IV      4        00100   00110  04     4
Five        Cinco        V       5        00101   00111  05     5
Six         Seis         VI      6        00110   00101  06     6
Seven       Sete         VII     7        00111   00100  07     7
Eight       Oito         VIII    8        01000   01100  10     8
Nine        Nove         IX      9        01001   01101  11     9
Ten         Dez          X       10       01010   01111  12     A
Eleven      Onze         XI      11       01011   01110  13     B
Twelve      Doze         XII     12       01100   01010  14     C
Thirteen    Treze        XIII    13       01101   01011  15     D
Fourteen    Catorze      XIV     14       01110   01001  16     E
Fifteen     Quinze       XV      15       01111   01000  17     F
Sixteen     Dezesseis    XVI     16       10000   11000  20     10
Seventeen   Dezessete    XVII    17       10001   11001  21     11
Eighteen    Dezoito      XVIII   18       10010   11011  22     12
Nineteen    Dezenove     XIX     19       10011   11010  23     13
Twenty      Vinte        XX      20       10100   11110  24     14

Leading zeros have been removed in the decimal and hexadecimal (Hex) codes, but retained in the binary, Gray, and octal codes.
1. Prepare a table listing the symbols (gray levels) in the source image, sorted in decreasing order of the probabilities of their occurrence.
2. Combine the last two probabilities. The list of probabilities now has one less entry than before.
3. Copy the reduced list over to a new column, rearranging as necessary such that the probabilities are in decreasing order.
4. Repeat the procedure above until the list of probabilities is reduced to only two entries.
5. Assign the code digits 0 and 1 to the two entries in the final column of probabilities. (Note: there are two possibilities for this assignment that will lead to two different codes; however, their performance will be identical.)
6. Working backwards through the columns of probabilities, assign additional bits of 0 and 1 to the two entries that resulted in the last compounded entry in the column.
7. Repeat the procedure until the first column of probabilities is reached and all symbols have been assigned a code word.
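The following sketch (not from the original text) implements the merge-and-assign procedure listed above using a priority queue in place of explicit columns of probabilities; the symbol probabilities in the example are hypothetical.

```python
import heapq

def huffman_code(probabilities):
    """Build a Huffman code from {symbol: probability}.
    Repeatedly merges the two least probable entries and, while unwinding,
    prepends '0' and '1' to the code words of the two merged groups."""
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    code = {sym: "" for sym in probabilities}
    if len(heap) == 1:                      # degenerate single-symbol source
        return {next(iter(probabilities)): "0"}
    while len(heap) > 1:
        p1, _, grp1 = heapq.heappop(heap)   # two smallest probabilities
        p2, _, grp2 = heapq.heappop(heap)
        for s in grp1:
            code[s] = "0" + code[s]
        for s in grp2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p1 + p2, len(code) + len(heap), grp1 + grp2))
    return code

if __name__ == "__main__":
    # Hypothetical gray-level probabilities (must sum to 1).
    probs = {0: 0.40, 1: 0.25, 2: 0.15, 3: 0.10, 4: 0.06, 5: 0.04}
    code = huffman_code(probs)
    avg_len = sum(probs[s] * len(code[s]) for s in probs)
    print(code)
    print("average code-word length:", avg_len, "bits/symbol")
```

The average code-word length printed by the example lies between the zeroth-order entropy of the hypothetical source and that value plus one bit, as required by the bound stated above.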
It should be noted that a Huffman code is optimal for only the given source PDF; a change in the source PDF would require the design of a different code in order to be optimal. A disadvantage of the Huffman code is the increasing length of its code words, especially for sources with several symbols. The method does not perform any decorrelation of the data, and is limited in average code-word length by the zeroth-order entropy of the source.
Example: A part of a test image, quantized to 3 b/pixel, is shown in the figure referenced below. The gray levels in the image lie within a limited range. The histogram of the image is also shown; it is evident that some of the pixel values occur with low probabilities.
The procedure for accumulating the probabilities of occurrence of the source symbols, and the Huffman coding process, are illustrated in the related figures. Note that a different code with equivalent performance may be generated by reversing the order of assignment of the code symbols 0 and 1 at each step. The average code-word length obtained is slightly above the zeroth-order entropy of the image. The advantage is relatively small, due to the fact that the source in the example uses only eight symbols (3 b/pixel) and has a relatively well-spread histogram (PDF). However, simple representation of the data using ASCII coding would require at least 7 b/pixel; the savings with reference to this requirement are significant. Larger advantages may be gained by Huffman coding of sources with more symbols and narrow PDFs.
FIGURE
Top to bottom: A part of the test image, quantized to 3 b/pixel, shown as an image, as a 2D array, and as a string of integers with the gray-level values of every pixel. The line breaks in the string format have been included only for the sake of printing within the width of the page.

FIGURE
Gray-level histogram of the image in the preceding figure; the zeroth-order entropy H_0 is indicated in b.
FIGURE
Accumulation of the probabilities (Prob) of occurrence of gray levels in the derivation of the Huffman code for the test image. The probabilities of occurrence of the symbols have been rounded to two decimal places and add up to unity.
FIGURE
Steps in the derivation of the Huffman code for the test image. Prob = probabilities of occurrence of the gray levels. The binary words in bold italics are the Huffman code words at the various stages of their derivation. See also the preceding figure.
Interpixel correlation may be taken into account in Huffman coding by considering combinations of pixels (gray levels) as symbols. If we were to consider pairs of gray levels in the example above, with gray levels quantized to 3 b/pixel, we would have a total of 64 possibilities (see the table below). The first-order entropy of the image, H_1, computed by considering pairs of gray levels, is lower per pixel than the zeroth-order entropy; an average code-word length per pair close to H_1 may be expected if Huffman coding is applied to pairs of gray levels.
TABLE
Counts of Occurrence of Pairs of Pixels in the Test Image
(rows: current pixel; columns: next pixel in the same row)

For example, one particular pair occurs several times in the image. The last pixel in each row was not paired with any pixel. The first-order entropy of the image, H_1, is computed from the probabilities of occurrence of pairs of gray-level values (as in the equation defining entropy), and may be compared with the zeroth-order entropy H_0.
Although the performance of Huffman coding is limited when applied directly to source symbols, the method may be applied to decorrelated data with significant advantage, due to the highly nonuniform or concentrated PDFs of such data. The performance of Huffman coding as a post-encoder following decorrelation methods is discussed in several sections to follow.
Run-length coding
Images with high levels of correlation may be expected to contain strings of repeated occurrences of the same gray level; such strings are known as runs. Data compression may be achieved by coding such runs of gray levels. For example, each of the first three rows of the test image may be represented as a sequence of pairs of values, with the first value of each pair standing for the gray level and the second value giving the number of times the value has occurred in the run. The coding procedure is interrupted at the end of each row to permit synchronization in case of errors in the reception and decoding of run values. Direct coding of the pixels in the three rows considered, at 3 b/pixel, leads to a certain total code length; in the run-length code described above, using three bits per gray level and four bits per run-length value, the total code length is only slightly smaller. The savings are small in this case due to the busy nature of the image, which represents an eye.
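A minimal sketch of row-wise run-length encoding and decoding, along the lines described above, is given below; the example row and the (3 + 4)-bit cost model are illustrative assumptions.

```python
def run_length_encode(row):
    """Encode one row of pixels as (gray level, run length) pairs."""
    runs = []
    current, count = row[0], 1
    for value in row[1:]:
        if value == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = value, 1
    runs.append((current, count))
    return runs

def run_length_decode(runs):
    """Recover the row of pixels from its run-length code."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row

if __name__ == "__main__":
    row = [5, 5, 5, 5, 2, 2, 7, 7, 7, 7, 7, 7, 0]   # hypothetical row of gray levels
    runs = run_length_encode(row)
    print(runs)
    assert run_length_decode(runs) == row
    # Rough code-length comparison: 3 b/pixel direct versus (3 + 4) b per run.
    print("direct:", 3 * len(row), "b   run-length:", (3 + 4) * len(runs), "b")
```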
Run-length coding is best suited for the compression of bilevel images, where long runs may be expected of the two symbols 0 and 1. Images with fine details, intricate texture, and high-resolution quantization with large numbers of bits per pixel may not present long runs of the same gray level; run-length coding in such cases may lead to data expansion rather than compression. This is the case with the test image past the third row.
Run-length coding may be advantageously applied to bit planes of gray-level and color images. The use of Gray coding (see the table of codes given earlier) improves the chances of long runs in the bit planes, due to the feature that the Gray code changes in only one bit from one numerical value to the next.
Errors in run length could cause severe degradation of the reconstructed image due to the loss of pixel position. Synchronization at the end of each row can avoid the carrying over of such errors beyond the affected row.
Runs may also be defined over 2D areas. However, images with fine details do not present such uniform areas with large numbers of occurrences to lend much coding advantage.
Arithmetic coding
Arithmetic coding is a family of codes that treat input symbols as magnitudes. Shannon presented the basic idea of representing a string of symbols as the sum of their scaled probabilities; most of the development towards practical arithmetic coding has been due to Langdon and Rissanen. The basic advantage of arithmetic coding over Huffman coding is that it does not suffer from the limitation that each symbol should have a unique code word that is at least one bit in length.
The mechanism of arithmetic coding is illustrated in the figure referenced below. The symbols of the source string are represented by their individual probabilities p_l and cumulative probabilities P_l (the sum of the probabilities of all symbols up to, but not including, the current symbol). At any given stage in coding, the source string is represented by a code point C_k and an interval A_k. The code point C_k represents the cumulative probability of the current symbol on a scale of interval size A_k.
A new symbol being appended to the source string is encoded by scaling the interval by the probability of the current symbol, as
A_k = A_{k-1} \, p_l,
and defining the new code point as
C_k = C_{k-1} + A_{k-1} \, P_l.
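The interval and code-point updates given above translate directly into the following floating-point sketch (for illustration only; a practical coder would use the finite-precision integer arithmetic discussed below, and the symbol probabilities used here are hypothetical).

```python
def arithmetic_encode(string, p):
    """Floating-point sketch of the updates A_k = A_{k-1} * p_l and
    C_k = C_{k-1} + A_{k-1} * P_l, with P_l the cumulative probability."""
    symbols = sorted(p)                               # fixed symbol order
    P, cum = {}, 0.0
    for s in symbols:
        P[s] = cum
        cum += p[s]
    C, A = 0.0, 1.0
    for s in string:
        C = C + A * P[s]
        A = A * p[s]
    return C, A            # any number in [C, C + A) identifies the string

def arithmetic_decode(C_final, n, p):
    """Decode n symbols by locating C_final within successive intervals."""
    symbols = sorted(p)
    out, C, A = [], 0.0, 1.0
    for _ in range(n):
        for s in symbols:
            lo = C + A * sum(p[t] for t in symbols if t < s)
            hi = lo + A * p[s]
            if lo <= C_final < hi:
                out.append(s)
                C, A = lo, A * p[s]
                break
    return out

if __name__ == "__main__":
    p = {"a": 0.5, "b": 0.3, "c": 0.2}                # hypothetical probabilities
    C, A = arithmetic_encode("abac", p)
    print(C, A, arithmetic_decode(C + A / 2, 4, p))
```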
Decoding is performed by the reverse of the procedure described above. The interval A_0 is initialized to unity, and the current symbol is determined by the range into which the final code point C_{final} falls; the scaling of the interval and code point is performed as during encoding. The encoding procedure ensures that no future code point exceeds the current value C_k + A_k. Thus, a carry-over to a given bit position in the binary representation of C_{final} occurs at most once during encoding. This fact is made use of for incremental coding and for using finite-precision arithmetic. Finite precision is handled by employing a technique known as bit stuffing: if a series of ones longer than the specified precision occurs in the binary-fraction representation of C_k, a zero is inserted; this ensures that further carries do not propagate into the series of ones. Witten et al. provide an implementation of arithmetic coding using integer arithmetic.
Direct arithmetic coding of an image consists of an initial estimation of the probabilities of the gray values in the image, followed by row-wise arithmetic coding of the pixels. Direct coding does not take into account the correlation between adjacent pixels. Arithmetic coding can be modified to make use of the correlation between pixels, to some extent, by using conditional probabilities of occurrence of gray levels.
In a version of arithmetic coding known as Q-coding, the individual bit planes of an image are coded using probabilities conditioned on the surrounding bits in the same plane as the context. A more efficient procedure is to perform decorrelation of the pixels of the image separately, and to use the basic arithmetic coder as a post-encoder on the decorrelated set of symbols; see, for example, Rabbani and Jones. The performance of arithmetic coding as a post-encoder after the application of decorrelation methods is discussed in several sections to follow.
FIGURE
Arithmetic coding procedure. The range A_0 is initialized to unity. Each symbol is represented by its individual probability p_l and cumulative probability P_l. The string being encoded is represented by the code point C_k on the current range A_k. The range is scaled down by the probability p_l of the current symbol, and the process is repeated. One symbol is reserved for the end of the string. Figure courtesy of G.R. Kuduvalli.
Example: The symbols used to represent the test image, along with their individual and cumulative probabilities, are listed in the table below; the intervals representing the symbols are also provided. Let us consider the first three symbols in the fifth row of the image. The procedure to derive the arithmetic code for this string of symbols is illustrated in the related figure.

TABLE
The Symbols Used in the Test Image, Along with Their Individual Probabilities of Occurrence p_l, Cumulative Probabilities P_l, and the Intervals Used in Arithmetic Coding
(columns: Symbol l, Count, p_l, P_l, Interval)
The initial code point is C_0 = 0 and the initial interval is A_0 = 1. When the first symbol is encountered, the code point and interval are updated as C_1 = C_0 + A_0 P_l and A_1 = A_0 p_l, with l being the index of that symbol. For the next symbol, we get C_2 = C_1 + A_1 P_l and A_2 = A_1 p_l; with the third symbol appended to the string, we have C_3 = C_2 + A_2 P_l and A_3 = A_2 p_l, with l being the index of the corresponding symbol in each case.
The code points are computed in decimal form to full precision, as required in this example; the individual symbol probabilities have been rounded to two decimal places. In actual application, the code points need to be represented in binary code with finite precision.

FIGURE
The arithmetic coding procedure applied to the string formed by the first three symbols in the fifth row of the test image. See the preceding table for the related probabilities and intervals. All intervals are shown mapped to the same physical length, although their true values decrease from the interval at the top of the figure to that at the bottom. The numerals in italics indicate the symbols (gray levels) in the image. The values in bold at the ends of each interval give the values of C_k and C_k + A_k at the corresponding stage of coding.

The average code-word length per symbol is reduced by encoding long strings of symbols, such as an entire row of pixels in the given image.
Lempel-Ziv coding
Ziv and Lempel proposed a universal coding scheme for encoding symbols from a discrete source when their probabilities of occurrence are not known a priori. The coding scheme consists of a rule for parsing strings of symbols from the source into substrings (or words), and mapping the substrings into uniquely decipherable code words of fixed length. Thus, unlike the Huffman code, where codes of fixed length are mapped into variable-length codes, the Lempel-Ziv code maps codes of variable length (corresponding to symbol strings of variable length) into codes of fixed length.
The Lempel-Ziv coding scheme is illustrated in the figure referenced below. The coding procedure starts with a buffer of length n, chosen to be considerably larger than L_s, the maximum length of the input symbol strings being parsed; the cardinality of the symbol source (in the case of image coding, the number of possible gray levels) is denoted by \alpha. The buffer is initially filled with n - L_s zeros and the first L_s symbols from the source. The buffer is then parsed for the string whose length l_k is less than L_s, but is the maximum over all such strings beginning within the first n - L_s positions of the buffer, and which has an identical string in the buffer starting at position n - L_s. The code to be mapped consists of the beginning position b_k of this string in the buffer (lying within the first n - L_s positions), the length of the string l_k, and the last symbol s_k following the end of the string. The total length of the code, for a straight binary representation, is
l = \lceil \log_2(n - L_s) + \log_2(L_s) + \log_2(\alpha) \rceil,
where \lceil x \rceil is the smallest integer greater than or equal to x. After coding the string, the buffer is advanced by l_k symbols. Ziv and Lempel showed that, as the total length of the input symbols tends to infinity, the average bit rate for coding the string approaches that of an optimal code designed with complete knowledge of the statistics of the source.
The Lempel-Ziv coding procedure may be viewed as a search through a fixed-size, variable-content dictionary for words that match the current string. A modification of this procedure, known as Lempel-Ziv-Welch (LZW) coding, consists of using a variable-sized dictionary, with every new string encountered in the source added to the dictionary. The dictionary is initialized to single-symbol strings made up of the entire symbol set; this eliminates the need for including the symbol s_k in the code words. The LZW string table has the prefix property: for every string of symbols in the table, its prefix is also present in the table.

FIGURE
The Lempel-Ziv coding procedure. At each iteration, the buffer is scanned for strings of length l_k < L_s for a match in the substring of length n - L_s within the buffer. The matched string location b_k is encoded and transmitted. Figure courtesy of G.R. Kuduvalli.

Kuduvalli implemented a slight variation of the LZW code, in which the first symbol of the current string is appended as the last symbol of the previously parsed string, and the new string is added to the string table. With this method, the decoded strings are generated in the same order as the encoded strings. The string table itself is addressed during the encoding procedure as a link-list: each string contains the address of every other string of which it is a prefix. Such a link-list is not necessary during decoding, because the addresses of the strings are directly available to the decoder.
LZW coding may be applied directly for source coding, or applied to decorrelated data. The following sections provide examples of application of the LZW code; a programmatic sketch of the basic LZW procedure is given below.
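The following sketch shows the basic LZW parsing and table-growth mechanism described above (not the link-list variant of Kuduvalli); the pixel string used is hypothetical.

```python
def lzw_encode(data):
    """Basic LZW: the dictionary is initialized with all single symbols,
    and every new string encountered is added with the next free index."""
    alphabet = sorted(set(data))
    table = {(s,): i for i, s in enumerate(alphabet)}
    out, current = [], ()
    for symbol in data:
        candidate = current + (symbol,)
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            out.append(table[current])     # emit code for the longest match
            table[candidate] = len(table)  # add the new string to the table
            current = (symbol,)
    out.append(table[current])
    return out, table

if __name__ == "__main__":
    # Hypothetical row of 3 b/pixel gray levels with repeated substrings.
    pixels = [1, 1, 2, 1, 1, 2, 1, 1, 2, 3, 3, 3, 1, 1, 2]
    codes, table = lzw_encode(pixels)
    print("codes:", codes)
    print("table size:", len(table))
```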
Example: Let us consider again the test image used in the preceding examples. The image has eight symbols, each of which will be an item in the LZW table (see below); the eight basic symbols may be represented by their own codes, and the two index columns in the table represent two possibilities of coding. Consider a string of three symbols that occurs five times in the image. In order to exploit this feature, we need to add to the table the two-symbol prefix of the string and the full three-symbol string; the code for the latter represents one symbol being appended to the string represented by the code for the former. A longer string present at the beginning of the fourth row in the image may be represented in a similar manner. In this way, long strings of symbols are encoded with short code words; the code index in the dictionary table is used to represent the symbol string. A predefined limit may be applied to the length of the dictionary.

TABLE
Development of the Lempel-Ziv-Welch (LZW) Code Table for the Test Image
(columns: String, Index 1, Index 2)
Contour coding
Given that a digital image includes only a finite number of gray levels, we could expect strings of the same values to occur in some form of 2D contours or patterns in the image. The same expectation may be arrived at if we were to consider the gray level to represent height; an image may then be viewed as a relief map, with iso-intensity contours representing steps or plateaus, as in elevation maps of mountains. Information related to all such contours may then be used to encode the image. Although the idea is appealing in principle, fine quantization could result in low probabilities of occurrence of large contours with simple patterns.
Each iso-intensity contour would require the encoding of the coordinates of the starting point of the contour, the associated gray level, and the sequence of steps (movements) needed to trace the contour. A consistent rule is required for repeatable tracing of contours. The leftmost-looking rule for tracing a contour is as follows:
1. Select a pixel that is not already a member of a contour.
2. Look at the pixel to the left, relative to the direction of entry to the current pixel on the contour. If the pixel has the same gray level as the current pixel, move to that pixel and encode the type of movement.
3. If not, look at the pixel straight ahead. If the pixel has the same gray level as the current pixel, move to that pixel and encode the type of movement.
4. If not, look at the pixel to the right. If the pixel has the same gray level as the current pixel, move to that pixel and encode the type of movement.
5. If not, move to the pixel behind the current pixel.
6. Repeat; the procedure will trace a closed loop and return to the starting point.
7. Repeat the procedure until every pixel in the image belongs to a contour.
The movements allowed are only to the left, straight ahead, to the right, and back; the four possibilities may be encoded using the Freeman chain code (two bits per move), as illustrated in the related figure and in the sketch below.
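A simplified sketch of contour tracing with the leftmost-looking rule and a four-direction Freeman chain code is given below; the direction numbering, the test image, and the stopping conditions are illustrative assumptions rather than the exact conventions of the original figure.

```python
import numpy as np

# Freeman chain code for 4-connected moves: 0 = right, 1 = up, 2 = left, 3 = down.
MOVES = {0: (0, 1), 1: (-1, 0), 2: (0, -1), 3: (1, 0)}

def trace_contour(img, start, heading=0, max_steps=10000):
    """Trace an iso-intensity contour with the leftmost-looking rule:
    from the current pixel, try to turn left, then go straight, then turn
    right, then go back, moving to the first neighbour with the same value.
    Returns the list of Freeman codes; tracing stops when the starting
    pixel is revisited (max_steps is only a safety guard)."""
    level = img[start]
    r, c = start
    chain = []
    for _ in range(max_steps):
        for turn in (1, 0, 3, 2):                  # left, ahead, right, back
            d = (heading + turn) % 4
            nr, nc = r + MOVES[d][0], c + MOVES[d][1]
            if (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                    and img[nr, nc] == level):
                chain.append(d)
                r, c, heading = nr, nc, d
                break
        else:
            break                                  # isolated pixel: no move possible
        if (r, c) == start:
            break
    return chain

if __name__ == "__main__":
    # Hypothetical image with a small plateau of gray level 5.
    img = np.zeros((6, 6), dtype=int)
    img[1:4, 1:5] = 5
    chain = trace_contour(img, start=(1, 1))
    print("Freeman chain:", chain, "(2 bits per move)")
```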
Example: The test image is shown in the related figure along with the tracings of three iso-intensity contours. With reference to the contour at the top-left corner of the image, observe that several spatially connected pixels with the same value, lying within the contour, have not been included in the contour; these pixels require additional contours. The contour-coding procedure needs to be applied repeatedly until every pixel in the image belongs to a contour. The encoding of short strings or isolated occurrences of pixel values could require several coding steps and lead to increased code length.
The data required to represent each of the three contours traced consist of the initial point (its two coordinates), the gray level, and the Freeman code of the sequence of moves along the contour. The contour code requirement is four bits for each coordinate, three bits for the pixel value, and two bits per Freeman code; for each contour, this total may be compared with the direct binary code requirement for the pixels on the contour at 3 b/pixel, or at a higher number of bits per pixel.
It is evident that higher advantages may be gained if a number of long contours with simple patterns are present in the image.
Application: Source Coding of Digitized Mammograms
Kuduvalli et al. applied several coding techniques for the compression of digitized mammograms. In their work, film mammograms were scanned using an Eikonix digitizing camera with a linear CCD array providing a horizontal scan line of several thousand pixels; a comparable vertical array size was achieved by stepping the array over the scan lines. A Gordon Instruments Plannar light box was used to illuminate the X-ray film being digitized. The gain and offset variations between the CCD elements were corrected for in the camera, and the data were transferred to the host computer over an IEEE-standard interface bus. Corrections were applied for the light-intensity variations in the Plannar light source used to illuminate the films, and the digitized image was stored in a Megascan FDP frame buffer. Several gray-scale transformations were developed to test the dynamic range, light intensity, and focus settings.
The effective dynamic range of an imaging system depends upon the scaling factors used for correcting light-intensity variations and the SNR of the imaging system. Kuduvalli et al. analyzed the intensity profiles of the Plannar light box and observed that a substantial scaling factor was required
FIGURE
Contour coding applied to the test image. Three contours are shown for the sake of illustration of the procedure. The pixels included in the contours are shown in bold italics. The initial point of each contour is underlined. Double-headed arrows represent two separate moves in the two directions of the arrows.
to correct the values of the pixels at the edges of the image. Furthermore, the local standard deviation of the intensity levels measured by the camera was estimated with respect to a moving-average window, and the effective noise level at the edge of the imaging field was also estimated. For these reasons, it was concluded that two of the least-significant bits in the digitized data would only contribute to noise and affect the performance of data compression algorithms. In addition, by scanning a standard calibrated gray-scale step pattern, the effective dynamic range of the digitizer was estimated in terms of optical density (OD). In consideration of all of the above factors, it was determined that truncating the two least-significant bits of the pixel values would be adequate for representing the digitized image; this procedure reduced the effective noise level by a factor of four.
Kuduvalli et al. also estimated the MTF of the digitization system from measurements of the ESF (see the related sections), with a view to demonstrating the resolving capability of the system to capture submillimeter details on X-ray films, such as microcalcifications on mammograms. The normalized value of the MTF at one-half of the sampling frequency was estimated and considered to be adequate for resolving objects and details at that frequency, which is the highest frequency component retained in the digitized image.
The average numbers of bits per pixel obtained for ten X-ray images using the Huffman, arithmetic, and LZW coding techniques are listed in the table below; the zeroth-order entropy values of the images are also listed. The high values of the zeroth-order entropy indicate limits on the performance of the Huffman coding technique. The arithmetic coding method gave bit rates comparable with those provided by the Huffman code, but at a considerably higher level of complexity. The related figure shows plots of the average bit rate as a function of the buffer length in LZW coding for four of the ten images listed in the table, with the maximum length of the symbol strings scanned fixed at a constant value L_s. The LZW code provided the best compression rates among the three methods considered, amounting to a substantial reduction of the initial number of bits per pixel in the images. The average bit rate provided by LZW coding is well below the zeroth-order entropy values of the images, indicating that efficient encoding of strings of pixel values can exploit the redundancy and correlation present in the data without performing explicit decorrelation.
The Need for Decorrelation
The results of direct source encoding discussed in the preceding section indicate the limitations of direct encoding of the symbols generated by an image source.
TABLE
Average Number of Bits Per Pixel with Direct Source Coding for Data Compression of Digitized Mammograms and Chest X-ray Images
(columns: Image, Type, Size in pixels, Entropy, Huffman, Arith, LZW; rows: the individual mammograms and chest images, followed by the average)

The entropy listed is the zeroth-order entropy. Pixel values in the original images were quantized at the bit depth used for digitization. See also the related table. Note: Arith = arithmetic coding; LZW = Lempel-Ziv-Welch coding; Mammo = mammogram. Reproduced with permission from G.R. Kuduvalli and R.M. Rangayyan, "Performance analysis of reversible image compression techniques for high-resolution digital teleradiology," IEEE Transactions on Medical Imaging, (c) IEEE.
FIGURE
Average bit rate as a function of the buffer length using Lempel-Ziv-Welch coding for four of the ten images listed in the preceding table. The abscissa indicates the value of B, which determines the buffer length n. The maximum length of the symbol strings scanned was fixed at a constant value L_s. Reproduced with permission from G.R. Kuduvalli and R.M. Rangayyan, "Performance analysis of reversible image compression techniques for high-resolution digital teleradiology," IEEE Transactions on Medical Imaging, (c) IEEE.
Although some of the methods described, such as the Lempel-Ziv and run-length coding methods, have the potential to exploit the redundancy present in images, their efficiency in this task is limited.
The term decorrelation indicates a procedure that can remove or reduce the redundancy or correlation present between the elements of a data stream, such as the pixels in an image. The most commonly used decorrelation techniques are the following:
1. differentiation, which can remove the commonality present between adjacent elements;
2. transformation to another domain where the energy of the image is confined to a narrow range, such as the Fourier, Karhunen-Loève, discrete cosine, or Walsh-Hadamard (orthogonal transform) domains;
3. model-based prediction, where the error of prediction would have reduced information content; and
4. interpolation, where a subsampled image is transmitted, the pixels in between the preceding data are obtained by interpolation, and the error of interpolation, which has reduced information content, is transmitted.
Observe that the decorrelated data (transform coefficients, prediction errors, etc.) need to be encoded and transmitted; the techniques described earlier for direct source encoding may also be applied to decorrelated data. In addition, further information regarding initial values and the procedures for the management of the transform coefficients or the model parameters will also have to be sent to facilitate complete reconstruction of the original image.
The advantages of decorrelating image data by differentiation are demonstrated by the following simple example. The use of transforms for image data compression is discussed in the section on transform coding; interpolative coding and methods for prediction-based data compression are described in subsequent sections. Techniques based upon different scanning strategies to improve the performance of decorrelation by differentiation, and strategies for combining several decorrelation steps, are also discussed in later sections.
Example: Consider the test image shown in part (a) of the figure below; the histogram of the image is shown in part (b) of the figure. The image has a good spread of gray levels over its spatial extent, and the histogram, while not uniform, does exhibit a good spread over the dynamic range of the image. The zeroth-order entropy is close to the maximum possible value for the number of gray levels in the image. These characteristics suggest limited potential for direct encoding methods.
The image was subjected to a simple first-order partial differentiation procedure, given by
f_d(m, n) = f(m, n) - f(m - 1, n).
The result, shown in the subsequent figure, has an extremely limited range of details; the histogram of the difference image indicates that, although the image has more gray levels than the original image, most of the gray levels occur with negligible probability. The concentrated histogram leads to a lower value of entropy. Observe that the histogram of the difference image is close to a Laplacian PDF (see the related section and figure). The simple operation of differentiation has reduced the entropy of the image substantially; the reduced entropy suggests that the coding requirement may be reduced significantly. Observe that the additional information required to recover the original image from its derivative, as above, is just the first row of pixels in the original image. Data compression techniques based upon differentiation are referred to as differential pulse code modulation (DPCM) techniques. DPCM techniques vary in terms of the reference value used for subtraction in the differentiation process; the reference value may be derived as a combination of a few neighboring pixels, in which case the method approaches linear prediction in concept.
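The following sketch (with a hypothetical smooth test image) illustrates the entropy reduction achieved by the first-order difference operation defined above; NumPy is assumed.

```python
import numpy as np

def zeroth_order_entropy(values):
    """Entropy (bits/sample) of the values treated as independent symbols."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def row_difference(img):
    """First-order partial difference f_d(m, n) = f(m, n) - f(m-1, n).
    The first row is kept as-is so that the original can be recovered exactly."""
    fd = img.astype(np.int32).copy()
    fd[1:, :] = img[1:, :].astype(np.int32) - img[:-1, :].astype(np.int32)
    return fd

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical smooth test image: low-frequency ramp plus mild noise.
    ramp = np.add.outer(np.arange(128), np.arange(128)) // 2
    img = (ramp % 256 + rng.integers(0, 3, ramp.shape)).astype(np.uint8)
    fd = row_difference(img)
    print("entropy of original  :", zeroth_order_entropy(img), "b/pixel")
    print("entropy of difference:", zeroth_order_entropy(fd), "b/pixel")
```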
Transform Coding
The main premise of transform-domain coding of images is that, when orthogonal transforms are used, the related coefficients represent elements that are mutually uncorrelated (see the section on the basics of orthogonal transforms). Furthermore, because most natural images have limitations on the rate of change of their elemental values (that is, they are generally smooth), their energy is confined to a narrow low-frequency range in the transform domain. These properties lead to two characteristics of orthogonal transforms that are of relevance and importance in data compression:
1. orthogonal transforms perform decorrelation; and
2. orthogonal transforms compress the energy of the given image into a narrow region.
The second property listed above is commonly referred to as energy compaction.
Example: The log-magnitude Fourier spectrum of the test image in part (a) of the figure below is shown in a related figure. It is evident that most of the energy of the image is concentrated in a small number of DFT coefficients around the origin of the 2D Fourier plane (at the center of the spectrum displayed). In order to demonstrate the energy-compacting nature of the DFT, the cumulative percentage of the total energy of the image present at the (0, 0) coordinate of the DFT, and contained within concentric square regions of

FIGURE
(a) A test image with a wide spread of gray levels. (b) Gray-level histogram of the test image, indicating its dynamic range and zeroth-order entropy (in b).

FIGURE
(a) Result of differentiation of the test image in part (a) of the preceding figure. (b) Gray-level histogram of the image in (a), indicating its dynamic range and zeroth-order entropy (in b).

increasing half-width (in DFT coefficients), was computed. The result, plotted in part (b) of the related figure, shows that a large percentage of the energy of the image is present in the DC component, and that most of the energy is contained within a small number of central DFT components around the DC point; only a small fraction of the energy lies in the high-frequency region beyond the central region of the DFT array. Regardless of the small fraction of the total energy present at higher frequencies, it should be observed that high-frequency components bear important information related to the edges and sharpness of the image (see the related sections).
The DFT is the most commonly used orthogonal transform in the analysis of systems, signals, and images. However, due to the complex nature of its basis functions, the DFT has high computational requirements. In spite of its symmetry, the DFT could lead to increased direct coding requirements due to the need for large numbers of bits for quantization of the transform coefficients. Regardless, the discrete nature of most images at the outset could be used to advantage in lossless recovery from transform coefficients that have been quantized to low levels of accuracy (see the related section).
We have already studied the DFT and the WHT in earlier sections. The WHT has a major computational advantage due to the fact that its basis functions are composed of only +1 and -1, and it has been advantageously applied in data compression. In the following sections, we shall study two other transforms that are popular and relevant to data compression: the discrete cosine transform (DCT) and the Karhunen-Loève transform (KLT).
The discrete cosine transform
The DCT is a modification of the DFT that overcomes the effects of discontinuities at the edges of the latter, and is defined as
F(k, l) = \frac{2 \, a(k, l)}{N} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f(m, n) \cos\left[ \frac{(2m + 1) \pi k}{2N} \right] \cos\left[ \frac{(2n + 1) \pi l}{2N} \right],
for k = 0, 1, ..., N-1 and l = 0, 1, ..., N-1, where
a(k, l) = 1/2 if k = l = 0; a(k, l) = 1/\sqrt{2} if only one of k and l is zero; and a(k, l) = 1 otherwise.
The inverse transformation is given by
f(m, n) = \frac{2}{N} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} a(k, l) \, F(k, l) \cos\left[ \frac{(2m + 1) \pi k}{2N} \right] \cos\left[ \frac{(2n + 1) \pi l}{2N} \right],
for m = 0, 1, ..., N-1 and n = 0, 1, ..., N-1.

FIGURE
(a) Log-magnitude Fourier spectrum of the test image. (b) Distribution of the total image energy: the values represent the cumulative percentage of the energy of the image present at the (0, 0) (DC) position, contained within square boxes of increasing half-width (in pixels) centered on the DC point of the DFT array, and in the entire DFT array; the numbers of DFT coefficients corresponding to the regions are also indicated.

The basis vectors of the DCT closely approximate the eigenvectors of a Toeplitz matrix (see the related section) whose elements can be expressed as increasing powers of a constant (the correlation coefficient between adjacent elements). The ACF matrices of most natural images can be closely modeled by such a matrix. An N x N DCT may be computed from the results of a 2N x 2N DFT; thus, FFT algorithms may be used for efficient computation of the DCT.
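A compact sketch of the DCT defined above, implemented separably with an orthonormal DCT matrix (so that the 2D transform is C f C^T), is given below; the 8 x 8 test block is hypothetical, and NumPy is assumed.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal 1D DCT-II matrix: row k is
    c(k) * sqrt(2/N) * cos[(2m+1) pi k / (2N)], with c(0) = 1/sqrt(2)."""
    m = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.outer(np.arange(N), (2 * m + 1)) * np.pi / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C

def dct2(f):
    """Separable 2D DCT of a square block: F = C f C^T."""
    C = dct_matrix(f.shape[0])
    return C @ f @ C.T

def idct2(F):
    """Inverse 2D DCT: f = C^T F C."""
    C = dct_matrix(F.shape[0])
    return C.T @ F @ C

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Hypothetical smooth 8x8 block (low-frequency content plus mild noise).
    f = np.add.outer(np.arange(8), np.arange(8)).astype(float) + rng.normal(0, 0.1, (8, 8))
    F = dct2(f)
    assert np.allclose(idct2(F), f)                     # perfect reconstruction
    energy = F ** 2
    print("fraction of energy in DC term :", float(energy[0, 0] / energy.sum()))
    print("fraction in 2x2 low-freq block:", float(energy[:2, :2].sum() / energy.sum()))
```

The printed fractions illustrate the energy-compaction property: for a smooth block, most of the energy collects in the low-frequency corner of the coefficient array.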
The Karhunen-Loève transform
The KLT, also known as the principal-component transform, the Hotelling transform, or the eigenvector transform, is a data-dependent transform based upon the statistical properties of the given image. The image is treated as a vector f that is a realization of an image-generating stochastic process. If the image is of size M x N, the vector is of size P = MN (see the related section).
The image f may be represented without error by a deterministic linear transformation of the form
f = A g = \sum_{m=1}^{P} g_m A_m, \quad A = [A_1 \; A_2 \; \cdots \; A_P],
where |A| \neq 0, and the A_m are the basis vectors that make up the P x P matrix A. The matrix A needs to be formulated such that the vector g leads to an efficient representation of the original image f.
The matrix A may be considered to be made up of P linearly independent vectors that span the P-dimensional space containing f. Let A be orthonormal; that is,
A_m^T A_n = 1 if m = n, and A_m^T A_n = 0 if m \neq n.
It follows that
A^T A = I, or A^{-1} = A^T.
Then, the vectors A_m may be considered to form a set of orthonormal basis vectors of a linear transformation. This formulation also leads to the inverse relationship
g = A^T f, that is, g_m = A_m^T f, m = 1, 2, ..., P.
In the procedure described above, each component of g contributes to the representation of f. Given the formulation of A as a reversible linear transformation, g provides a complete (lossless) representation of f if all of its P = MN elements are made available.
Suppose that, in the interest of efficient representation of images via the extraction of the most significant information contained, we wish to use only Q < P components of g. The omitted components of g may be replaced with other (predetermined) values b_m, m = Q+1, ..., P. Then, we have an approximate representation of f, given as
\hat{f} = \sum_{m=1}^{Q} g_m A_m + \sum_{m=Q+1}^{P} b_m A_m.
The error in the approximate representation as above is
f - \hat{f} = \sum_{m=Q+1}^{P} (g_m - b_m) A_m.
The MSE is given by
\epsilon = E\left[ (f - \hat{f})^T (f - \hat{f}) \right] = \sum_{m=Q+1}^{P} \sum_{n=Q+1}^{P} E\left[ (g_m - b_m)(g_n - b_n) \right] A_m^T A_n = \sum_{m=Q+1}^{P} E\left[ (g_m - b_m)^2 \right].
The last step above follows from the orthonormality of A.
Taking the derivative of the MSE with respect to b_m and setting the result to zero, we get
\frac{\partial \epsilon}{\partial b_m} = -2 \left[ E(g_m) - b_m \right] = 0.
The optimal (MMSE) choice for b_m is, therefore, given by
b_m = E[g_m] = A_m^T E[f], \quad m = Q+1, ..., P;
that is, the omitted components are replaced by their means. The MMSE is given by
\epsilon_{min} = \sum_{m=Q+1}^{P} E\left[ (g_m - E[g_m])^2 \right] = \sum_{m=Q+1}^{P} E\left[ A_m^T (f - E[f]) (f - E[f])^T A_m \right] = \sum_{m=Q+1}^{P} A_m^T \Sigma_f A_m,
where \Sigma_f is the covariance matrix of f.
Now, if the basis vectors A_m are selected as the eigenvectors of \Sigma_f, that is,
\Sigma_f A_m = \lambda_m A_m,
and
\lambda_m = A_m^T \Sigma_f A_m \quad (because A_m^T A_m = 1),
where the \lambda_m are the corresponding eigenvalues, then we have
\epsilon_{min} = \sum_{m=Q+1}^{P} \lambda_m.
Therefore, the MSE may be minimized by ordering the eigenvectors (the basis vectors that make up A) such that the corresponding eigenvalues are arranged in decreasing order, that is, \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_P. Then, if a component g_m of g is replaced by b_m = E[g_m], the MSE increases by \lambda_m. By replacing the components of g corresponding to the eigenvalues at the lower end of the list, the MSE is kept at its lowest possible value for a chosen number of components Q.
From the above formulation and properties, it follows that the components of g are mutually uncorrelated, with
\Sigma_g = A^T \Sigma_f A = \Lambda,
where \Lambda is a diagonal matrix with the eigenvalues \lambda_m placed along its diagonal. Because the eigenvalues \lambda_m are equal to the variances of the components g_m, a selection of the larger eigenvalues implies the selection of the transform components with the higher variance (or information content) across the ensemble of the images considered.
The KLT has major applications in principal-component analysis (PCA), image coding, data compression, and feature extraction for pattern classification. Difficulties could exist in the computation of the eigenvectors and eigenvalues of the large covariance matrices of even reasonably sized images. It should be noted that a KLT is optimal only for the images represented by the statistical parameters used to derive the transformation; new transformations will need to be derived if changes occur in the statistics of the image-generating process being considered, or if images of different statistical characteristics need to be analyzed.
Because the KLT is a data-dependent transform, the transformation vectors (the matrix A) need to be transmitted; however, if a large number of images generated by the same underlying process are to be transmitted, the same (optimal) transform is applicable, and the transformation vectors need to be transmitted only once. Note that the error between the original image and the image reconstituted from the KLT components needs to be transmitted in order to facilitate lossless recovery of the image.
See the related section for a discussion on the application of the KLT for the selection of the principal components in multiscale directional filtering.
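A minimal sketch of the KLT computation described above, estimating the covariance matrix from an ensemble of blocks and using its eigenvectors as the basis vectors A_m, is given below; the ensemble, block size, and function names are hypothetical, and NumPy is assumed.

```python
import numpy as np

def klt_basis(blocks):
    """Estimate the KLT for an ensemble of image blocks.
    Each block is flattened into a vector f of length P; the basis vectors
    A_m are the eigenvectors of the covariance matrix of f, ordered by
    decreasing eigenvalue (variance of the corresponding component g_m)."""
    X = np.array([b.ravel() for b in blocks], dtype=float)   # one row per block
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)                    # ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], mean

def klt_approximate(block, eigvecs, mean, Q):
    """Keep only the Q principal components; the omitted components are
    replaced by their means (zero after mean removal)."""
    f = block.ravel().astype(float) - mean
    g = eigvecs.T @ f                       # forward transform  g = A^T f
    g[Q:] = 0.0                             # discard low-variance components
    return (eigvecs @ g + mean).reshape(block.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    # Hypothetical ensemble of smooth 4x4 blocks (correlated pixels).
    base = np.add.outer(np.arange(4), np.arange(4)).astype(float)
    blocks = [base * rng.uniform(0.5, 1.5) + rng.normal(0, 0.2, (4, 4)) for _ in range(200)]
    eigvals, A, mean = klt_basis(blocks)
    approx = klt_approximate(blocks[0], A, mean, Q=2)
    print("largest eigenvalues:", np.round(eigvals[:4], 3))
    print("max abs error with Q=2:", float(np.max(np.abs(approx - blocks[0]))))
```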
Encoding of transform coefficients
Regardless of the transform used (such as the DFT, DCT, or KLT), the transform coefficients form a set of continuous random variables and have to be quantized for encoding. This introduces quantization errors in the transform coefficients that are transmitted, and hence errors arise in the image reconstructed from the transform-coded data. In the following paragraphs, a relationship is derived between the quantization error in the transform domain and the error in the reconstructed image; Kuduvalli and Rangayyan used such a relationship to develop a method for error-free transform coding of images.
Consider the general 2D linear transformation with the forward and inverse transform kernels consisting of orthogonal basis functions a(m, n; k, l) and b(m, n; k, l), respectively, such that the forward and inverse transforms are given by
F(k, l) = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f(m, n) \, a(m, n; k, l), \quad k, l = 0, 1, ..., N-1,
and
f(m, n) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} F(k, l) \, b(m, n; k, l), \quad m, n = 0, 1, ..., N-1
(see the related section). Now, let the transform coefficient F(k, l) be quantized to \hat{F}(k, l) with a quantization error q_F(k, l), such that
F(k, l) = \hat{F}(k, l) + q_F(k, l).
The image reconstructed from the quantized transform coefficients is given by
\hat{f}(m, n) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \hat{F}(k, l) \, b(m, n; k, l).
The error in the reconstructed image is
q_f(m, n) = f(m, n) - \hat{f}(m, n) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \left[ F(k, l) - \hat{F}(k, l) \right] b(m, n; k, l) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} q_F(k, l) \, b(m, n; k, l).
The sum of the squared errors in the reconstructed image is
Q_f = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} |q_f(m, n)|^2
    = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} \left[ \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} q_F(k, l) \, b(m, n; k, l) \right] \left[ \sum_{k'=0}^{N-1} \sum_{l'=0}^{N-1} q_F(k', l') \, b(m, n; k', l') \right]
    = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \sum_{k'=0}^{N-1} \sum_{l'=0}^{N-1} q_F(k, l) \, q_F(k', l') \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} b(m, n; k, l) \, b(m, n; k', l')
    = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} |q_F(k, l)|^2 = Q_F,
where the last line follows from the orthogonality of the basis functions b(m, n; k, l), and Q_F is the sum of the squared quantization errors in the transform domain. This result is related to Parseval's theorem in 2D (see the related equation).
Applying the expectation operator to the first and the last expressions above, we get
\sum_{m=0}^{N-1} \sum_{n=0}^{N-1} E\left[ |q_f(m, n)|^2 \right] = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} E\left[ |q_F(k, l)|^2 \right],
or
\sum_{m=0}^{N-1} \sum_{n=0}^{N-1} \bar{\sigma}^2_{q_f}(m, n) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \bar{\sigma}^2_{q_F}(k, l) = \bar{Q},
where the bar indicates the expected (average) values of the corresponding variables, and \bar{Q} is the expected total squared error of quantization in either the image domain or the transform domain.
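The equality Q_f = Q_F derived above can be verified numerically with a short sketch, using the orthonormal DCT of the earlier section as the transform; the block size and quantization step are arbitrary choices, and NumPy is assumed.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal 1D DCT-II matrix (same construction as in the DCT section)."""
    m = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.outer(np.arange(N), (2 * m + 1)) * np.pi / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    N, S = 16, 4.0                      # hypothetical block size and quantization step
    f = rng.integers(0, 256, (N, N)).astype(float)
    C = dct_matrix(N)
    F = C @ f @ C.T                     # orthonormal 2D transform
    F_hat = S * np.round(F / S)         # uniform quantization, max error S/2
    f_hat = C.T @ F_hat @ C             # reconstruction from quantized coefficients
    Q_F = np.sum((F - F_hat) ** 2)      # squared error in the transform domain
    Q_f = np.sum((f - f_hat) ** 2)      # squared error in the image domain
    print("Q_F =", float(Q_F), " Q_f =", float(Q_f))   # equal, as per Parseval
```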
It is possible to derive a condition for the minimum average number of bits required for encoding the transform coefficients for a given total distortion in the image domain. Let us assume that the transform coefficients are normally distributed. If the variance of the transform coefficient F(k, l) is \sigma^2_F(k, l), the average number of bits required to encode the coefficient F(k, l) with the MSE \sigma^2_{q_F}(k, l) is given by its rate-distortion function
R(k, l) = \frac{1}{2} \log_2 \left[ \frac{\sigma^2_F(k, l)}{\sigma^2_{q_F}(k, l)} \right].
The overall average number of bits required to encode the transform coefficients with a total squared error \bar{Q} is
R_{av} = \frac{1}{N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} R(k, l) = \frac{1}{2 N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \log_2 \left[ \frac{\sigma^2_F(k, l)}{\sigma^2_{q_F}(k, l)} \right].
We now need to minimize R_{av} subject to the condition given by the preceding equation (a fixed total squared error \bar{Q}). Using the method of Lagrange multipliers, the minimum occurs when
\frac{\partial}{\partial \sigma^2_{q_F}(k, l)} \left\{ \frac{1}{2 N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \log_2 \left[ \frac{\sigma^2_F(k, l)}{\sigma^2_{q_F}(k, l)} \right] + \mu \left[ \bar{Q} - \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \sigma^2_{q_F}(k, l) \right] \right\} = 0,
k, l = 0, 1, ..., N-1, where \mu is the Lagrange multiplier. It follows that
-\frac{1}{2 N^2 \ln 2} \, \frac{1}{\sigma^2_{q_F}(k, l)} - \mu = 0,
or
\sigma^2_{q_F}(k, l) = -\frac{1}{2 \mu N^2 \ln 2} = \bar{\sigma}^2_q,
k, l = 0, 1, ..., N-1, where \bar{\sigma}^2_q is the average MSE, which is a constant for all of the transform coefficients. Thus, the average number of bits required to encode the transform coefficients, R_{av}, is minimum when the total squared error is equally distributed among all of the transform coefficients.
Maximum-error-limited encoding of transform coefficients: Kuduvalli and Rangayyan derived the following condition for encoding transform coefficients subject to a maximum error limit. Consider a uniform quantizer with a quantization step size S for encoding the transform coefficients, such that the maximum quantization error is limited to S/2. It may be assumed that the quantization error is uniformly distributed over the range (-S/2, S/2); see Figure. Then, the average squared error in the transform domain is

$$\sigma_q^2 = \frac{S^2}{12}.$$

From the result in Equation, it is seen that the errors in the reconstructed image will also have a variance equal to σ²_q. We now wish to estimate the fraction of the total number of pixels in the reconstructed image that are in error by more than S. This is given by the area under the tail of the PDF of the reconstruction error, shown in Figure. The worst case occurs when the entropy of the reconstruction errors is at its maximum under the constraint that the variance of the reconstruction errors is bounded by σ²_q; this occurs when the error is normally distributed. Therefore, the upper bound on the estimated fraction of the pixels in error by more than S is

$$E(S) = 2 \int_S^{\infty} \frac{1}{\sqrt{2 \pi \sigma_q^2}} \exp\left( -\frac{x^2}{2 \sigma_q^2} \right) dx = \mathrm{erfc}\left( \frac{S}{\sigma_q \sqrt{2}} \right) = \mathrm{erfc}\left( \sqrt{6} \right),$$

where erfc(x) is the complementary error function, defined as

$$\mathrm{erfc}(x) = \int_x^{\infty} \frac{2}{\sqrt{\pi}} \exp(-t^2)\, dt.$$

FIGURE
Schematic PDFs of the transform-coefficient quantization error (uniform PDF, solid line) and the image reconstruction error (Gaussian PDF, dashed line), for a quantization step size S.

Thus, only a negligible number of pixels in the reconstructed image will be in error by more than the quantization step size S. Conversely, if the maximum error allowed in the reconstructed image is S, a quantization step size of S could be used to encode the transform coefficients, with only a negligibly small number of the reconstructed pixels exceeding the error limit. The pixels in error by more than the specified maximum could be encoded separately with a small overhead. When the maximum allowed error is less than 0.5, error-free reconstruction of the image is possible by simply rounding off the reconstructed values to integers.
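As a quick numerical check of the bound above (a minimal sketch; the step size chosen is arbitrary), the fraction erfc(S / (σ_q √2)) with σ_q = S / √12 evaluates to a small constant that is independent of S:

```python
import math

S = 8.0                                             # quantization step size (arbitrary)
sigma_q = S / math.sqrt(12.0)                       # std. dev. of a uniform error on (-S/2, S/2)
bound = math.erfc(S / (sigma_q * math.sqrt(2.0)))   # equals erfc(sqrt(6))
print(bound)                                        # about 5e-4: a negligible fraction of pixels
```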
Variable-length encoding and bit allocation: The lower bound on the average number of bits required for encoding normally distributed transform coefficients F(k,l) with the MSE σ²_{q_F}(k,l) is given by Equation. Goodness-of-fit studies of PDFs of transform coefficients have shown that transform coefficients tend to follow the Laplacian PDF. The PDF of a transform coefficient F(k,l) may be modeled as

$$p\left[ F(k,l) \right] = \frac{\alpha(k,l)}{2} \exp\left[ -\alpha(k,l)\, |F(k,l)| \right],$$

where α(k,l) = √2 / σ_F(k,l) is the constant parameter of the Laplacian PDF.
A shift encoder could be used to encode the transform coefficients such that the maximum quantization error is S/2. The shift-encoding procedure is shown in Figure. In a shift encoder, Δ(k,l) levels are nominally allocated to encode a transform coefficient F(k,l), covering a symmetric range of width Δ(k,l)S about zero, with one of the codes reserved to indicate that the coefficient is out of the nominal range. For the out-of-range coefficients, an additional set of levels is allocated to cover the adjacent ranges on either side of the nominal range. The process is repeated, with the allocation of additional levels, until the actual value of the transform coefficient to be encoded is reached. If the code value is represented by a simple binary code at each level, the average number of bits required to encode the transform coefficient F(k,l) is given by

$$R(k,l) = \frac{\log_2 \Delta(k,l)}{1 - \exp\left[ -\alpha(k,l)\, \Delta(k,l)\, S \right]},$$

and the nominal number of bits allocated to encode the transform coefficient is

$$b(k,l) = \left\lceil \log_2 \Delta(k,l) \right\rceil.$$

It is now required to find the b(k,l) that minimizes the average number of bits R(k,l) required to encode F(k,l). This can be done by using a nonlinear optimization technique, such as the Newton-Raphson method. However, because only integral values of b(k,l) need to be searched, it is computationally less expensive to search the space of b(k,l) for the corresponding minimum value of R(k,l), over the range of the nominal number of bits allocated to encode the transform coefficient F(k,l).

FIGURE
Schematic representation of shift coding of transform coefficients with a Laplacian PDF. With reference to the discussion in the text, the figure represents particular values of the quantization step S, the number of levels Δ(k,l), and the nominal allocation b(k,l).

This allocation requires an estimate of the variance σ²_F(k,l) of the transform coefficients, or a model of the energy-compacting property of the transform used. Most of the linear orthogonal transforms used in practice result in the concentration of the variance in the lower-order transform coefficients. The variance of the transform coefficients may be modeled as a decaying exponential function of the coefficient indices,

$$\sigma^2_F(k,l) = \sigma^2_F(0,0)\, \exp\left[ -(\alpha k + \beta l) \right],$$

where F(0,0) is the lowest-order transform coefficient, and α and β are the constants of the model. For most transforms (except the KLT), F(0,0) is the average value, or a scaled version of the average value, of the image pixels. The parameters α and β may be estimated using a least-squares fit to the first few transform coefficients of the image.
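Such a fit is conveniently performed in the log domain. The sketch below is a minimal illustration under the separable exponential decay form written above (which is itself an assumed reconstruction of the model); the block size and the synthetic variances are arbitrary choices for the example.

```python
import numpy as np

def fit_variance_model(block_variances):
    """Fit log var(k,l) = log var(0,0) - alpha*k - beta*l by least squares."""
    N = block_variances.shape[0]
    k, l = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    mask = (k + l) > 0                        # exclude the DC term itself
    y = np.log(block_variances[mask]) - np.log(block_variances[0, 0])
    A = np.column_stack([-k[mask], -l[mask]])
    (alpha, beta), *_ = np.linalg.lstsq(A, y, rcond=None)
    return alpha, beta

# Example with synthetic coefficient variances following the assumed model.
N = 8
k, l = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
true_var = 1000.0 * np.exp(-(0.4 * k + 0.3 * l))
print(fit_variance_model(true_var))           # recovers approximately (0.4, 0.3)
```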
In an alternative coding procedure, a fixed total number of bits may be allocated for encoding the transform coefficients, and the difference image encoded by a lossless encoding method. In such a procedure, no attempt is made to allocate additional bits for transform coefficients that result in errors that fall out of the quantization range. The total number of bits allocated to encode the transform coefficients is varied until the total average bit rate is at its minimum. Using such a procedure, Cox et al. found an optimal combination of bit rates between the error image and the transform coefficients. Wang and Goldberg used a method of requantization of the quantization errors in addition to encoding the error image; they, too, observed a minimum total average bit rate after a number of iterations of requantization. Experiments conducted by Kuduvalli and Rangayyan showed that the lowest bit rates obtained by such methods for reversible compression can also be obtained by allocating additional bits to quantize the out-of-range transform coefficients, as described earlier. This is due to the fact that a large quantization error in the transform domain, while needing only a few additional bits for encoding, will get redistributed over a large number of pixels in the image domain, thereby increasing the entropy of the error image.

The large sizes of image arrays used for high-resolution representation of medical images preclude the use of full-frame transforms. Partitioning an image into blocks not only leads to computational advantages, but also permits adaptation to the changing statistics of the image. In the coding procedure used by Kuduvalli and Rangayyan, the images listed in Table were partitioned into square blocks of fixed size. The model parameters α and β in Equation were computed for each block by using a least-squares fit to the corresponding set of transform coefficients. The parameters were stored or transmitted along with the encoded transform coefficients, in order to allow the decoder to reconstruct the model and the bit allocation table. Blocks at the boundaries of the images that were not squares were encoded with 2D DPCM and the Huffman code.

Figure shows the average bit rate for one of the images listed in Table as a function of the maximum allowed error, using four transforms. The KLT (with the ACF estimated from the image) and the DCT show the best performance among the four transforms. The performance of the KLT is only slightly superior to, and in some cases slightly worse than, that of the DCT; this is to be expected because of the general nonstationarity of medical images, and due to the problems associated with the estimation of the ACF matrix from a finite image.

Figure shows the average bit rate obtained by using the DCT as a function of the maximum allowed error for four of the ten images listed in Table. When the maximum allowed error is zero, error-free reconstruction of the original image is possible; otherwise, the compression is irreversible. The average bit rate for error-free reconstruction is seen to be substantially lower than the bit rate of the original format of the images.
FIGURE
Average bit rate as a function of the maximum allowed error using four transforms (KLT, DCT, DFT, and WHT) with the first image listed in Table. The same block size was used for each transform. The compression is lossless if the maximum allowed error is zero; otherwise, it is irreversible or lossy. See also Table. Reproduced with permission from G.R. Kuduvalli and R.M. Rangayyan, Performance analysis of reversible image compression techniques for high-resolution digital teleradiology, IEEE Transactions on Medical Imaging, © IEEE.

FIGURE
Average bit rate as a function of the maximum allowed error using the DCT for the first four images listed in Table. The same block size was used. The compression is lossless if the maximum allowed error is zero; otherwise, it is irreversible or lossy. See also Table. Reproduced with permission from G.R. Kuduvalli and R.M. Rangayyan, Performance analysis of reversible image compression techniques for high-resolution digital teleradiology, IEEE Transactions on Medical Imaging, © IEEE.
Interpolative Coding
Interpolative coding consists of encoding a subsampled image using a reversible compression technique, deriving the values of the remaining pixels via interpolation with respect to their neighboring pixels that have already been processed, and then encoding the difference between the actual pixels and the interpolated pixels, in successive stages, using discrete-symbol coding techniques. This technique is also referred to as hierarchical interpolation (HINT), and is illustrated in Figure. In the image shown in the figure, one set of marked pixels corresponds to the original image decimated by a given factor. The decimated image could be encoded using any coding technique. The pixels of the next set are estimated from the decimated pixels by bilinear interpolation, and rounded to ensure reversibility. This completes one iteration of interpolation. Next, a further set of pixels is interpolated from the pixels already processed, and the process is repeated over the remaining sets of pixels. The differences between the actual pixel values and the corresponding interpolated values form a discrete-symbol set with a small dynamic range, and may be encoded efficiently using Huffman, arithmetic, or LZW coding.
The illustration in Figure corresponds to interpolation of a particular order; higher-order interpolation may also be used. In general, interpolative coding of order 2^P, where P is an integer, involves P iterations of interpolation.
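One level of hierarchical interpolation can be sketched as follows. This is a minimal illustration, assuming decimation by a factor of two in each direction and bilinear interpolation with rounding; the boundary handling (nearest-neighbor padding at the right and bottom edges) is a simplification, not part of the original method.

```python
import numpy as np

def hint_one_level(image):
    """One HINT stage: keep every other pixel, predict the rest by rounded bilinear interpolation."""
    M, N = image.shape
    coarse = image[::2, ::2].astype(float)            # subsampled image (encoded losslessly)
    # Bilinear interpolation of the coarse grid back to full size (edge values repeated).
    up = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)[:M, :N]
    up[:, 1:-1:2] = (up[:, 0:-2:2] + up[:, 2::2]) / 2.0      # interpolate along columns
    up[1:-1:2, :] = (up[0:-2:2, :] + up[2::2, :]) / 2.0      # interpolate along rows
    prediction = np.rint(up).astype(int)               # rounding ensures reversibility
    residual = image.astype(int) - prediction          # small-dynamic-range symbols to encode
    return coarse.astype(int), residual

img = np.arange(64, dtype=int).reshape(8, 8)
coarse, residual = hint_one_level(img)
print(np.abs(residual).max())    # residuals are small in the interior; edges use nearest-neighbor padding
```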
In the work of Kuduvalli and Rangayyan, the digitized radiographic images listed in Table were partitioned into blocks for interpolative coding. The initial subsampled images were decorrelated using 2D DPCM; the difference data were compressed by Huffman coding. It was observed that the differences between the interpolated and actual pixel values could be modeled by Laplacian PDFs (see Figure). The variance of the interpolation errors was seen to decrease with increasing resolution, because pixels are correlated more to their immediate neighbors than to pixels farther away. The interpolation errors at different iterations were modeled by Laplacian PDFs with the variance equal to the corresponding mean-squared interpolation errors. The PDFs were then used in compressing the interpolation errors via arithmetic or Huffman coding. LZW coding does not need modeling of the error distribution, but performed considerably worse than the Huffman and arithmetic codes. Figure shows the average bit rate for eight images as a function of the order of interpolation, using Huffman coding as the post-encoding technique. It is observed that increasing the order of interpolation has only a small effect on the overall average bit rate.
For examples of the performance of HINT, see the tables of compression results in this chapter.
FIGURE
Stages of interpolative coding. The pixels of the initial subsampled set are coded and transmitted first. Next, the pixels of the following set are estimated from those already available, and the differences are transmitted. The procedure continues iteratively with the remaining sets of pixels. Reproduced with permission from G.R. Kuduvalli and R.M. Rangayyan, Performance analysis of reversible image compression techniques for high-resolution digital teleradiology, IEEE Transactions on Medical Imaging, © IEEE.

FIGURE
Results of compression of eight of the images listed in Table by interpolative and Huffman coding. Order zero corresponds to 2D DPCM coding. Reproduced with permission from G.R. Kuduvalli and R.M. Rangayyan, Performance analysis of reversible image compression techniques for high-resolution digital teleradiology, IEEE Transactions on Medical Imaging, © IEEE.
Predictive Coding
Samples of real-life signals and images bear a high degree of correlation, especially over small intervals of time or space. The correlation between samples may also be viewed as statistical redundancy. An outcome of these characteristics is that a sample of a temporal signal may be predicted from a small number of its preceding samples. When such prediction is performed in a linear manner, we have a linear prediction (LP) model, given by

$$\tilde{f}(n) = \sum_{p=1}^{P} a_p\, f(n-p) + G\, d(n),$$

where f(n) is the signal being modeled, f̃(n) is the predicted value of f(n), a_p, p = 1, 2, ..., P, are the coefficients of the LP model, and P is the order of the model. The signal f(n) is considered to be the output of a linear system or filter, with d(n) as the input or driving signal; G is the gain of the system. Because the prediction model uses only the past values of the output signal f(n), it is known as an autoregressive (AR) model. The need for causality in physically realizable filters and signal processing systems dictates the requirement to use only the past samples of f and the present value of d in predicting the current value f(n). If the past values of d are also used, the model will include a moving-average (MA) component.
The model represented in Equation indicates that, given the initial set of P values of the signal f and the input or driving signal d, any future value of f may be (approximately) computed with a knowledge of the set of coefficients a_p. Therefore, the model coefficients a_p, p = 1, 2, ..., P, represent the signal-generating process. The model coefficients may be used to predict the values of f, or to analyze the signal-generating process. Several methods exist to derive the LP model coefficients for a given signal, subject to certain conditions on the error of prediction.
In the context of image data compression, we have a few considerations that differ from the temporal signal application described above. In most cases, the input or driving signal d is not known; the omission of the related component in the model represented in Equation will cause only a small change in the error of prediction. Furthermore, causality is not a matter of concern in image processing; however, a certain sequence of accessing or processing of the pixels in the given image needs to be defined, which could imply past and future samples of the image (see Figure). In the context of image processing, the samples used to predict the current pixel could be labeled as the ROS of the model. Then, we could express the basic 2D LP model as

$$\tilde{f}(m,n) = \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, f(m-p,\, n-q).$$

In the application to image coding, the ROS needs to be defined such that, in the decoding process, only those pixels that have already been decoded are included in the ROS for the current pixel being processed.
The error of prediction is given by

$$e(m,n) = f(m,n) - \tilde{f}(m,n),$$

and the MSE between the original image and the predicted image is given by E[e²(m,n)]. The coefficients a(p,q) need to be chosen or derived so as to minimize the MSE between the original image and the predicted image. Several approaches are available for the derivation of optimal predictor coefficients.
Exact reconstruction of the image requires that, in addition to the initial conditions and the model coefficients, the error of prediction be also transmitted and made available to the decoder. The advantage of LP-based image compression lies in the fact that the error image tends to have a more concentrated PDF than the original image, close to a Laplacian PDF in most cases (see Figure), which lends well to efficient data compression.
In the simplest model of prediction, the current pixel f(m,n) may be modeled as being equal to the preceding pixel, f(m, n-1) or f(m-1, n); let

$$\tilde{f}(m,n) = f(m,\, n-1).$$

Then, the error of prediction is given by

$$e(m,n) = f(m,n) - f(m,\, n-1),$$

which represents simple DPCM. Note: Any data coding method based upon the difference between a data sample and a predicted value of the same, using any scheme, is referred to as DPCM; hence, all LP-based methods fall under the category of DPCM. The differentiation procedure given by Equation and illustrated in Figure is equivalent to LP with f̃(m,n) = f(m, n-1), as above. Several simple combinations of the immediate neighbors of the current pixel may also be used in the prediction model (see Section). Efficient modeling requires the use of the optimal model order (ROS size), and the derivation of the optimal coefficients subject to conditions related to the minimization of the MSE. Several methods for the derivation of the model coefficients are described in the following sections.
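The gain from even the simplest predictor can be seen by comparing the first-order entropy of an image with that of its DPCM residual. The following sketch is a minimal illustration on a synthetic correlated field; for a real image, the residual histogram is sharply peaked (near-Laplacian) and its entropy is much lower than that of the original pixels.

```python
import numpy as np

def entropy_bits(values):
    """First-order entropy (bits/sample) of an integer-valued array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(2)
# Synthetic correlated "image": a smooth ramp plus small noise, quantized to integers.
m, n = np.meshgrid(np.arange(128), np.arange(128), indexing="ij")
image = np.rint(0.5 * m + 0.3 * n + rng.normal(0, 2, size=(128, 128))).astype(int)

residual = image.copy()
residual[:, 1:] = image[:, 1:] - image[:, :-1]   # simple DPCM: predict each pixel by its left neighbor

print(entropy_bits(image), entropy_bits(residual))   # the residual entropy is much lower
```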
Two-dimensional linear prediction
The error of prediction in the 2D LP model is given by

$$e(m,n) = f(m,n) - \tilde{f}(m,n) = f(m,n) - \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, f(m-p,\, n-q).$$

The squared error is given by

$$e^2(m,n) = f^2(m,n) - 2 \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, f(m,n)\, f(m-p,\, n-q) + \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}}\ \mathop{\sum\sum}_{(r,s)\, \in\, \mathrm{ROS}} a(p,q)\, a(r,s)\, f(m-p,\, n-q)\, f(m-r,\, n-s).$$

Applying the statistical expectation operator, we get

$$\sigma_e^2 = E\left[ e^2(m,n) \right] = \phi_f(0,0) - 2 \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, \phi_f(p,q) + \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}}\ \mathop{\sum\sum}_{(r,s)\, \in\, \mathrm{ROS}} a(p,q)\, a(r,s)\, \phi_f(r-p,\, s-q),$$

where φ_f(p,q) is the ACF of f, and the image-generating process is assumed to be wide-sense stationary.
The coefficients that minimize the MSE may be derived by setting to zero the derivative of σ²_e with respect to a(p,q), (p,q) ∈ ROS, which leads to

$$\phi_f(r,s) = \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, \phi_f(r-p,\, s-q), \qquad (r,s) \in \mathrm{ROS}.$$

Using this result in Equation, we get

$$\sigma_e^2 = \phi_f(0,0) - \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, \phi_f(p,q).$$

Combining Equations above, we get

$$\phi_f(r,s) - \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, \phi_f(r-p,\, s-q) = \begin{cases} \sigma_e^2, & (r,s) = (0,0), \\ 0, & (r,s) \in \mathrm{ROS},\ (r,s) \neq (0,0). \end{cases}$$

The equations represented by the expressions above are known as the 2D normal (or Yule-Walker) equations, and may be solved to derive the prediction coefficients. Because the method uses the ACF of the image to derive the prediction coefficients, it is known as the autocorrelation method. The ACF may be estimated from the given finite image f(m,n), m = 0, 1, ..., M-1, n = 0, 1, ..., N-1, as

$$\tilde{\phi}_f(p,q) = \frac{1}{MN} \sum_{m=p}^{M-1} \sum_{n=q}^{N-1} f(m,n)\, f(m-p,\, n-q).$$
The prediction coefficients may also be estimated by using least-squares methods to minimize the prediction error averaged over the entire image, indicated as (m,n) ∈ IMG, as follows:

$$\varepsilon^2 = \frac{1}{MN} \mathop{\sum\sum}_{(m,n)\, \in\, \mathrm{IMG}} e^2(m,n) = \frac{1}{MN} \mathop{\sum\sum}_{(m,n)\, \in\, \mathrm{IMG}} \Bigl[ f^2(m,n) - 2 \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, f(m,n)\, f(m-p,\, n-q)$$
$$+ \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}}\ \mathop{\sum\sum}_{(r,s)\, \in\, \mathrm{ROS}} a(p,q)\, a(r,s)\, f(m-p,\, n-q)\, f(m-r,\, n-s) \Bigr]$$
$$= C_f(0,0;\,0,0) - 2 \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, C_f(0,0;\,p,q) + \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}}\ \mathop{\sum\sum}_{(r,s)\, \in\, \mathrm{ROS}} a(p,q)\, a(r,s)\, C_f(p,q;\,r,s).$$

Here, C_f is the covariance of the image f, defined as

$$C_f(p,q;\,r,s) = \frac{1}{MN} \mathop{\sum\sum}_{(m,n)\, \in\, \mathrm{IMG}} f(m-p,\, n-q)\, f(m-r,\, n-s).$$

The coefficients that minimize the averaged error may be derived by setting to zero the derivative of ε² with respect to a(p,q), which leads to

$$C_f(0,0;\,r,s) = \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, C_f(p,q;\,r,s), \qquad (r,s) \in \mathrm{ROS}.$$

It follows that

$$\varepsilon^2 = C_f(0,0;\,0,0) - \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, C_f(0,0;\,p,q).$$

The 2D normal equations for this condition are given by

$$C_f(0,0;\,r,s) - \mathop{\sum\sum}_{(p,q)\, \in\, \mathrm{ROS}} a(p,q)\, C_f(p,q;\,r,s) = \begin{cases} \varepsilon^2, & (r,s) = (0,0), \\ 0, & (r,s) \in \mathrm{ROS},\ (r,s) \neq (0,0), \end{cases}$$

which may be solved to obtain the prediction coefficients. Because the covariance of the image is used to derive the prediction coefficients, this method is known as the covariance method.
Now, if the region IMG is defined so as to span the entire image array of size M × N, and the image is assumed to be zero outside the range m = 0, 1, ..., M-1, n = 0, 1, ..., N-1, we have

$$C_f(p,q;\,r,s) = \frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m-p,\, n-q)\, f(m-r,\, n-s) = \tilde{\phi}_f(r-p,\, s-q),$$

where φ̃_f is an estimate of the ACF of the image f. Then, the covariance method yields results that are identical to the results given by the autocorrelation method.
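A small numerical sketch of the autocorrelation method is given below. It is a minimal illustration assuming a causal quarter-plane ROS of three pixels; the normal equations are assembled from the estimated ACF and solved directly rather than with a fast Toeplitz solver, and the wrap-around in the prediction check is an artifact of the simple shift used for illustration.

```python
import numpy as np

def acf_estimate(f, p, q):
    """Biased estimate of the 2D autocorrelation phi_f(p, q) = E[f(m,n) f(m-p, n-q)]."""
    M, N = f.shape
    if p < 0:                                  # use the symmetry phi(p, q) = phi(-p, -q)
        return acf_estimate(f, -p, -q)
    if q >= 0:
        prod = f[p:, q:] * f[:M - p, :N - q]
    else:
        prod = f[p:, :N + q] * f[:M - p, -q:]
    return prod.sum() / (M * N)

def lp_coefficients(f, ros):
    """Solve the 2D normal equations phi(r,s) = sum_{(p,q) in ROS} a(p,q) phi(r-p, s-q)."""
    A = np.array([[acf_estimate(f, r - p, s - q) for (p, q) in ros] for (r, s) in ros])
    b = np.array([acf_estimate(f, r, s) for (r, s) in ros])
    return dict(zip(ros, np.linalg.solve(A, b)))

rng = np.random.default_rng(3)
img = np.cumsum(np.cumsum(rng.normal(size=(64, 64)), axis=0), axis=1)  # strongly correlated field
ros = [(0, 1), (1, 0), (1, 1)]               # quarter-plane ROS excluding the current pixel
a = lp_coefficients(img, ros)
pred = sum(a[(p, q)] * np.roll(np.roll(img, p, axis=0), q, axis=1) for (p, q) in ros)
print(np.var(img - pred) / np.var(img))      # prediction-error variance is a small fraction of the image variance
```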
Equation may be expressed in matrix form as

$$\mathbf{\Phi}_f\, \mathbf{a} = \boldsymbol{\varepsilon},$$

where, for the case of a QP ROS of size (P+1) × (P+1), the matrices Φ_f, a, and ε are of size (P+1)² × (P+1)², (P+1)² × 1, and (P+1)² × 1, respectively. The extended ACF matrix Φ_f is a block matrix in which each block, of size (P+1) × (P+1), is itself a Toeplitz matrix formed by the values of φ_f; the subscript f has been dropped from the entries within the matrix for the sake of brevity. The matrices composed by the prediction coefficients and the error are given by

$$\mathbf{a} = \left[ 1,\ -a(0,1),\ \ldots,\ -a(0,P),\ -a(1,0),\ -a(1,1),\ \ldots,\ -a(P,P) \right]^T, \qquad \boldsymbol{\varepsilon} = \left[ \varepsilon^2,\ 0,\ \ldots,\ 0 \right]^T.$$

The matrix Φ_f is Toeplitz-block-Toeplitz in nature; efficient algorithms are available for the inversion of such matrices.
The methods described above to compute the prediction coefficients assume the image to be stationary over the entire frame available. In practice, most images are nonstationary, and may be assumed to be locally stationary only over relatively small segments or ROIs. In order to maintain the optimality of the model over the entire image, and in order to maintain the error of prediction at low levels, we could follow one of two procedures: (1) partition the image into small blocks over which stationarity may be assumed, and compute the prediction coefficients independently for each block; or (2) adapt the model to the changing statistics of the image.
Encoding the prediction error: In order to facilitate error-free reconstruction of the image, the prediction error has to be transmitted and made available at the decoder, in addition to the prediction coefficients and the initial conditions. For quantized original pixel values, the prediction error may be rounded off to integers that may be encoded without error using a source coding technique such as the Huffman code. The prediction error has been observed to possess a Laplacian PDF, which lends well to efficient compression by the Huffman code.
Results of application to medical images: The average bit rates obtained by the application of the autocorrelation method of computing the prediction coefficients, on a block-by-block basis, to six of the images listed in Table are shown in Figure for various model orders. For the images used, comparable performance was obtained using NSHP and QP ROSs of similar extent. Good compression performance was obtained, with average bit rates well below the original representation of the pixel values, using QP ROSs of small size. See Section for a method for the detection of calcifications based upon the error of prediction.
Multichannel linear prediction
The difference between LP in 1D and 2D may be bridged by multichannel LP, where a certain number of rows of the given image may be viewed as a collection of multichannel signals. Kuduvalli and Rangayyan proposed the following procedures for the application of multichannel LP to predictive coding and compression of 2D images.
Consider a multichannel signal with P₂ channels, with the channels indexed as q = 0, 1, ..., P₂ - 1, and the individual signals labeled as f(m), m = 0, 1, ..., N-1. The collection of the signal's values at a position m, given by f_q(m), q = 0, 1, ..., P₂ - 1, may be viewed as a multichannel signal (a vector, or a matrix of size P₂ × 1); see Figures. If we were to use a multichannel linear predictor of order P₁, we could predict the vector f(m) as a linear combination of the vectors f(m - p), p = 1, 2, ..., P₁:

$$\tilde{\mathbf{f}}(m) = \sum_{p=1}^{P_1} \mathbf{a}(p)\, \mathbf{f}(m-p),$$

where a(p), p = 1, 2, ..., P₁, are multichannel LP coefficient matrices, each of size P₂ × P₂.

FIGURE
Results of compression of six of the images listed in Table by 2D LP (autocorrelation method) and Huffman coding. The method was applied on a block-by-block basis. In the case of modeling using the NSHP ROS, the model order indicates the extent of the ROS (see Figure); the orders of models using the QP ROS are indicated by integers. Reproduced with permission from G.R. Kuduvalli and R.M. Rangayyan, Performance analysis of reversible image compression techniques for high-resolution digital teleradiology, IEEE Transactions on Medical Imaging, © IEEE.

FIGURE
Multichannel linear prediction. Each row of the image is viewed as a channel or component of a multichannel signal or vector. The column index of the image may be considered to be equivalent to a temporal index. The indices shown correspond to Equations in the text. See also Figure.
The error of prediction is given by

$$\mathbf{e}(m) = \mathbf{f}(m) - \sum_{p=1}^{P_1} \mathbf{a}(p)\, \mathbf{f}(m-p).$$

The covariance matrix of the error of prediction is given by

$$\mathbf{\Sigma}_e = E\left[ \mathbf{e}(m)\, \mathbf{e}^T(m) \right].$$

For optimal prediction, we need to derive the prediction coefficient matrices a(p) that minimize the trace of the covariance matrix of the error of prediction. From Equation, we can write

$$\mathbf{e}(m)\, \mathbf{e}^T(m) = \mathbf{f}(m)\, \mathbf{f}^T(m) - \sum_{p=1}^{P_1} \mathbf{a}(p)\, \mathbf{f}(m-p)\, \mathbf{f}^T(m) - \sum_{q=1}^{P_1} \mathbf{f}(m)\, \mathbf{f}^T(m-q)\, \mathbf{a}^T(q) + \sum_{p=1}^{P_1} \sum_{q=1}^{P_1} \mathbf{a}(p)\, \mathbf{f}(m-p)\, \mathbf{f}^T(m-q)\, \mathbf{a}^T(q).$$

Applying the statistical expectation operator, and assuming wide-sense stationarity of the multichannel signal-generating process, we get

$$\mathbf{\Sigma}_e = \mathbf{\Phi}_c(0) - \sum_{p=1}^{P_1} \mathbf{a}(p)\, \mathbf{\Phi}_c(-p) - \sum_{q=1}^{P_1} \mathbf{\Phi}_c(q)\, \mathbf{a}^T(q) + \sum_{p=1}^{P_1} \sum_{q=1}^{P_1} \mathbf{a}(p)\, \mathbf{\Phi}_c(q-p)\, \mathbf{a}^T(q),$$

where Φ_c(r) is the ACF of the image computed over the set of rows (channels) being used in the multichannel prediction model, given by

$$\mathbf{\Phi}_c(r) = \begin{bmatrix} \phi_{00}(r) & \phi_{01}(r) & \cdots & \phi_{0,P_2-1}(r) \\ \phi_{10}(r) & \phi_{11}(r) & \cdots & \phi_{1,P_2-1}(r) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{P_2-1,0}(r) & \phi_{P_2-1,1}(r) & \cdots & \phi_{P_2-1,P_2-1}(r) \end{bmatrix},$$

with

$$\phi_{pq}(r) = E\left[ f_p(m)\, f_q(m-r) \right].$$

FIGURE
Multichannel LP applied to a 2D image. See also Figure. Reproduced with permission from G.R. Kuduvalli and R.M. Rangayyan, An algorithm for direct computation of 2D linear prediction coefficients, IEEE Transactions on Signal Processing, © IEEE.
In order to minimize the trace of the error covariance matrix Σ_e, we could differentiate both sides of Equation with respect to the prediction coefficient matrices a(r), r = 1, 2, ..., P₁, and equate the result to the null matrix of size P₂ × P₂, which leads to

$$\mathbf{\Phi}_c(-r) = \sum_{p=1}^{P_1} \mathbf{\Phi}_c^T(r-p)\, \mathbf{a}^T(p) = \sum_{p=1}^{P_1} \mathbf{\Phi}_c(p-r)\, \mathbf{a}^T(p), \qquad r = 1, 2, \ldots, P_1.$$

Note: Φ_c^T(r-p) = Φ_c(p-r).
Now, Equation may be rewritten as

$$\mathbf{\Sigma}_e = \mathbf{\Phi}_c(0) - \sum_{p=1}^{P_1} \mathbf{a}(p)\, \mathbf{\Phi}_c(-p) - \sum_{q=1}^{P_1} \left[ \mathbf{\Phi}_c(q) - \sum_{p=1}^{P_1} \mathbf{a}(p)\, \mathbf{\Phi}_c(q-p) \right] \mathbf{a}^T(q) = \mathbf{\Phi}_c(0) - \sum_{p=1}^{P_1} \mathbf{a}(p)\, \mathbf{\Phi}_c^T(p) = \mathbf{\Phi}_c(0) - \sum_{p=1}^{P_1} \mathbf{\Phi}_c(p)\, \mathbf{a}^T(p).$$
The relationships derived above may be summarized as

$$\mathbf{\Phi}_c\, \mathbf{A} = \mathbf{\Delta},$$

where the matrices are given in expanded form as

$$\mathbf{\Phi}_c = \begin{bmatrix} \mathbf{\Phi}_c(0) & \mathbf{\Phi}_c(1) & \cdots & \mathbf{\Phi}_c(P_1) \\ \mathbf{\Phi}_c(-1) & \mathbf{\Phi}_c(0) & \cdots & \mathbf{\Phi}_c(P_1 - 1) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{\Phi}_c(-P_1) & \mathbf{\Phi}_c(-P_1 + 1) & \cdots & \mathbf{\Phi}_c(0) \end{bmatrix}, \qquad \mathbf{A} = \begin{bmatrix} \mathbf{I} \\ -\mathbf{a}^T(1) \\ \vdots \\ -\mathbf{a}^T(P_1) \end{bmatrix}, \qquad \mathbf{\Delta} = \begin{bmatrix} \mathbf{\Sigma}_e \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{bmatrix}.$$

In the equation above, the submatrices Φ_c(r) are as defined in Equation, I is the identity matrix of size P₂ × P₂, and 0 is the null matrix of size P₂ × P₂. This system of equations may be referred to as the multichannel version of the Yule-Walker equations. The solution to this set of equations may be used to obtain the 2D LP coefficients by making the following associations:

$$\phi_{pq}(r) = \phi_f(r,\, q - p)$$

(compare Equations above); the error covariance matrix Σ_e corresponds to the prediction-error term of the 2D normal equations; and the block a^T(r) of A corresponds to

$$\mathbf{a}_r = \left[ a(r, 0),\ a(r, 1),\ \ldots,\ a(r, P_2 - 1) \right]^T,$$

which is composed of the elements of the rth row of the matrix of 2D prediction coefficients a(p,q), written as a column matrix.
The Levinson-Wiggins-Robinson algorithm: The multichannel prediction coefficient matrices may be obtained by the application of the algorithms due to Levinson and to Wiggins and Robinson. In the multichannel version of this algorithm, the prediction coefficients for order P₁ are recursively related to those for order P₁ - 1. The prediction model given by Equation is known as the forward predictor. Going in the opposite direction, the backward predictor is defined to predict the vector at the trailing end of the data window, f(m - P₁), in terms of the vectors f(m - p), p = 0, 1, ..., P₁ - 1, as

$$\tilde{\mathbf{f}}(m - P_1) = \sum_{p=0}^{P_1 - 1} \mathbf{b}_{P_1}(p)\, \mathbf{f}(m - p),$$

where b_{P₁}(p) are the multichannel backward prediction coefficient matrices, each of size P₂ × P₂ (see Figure).
In order to derive the multichannel version of Levinson's algorithm, let us rewrite the multichannel ACF matrix in Equations above as

$$\mathbf{\Phi}_{P_1} = \begin{bmatrix} \mathbf{\Phi}(0) & \mathbf{\Phi}(1) & \cdots & \mathbf{\Phi}(P_1) \\ \mathbf{\Phi}(-1) & \mathbf{\Phi}(0) & \cdots & \mathbf{\Phi}(P_1 - 1) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{\Phi}(-P_1) & \mathbf{\Phi}(-P_1 + 1) & \cdots & \mathbf{\Phi}(0) \end{bmatrix},$$

where the subscript P₁ is used to indicate the order of the model, and the subscript c has been dropped from the submatrices for compact notation. The matrix Φ_{P₁} may be partitioned as

$$\mathbf{\Phi}_{P_1} = \begin{bmatrix} \mathbf{\Phi}(0) & \mathbf{\Psi}_{P_1}^T \\ \mathbf{\Psi}_{P_1} & \mathbf{\Phi}_{P_1 - 1} \end{bmatrix} = \begin{bmatrix} \mathbf{\Phi}_{P_1 - 1} & \bar{\mathbf{\Psi}}_{P_1} \\ \bar{\mathbf{\Psi}}_{P_1}^T & \mathbf{\Phi}(0) \end{bmatrix},$$

where

$$\mathbf{\Psi}_{P_1} = \left[ \mathbf{\Phi}(-1),\ \mathbf{\Phi}(-2),\ \ldots,\ \mathbf{\Phi}(-P_1) \right]^T, \qquad \bar{\mathbf{\Psi}}_{P_1} = \left[ \mathbf{\Phi}(P_1),\ \mathbf{\Phi}(P_1 - 1),\ \ldots,\ \mathbf{\Phi}(1) \right]^T,$$

and the property that Φ(-r) = Φ^T(r) has been used. Let us also define partitions of the forward and backward prediction coefficient matrices as follows:

$$\mathbf{A}_{P_1} = \begin{bmatrix} \mathbf{I} \\ \tilde{\mathbf{A}}_{P_1} \end{bmatrix} \qquad \text{and} \qquad \mathbf{B}_{P_1} = \begin{bmatrix} \tilde{\mathbf{B}}_{P_1} \\ \mathbf{I} \end{bmatrix},$$

where A_{P₁} is the same as A in Equations above, and B_{P₁} is formed in a similar manner for the backward predictor.
Using the partitions as defined above, we may rewrite the multichannel Yule-Walker equations given by Equation in two forms, for forward and backward prediction, as follows:

$$\begin{bmatrix} \mathbf{\Phi}(0) & \mathbf{\Psi}_{P_1}^T \\ \mathbf{\Psi}_{P_1} & \mathbf{\Phi}_{P_1 - 1} \end{bmatrix} \begin{bmatrix} \mathbf{I} \\ \tilde{\mathbf{A}}_{P_1} \end{bmatrix} = \begin{bmatrix} \mathbf{\Sigma}^f_{P_1} \\ \mathbf{0}_{P_1} \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} \mathbf{\Phi}_{P_1 - 1} & \bar{\mathbf{\Psi}}_{P_1} \\ \bar{\mathbf{\Psi}}_{P_1}^T & \mathbf{\Phi}(0) \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{B}}_{P_1} \\ \mathbf{I} \end{bmatrix} = \begin{bmatrix} \mathbf{0}_{P_1} \\ \mathbf{\Sigma}^b_{P_1} \end{bmatrix},$$

where 0_{P₁} is the null matrix of size P₁P₂ × P₂. The matrix Σ^f_{P₁} is the covariance matrix of the error of forward prediction (the same as Σ_e given by Equation), with the matrix Σ^b_{P₁} being the counterpart for the backward predictor. It follows that

$$\tilde{\mathbf{A}}_{P_1} = -\mathbf{\Phi}_{P_1 - 1}^{-1}\, \mathbf{\Psi}_{P_1}, \qquad \tilde{\mathbf{B}}_{P_1} = -\mathbf{\Phi}_{P_1 - 1}^{-1}\, \bar{\mathbf{\Psi}}_{P_1},$$

$$\mathbf{\Sigma}^f_{P_1} = \mathbf{\Phi}(0) + \mathbf{\Psi}_{P_1}^T\, \tilde{\mathbf{A}}_{P_1}, \qquad \text{and} \qquad \mathbf{\Sigma}^b_{P_1} = \mathbf{\Phi}(0) + \bar{\mathbf{\Psi}}_{P_1}^T\, \tilde{\mathbf{B}}_{P_1}.$$
Applying the inversion theorem for partitioned matrices, and making use of the preceding six relationships, we get two partitioned expressions for the inverse of Φ_{P₁}: one in terms of the inverse of Φ_{P₁-1}, the forward coefficient matrix A_{P₁}, and the inverse of Σ^f_{P₁}; and the other in terms of the inverse of Φ_{P₁-1}, the backward coefficient matrix B_{P₁}, and the inverse of Σ^b_{P₁}. Multiplying both sides of Equation by Φ_{P₁}, making use of the partitioned form in Equation, and extracting the lower P₂ × P₂ block of the result (which, upon transposition, yields the order-update relationship for the forward coefficients), we obtain the forward reflection coefficient matrix

$$\mathbf{a}_{P_1}(P_1) = \mathbf{\Delta}^b_{P_1 - 1} \left[ \mathbf{\Sigma}^b_{P_1 - 1} \right]^{-1},$$

where

$$\mathbf{\Delta}^b_{P_1 - 1} = \mathbf{B}^T_{P_1 - 1}\, \bar{\mathbf{\Psi}}_{P_1} = \mathbf{\Phi}(P_1) - \sum_{p=1}^{P_1 - 1} \mathbf{b}_{P_1 - 1}(p)\, \mathbf{\Phi}(P_1 - p).$$

Using this result, the remaining forward coefficient matrices are updated as

$$\mathbf{a}_{P_1}(p) = \mathbf{a}_{P_1 - 1}(p) - \mathbf{a}_{P_1}(P_1)\, \mathbf{b}_{P_1 - 1}(P_1 - p), \qquad p = 1, 2, \ldots, P_1 - 1.$$

Similarly, multiplying both sides of Equation by Φ_{P₁} and using Equation, we get the backward reflection coefficient matrix

$$\mathbf{b}_{P_1}(P_1) = \mathbf{\Delta}^f_{P_1 - 1} \left[ \mathbf{\Sigma}^f_{P_1 - 1} \right]^{-1},$$

where

$$\mathbf{\Delta}^f_{P_1 - 1} = \mathbf{A}^T_{P_1 - 1}\, \mathbf{\Psi}_{P_1} = \mathbf{\Phi}(-P_1) - \sum_{p=1}^{P_1 - 1} \mathbf{a}_{P_1 - 1}(p)\, \mathbf{\Phi}(p - P_1),$$

and

$$\mathbf{b}_{P_1}(p) = \mathbf{b}_{P_1 - 1}(p) - \mathbf{b}_{P_1}(P_1)\, \mathbf{a}_{P_1 - 1}(P_1 - p), \qquad p = 1, 2, \ldots, P_1 - 1.$$

Substituting these updates into the partitioned form of Φ_{P₁} in Equation, the error covariance matrices are updated as

$$\mathbf{\Sigma}^f_{P_1} = \mathbf{\Sigma}^f_{P_1 - 1} - \mathbf{a}_{P_1}(P_1)\, \mathbf{\Sigma}^b_{P_1 - 1}\, \mathbf{a}^T_{P_1}(P_1) \qquad \text{and} \qquad \mathbf{\Sigma}^b_{P_1} = \mathbf{\Sigma}^b_{P_1 - 1} - \mathbf{b}_{P_1}(P_1)\, \mathbf{\Sigma}^f_{P_1 - 1}\, \mathbf{b}^T_{P_1}(P_1).$$

Now, consider the matrix product B^T_{P₁-1} Ψ̄_{P₁} expressed in terms of A_{P₁-1}; expanding this product using the partitioned forms, taking the transpose of the resulting expression, and noting that Φ_{P₁} is symmetric, we get

$$\mathbf{\Delta}^f_{P_1 - 1} = \left[ \mathbf{\Delta}^b_{P_1 - 1} \right]^T.$$

The equations above constitute the Levinson-Wiggins-Robinson algorithm, with the initialization

$$\mathbf{\Sigma}^f_0 = \mathbf{\Sigma}^b_0 = \mathbf{\Phi}_c(0).$$
With the autocorrelation matrices Φ(p) defined by the association given in Equation, the matrix Φ_{P₁} is a Toeplitz-block-Toeplitz matrix: the block elements (submatrices) Φ(p) along the diagonals of Φ_{P₁} are mutually identical, and, furthermore, the elements along the diagonals of each submatrix Φ(p) are mutually identical. Thus, Φ_{P₁} and Φ(p) are symmetrical about their cross diagonals; that is, they are persymmetric. A property of Toeplitz-block-Toeplitz matrices that is of interest here is defined in terms of the exchange matrices: J, of size P₂ × P₂, with ones along its cross diagonal and zeros elsewhere, and J_{P₁}, the corresponding block-exchange matrix formed by placing copies of J along the block cross diagonal, such that

$$\mathbf{J}\, \mathbf{J} = \mathbf{I} \qquad \text{and} \qquad \mathbf{J}_{P_1}\, \mathbf{J}_{P_1} = \mathbf{I}_{P_1}.$$

With these definitions, we have

$$\mathbf{J}\, \mathbf{\Phi}(p)\, \mathbf{J} = \mathbf{\Phi}^T(p) \qquad \text{and} \qquad \mathbf{J}_{P_1}\, \mathbf{\Phi}_{P_1}\, \mathbf{J}_{P_1} = \mathbf{\Phi}_{P_1}.$$

Now, premultiplying both sides of Equation by J_{P₁} and postmultiplying both sides by J, we obtain an equation that is identical to the modified Yule-Walker equations for computing the matrices b_{P₁}(p) and Σ^b_{P₁} in Equation. Comparing the terms in the two equations, we get

$$\mathbf{b}_{P_1}(p) = \mathbf{J}\, \mathbf{a}^T_{P_1}(p)\, \mathbf{J}, \qquad p = 1, 2, \ldots, P_1,$$

and

$$\mathbf{\Sigma}^b_{P_1} = \mathbf{J}\, \mathbf{\Sigma}^f_{P_1}\, \mathbf{J}.$$

With these simplifications, the recursive procedures in the multichannel Levinson algorithm may be modified for the computation of 2D LP coefficients as follows:

$$\mathbf{\Delta}_{P_1 - 1} = \mathbf{\Phi}_f(P_1) - \sum_{p=1}^{P_1 - 1} \mathbf{a}_{P_1 - 1}(p)\, \mathbf{\Phi}_f(P_1 - p),$$

$$\mathbf{a}_{P_1}(P_1) = \mathbf{\Delta}_{P_1 - 1} \left[ \mathbf{J}\, \mathbf{\Sigma}_{P_1 - 1}\, \mathbf{J} \right]^{-1},$$

$$\mathbf{a}_{P_1}(p) = \mathbf{a}_{P_1 - 1}(p) - \mathbf{a}_{P_1}(P_1)\, \mathbf{J}\, \mathbf{a}^T_{P_1 - 1}(P_1 - p)\, \mathbf{J}, \qquad p = 1, 2, \ldots, P_1 - 1,$$

and

$$\mathbf{\Sigma}_{P_1} = \mathbf{\Sigma}_{P_1 - 1} - \mathbf{a}_{P_1}(P_1)\, \mathbf{J}\, \mathbf{\Sigma}_{P_1 - 1}\, \mathbf{J}\, \mathbf{a}^T_{P_1}(P_1),$$

with the initialization Σ₀ = Φ_f(0), where Φ_f(p) is the P₂ × P₂ matrix formed by the 2D ACF values φ_f(p, ·), according to the association in Equation. The equations above constitute the 2D Levinson algorithm for solving the 2D Yule-Walker equations for the case of a QP ROS. The Levinson algorithm provides results that are identical to those obtained by direct inversion of Φ_f.
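The structure of the recursion is easiest to appreciate through its scalar counterpart, the classical 1D Levinson-Durbin algorithm. The sketch below is only that 1D analogue, not the block (multichannel) recursion described above; it solves the scalar Yule-Walker equations by order recursion.

```python
import numpy as np

def levinson_durbin(r, order):
    """Classical 1D Levinson-Durbin recursion for the Yule-Walker equations of an AR predictor."""
    a = np.zeros(order)              # prediction coefficients a[1..order], stored in a[0..order-1]
    err = r[0]                       # prediction-error variance, updated at each order
    for m in range(1, order + 1):
        acc = r[m] - np.dot(a[:m - 1], r[m - 1:0:-1])
        k = acc / err                # reflection coefficient for order m
        a_new = a[:m].copy()
        a_new[m - 1] = k
        a_new[:m - 1] = a[:m - 1] - k * a[:m - 1][::-1]
        a[:m] = a_new
        err *= (1.0 - k * k)
    return a, err

rng = np.random.default_rng(7)
x = np.cumsum(rng.normal(size=8192))                       # correlated test signal
r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(4)])  # biased ACF estimates
a, err = levinson_durbin(r, order=3)
print(a, err)   # order-3 predictor coefficients and the corresponding prediction-error variance
```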
Computation of the 2D LP coefficients directly from the image data: The multichannel version of the Levinson algorithm may be used to derive the multichannel version of the Burg algorithm, as follows. Equation may be augmented, using the partition shown in Equation, as

$$\mathbf{A}_{P_1} = \begin{bmatrix} \mathbf{A}_{P_1 - 1} \\ \mathbf{0} \end{bmatrix} - \begin{bmatrix} \mathbf{0} \\ \mathbf{B}_{P_1 - 1} \end{bmatrix} \mathbf{a}^T_{P_1}(P_1).$$

From Equation, the forward prediction error vector for order P₁ is

$$\mathbf{e}^f_{P_1}(m) = \mathbf{f}(m) - \sum_{p=1}^{P_1} \mathbf{a}_{P_1}(p)\, \mathbf{f}(m-p) = \mathbf{A}^T_{P_1}\, \mathbf{F}_{P_1}(m),$$

where F_{P₁}(m) is the multichannel data matrix of order P₁ at m, which may be partitioned as

$$\mathbf{F}_{P_1}(m) = \begin{bmatrix} \mathbf{f}(m) \\ \mathbf{F}_{P_1 - 1}(m - 1) \end{bmatrix} = \begin{bmatrix} \mathbf{F}_{P_1 - 1}(m) \\ \mathbf{f}(m - P_1) \end{bmatrix}.$$

Similarly, the backward prediction error vector may be expressed as

$$\mathbf{e}^b_{P_1}(m) = \mathbf{f}(m - P_1) - \sum_{p=0}^{P_1 - 1} \mathbf{b}_{P_1}(p)\, \mathbf{f}(m - p) = \mathbf{B}^T_{P_1}\, \mathbf{F}_{P_1}(m).$$

Transposing both sides of Equation and multiplying by F_{P₁}(m), using the partitioned forms shown on the right-hand side of Equation, as well as using Equations above, we get

$$\mathbf{e}^f_{P_1}(m) = \mathbf{e}^f_{P_1 - 1}(m) - \mathbf{a}_{P_1}(P_1)\, \mathbf{e}^b_{P_1 - 1}(m).$$

Similarly, the backward prediction error vector is given by

$$\mathbf{e}^b_{P_1}(m) = \mathbf{e}^b_{P_1 - 1}(m) - \mathbf{b}_{P_1}(P_1)\, \mathbf{e}^f_{P_1 - 1}(m).$$

The matrices a_{P₁}(P₁) and b_{P₁}(P₁), known as the reflection coefficient matrices, that minimize the sum of the squared forward and backward prediction errors over the entire multichannel set of data points 0, 1, ..., N-1, given by

$$\epsilon_c = \mathrm{Tr} \left\{ \sum_{m = P_1}^{N-1} \left[ \mathbf{e}^f_{P_1}(m)\, \mathbf{e}^{f\,T}_{P_1}(m) + \mathbf{e}^b_{P_1}(m)\, \mathbf{e}^{b\,T}_{P_1}(m) \right] \right\},$$

are obtained by solving a pair of coupled linear matrix equations that relate a_{P₁}(P₁) and b_{P₁}(P₁) to the error covariance matrices

$$\mathbf{E}^f_{P_1} = \sum_{m = P_1}^{N-1} \mathbf{e}^f_{P_1}(m)\, \mathbf{e}^{f\,T}_{P_1}(m), \qquad \mathbf{E}^b_{P_1} = \sum_{m = P_1}^{N-1} \mathbf{e}^b_{P_1}(m)\, \mathbf{e}^{b\,T}_{P_1}(m), \qquad \mathbf{E}^{fb}_{P_1} = \sum_{m = P_1}^{N-1} \mathbf{e}^f_{P_1}(m)\, \mathbf{e}^{b\,T}_{P_1}(m).$$
The equations above may be used to compute the multichannel reflection coefficients directly from the image data, without computing the ACF.
In order to adapt the multichannel version of the Burg algorithm to the 2D image case, we could force the structure obtained by relating the 2D and multichannel ACFs in Equation on to the expressions above, and redefine the error covariance matrices E^f_{P₁}, E^b_{P₁}, and E^{fb}_{P₁} to span the entire M × N image. Then, the 2D counterpart of the reflection coefficient matrix a_{P₁}(P₁) is obtained by solving a linear matrix equation that relates a_{P₁}(P₁) to the error covariance matrices E^f_{P₁-1}, E^b_{P₁-1}, and E^{fb}_{P₁-1}, together with the exchange-matrix symmetry described earlier.
In order to compute the error covariance matrices E^f_{P₁}, E^b_{P₁}, and E^{fb}_{P₁}, a strip of width P₂ rows is defined so as to span the top P₂ rows of the image, as shown in Figure, and the strip is moved down one row at a time. The region over which the summations are performed includes only those parts of the strip for which the forward and backward prediction operators do not run out of data. At the beginning of the recursive procedure, the error values are initialized to the actual values of the corresponding pixels. Furthermore, the forward and backward prediction error vectors are computed by forcing the relationship in Equation on to Equation, resulting in

$$\mathbf{e}^b_{P_1}(m) = \mathbf{e}^b_{P_1 - 1}(m) - \mathbf{J}\, \mathbf{a}^T_{P_1}(P_1)\, \mathbf{J}\, \mathbf{e}^f_{P_1 - 1}(m).$$

The 2D Burg algorithm for computing the 2D LP coefficients directly from the image data may be summarized as follows:
1. The prediction error covariance matrix Σ₀ is initialized to Φ_f(0).
2. The prediction error vectors are computed using the recursions above.
3. The prediction error covariance matrices E^f_{P₁}, E^b_{P₁}, and E^{fb}_{P₁} are computed from the prediction error vectors, summing over strips of P₂ rows of the image.
4. The reflection coefficient matrix a_{P₁}(P₁) is obtained from the prediction error covariance matrices by solving Equation, which is of the form AX + XB = C, and can be solved by using Kronecker products (see the sketch after this list).
5. The remaining prediction coefficient matrices a_{P₁}(p) are computed by using Equation, and the expected value of the prediction error covariance matrix Σ_{P₁} is updated using Equation.
6. When the recursive procedure reaches the desired order P₁, the 2D LP coefficients are computed by solving the associated equations.
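Step 4 requires the solution of a matrix equation of the form AX + XB = C. A standard way to solve it, as the text notes, is via Kronecker products: vectorizing both sides gives (I ⊗ A + Bᵀ ⊗ I) vec(X) = vec(C). A minimal sketch, with arbitrary small matrices standing in for the error covariance matrices, is given below.

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve A X + X B = C using vec(AXB) = (B^T kron A) vec(X), with column-major vec."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(I, A) + np.kron(B.T, I)        # coefficient matrix acting on vec(X)
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((n, n), order="F")

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
X_true = rng.normal(size=(3, 3))
C = A @ X_true + X_true @ B
X = solve_sylvester_kron(A, B, C)
print(np.allclose(X, X_true))                  # True
```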
TABLE
Variables in the 2D LP (Burg and Levinson) Algorithms

Variable        | Size          | Description
f(m,n)          | M x N         | 2D image array
f(m)            | P2 x 1        | Multichannel vector at column m, spanning P2 rows
Phi(r)          | P2 x P2       | Autocorrelation submatrix related to f(m)
Phi_{P1}        | block matrix  | Extended 2D autocorrelation matrix
a(p,q)          | ROS           | 2D LP coefficient matrix
a_{P1}(p)       | P2 x P2       | Multichannel-equivalent prediction coefficient matrix
a_{P1}(P1)      | P2 x P2       | Multichannel-equivalent reflection coefficient matrix
Sigma_{P1}      | P2 x P2       | Multichannel prediction error covariance matrix
e^f(m,n)        | M x N         | Forward prediction error array
e^b(m,n)        | M x N         | Backward prediction error array
e^f(m)          | P2 x 1        | Multichannel forward prediction error vector
e^b(m)          | P2 x 1        | Multichannel backward prediction error vector
e(m,n)          | scalar        | nth element of the vector e(m)
E^f_{P1}        | P2 x P2       | Forward prediction error covariance matrix
E^b_{P1}        | P2 x P2       | Backward prediction error covariance matrix
E^{fb}_{P1}     | P2 x P2       | Forward-backward prediction error covariance matrix

Bold characters represent vectors or matrices. A QP ROS is assumed.

The variables involved in the 2D Burg and Levinson algorithms are summarized in Table.
The modified multichannel version of the Burg algorithm offers advantages similar to those of its 1D counterpart over the direct inversion method: it is a fast and efficient procedure to compute the prediction coefficients and prediction errors without computing the autocorrelation function. The optimization of the prediction coefficients does not make any assumptions about the image outside its finite dimensions, and hence should result in lower prediction errors and efficient coding. Furthermore, the forced 2D structure makes the algorithm computationally more efficient than the direct application of the multichannel Burg procedure.
Computation of the prediction error: In order to compute the prediction error for coding and transmission, the trace of the covariance matrix in Equation may be minimized using the error recursions above, eliminating the covariance matrix Σ_{P₁}, as follows. The sum of the squared forward and backward prediction error vectors in the 2D Burg algorithm is given by

$$\mathbf{E}_{P_1} = \sum_{m = P_1}^{N-1} \left[ \mathbf{e}^f_{P_1}(m)\, \mathbf{e}^{f\,T}_{P_1}(m) + \mathbf{e}^b_{P_1}(m)\, \mathbf{e}^{b\,T}_{P_1}(m) \right]$$

$$= \sum_{m = P_1}^{N-1} \Bigl\{ \left[ \mathbf{e}^f_{P_1 - 1}(m) - \mathbf{a}_{P_1}(P_1)\, \mathbf{e}^b_{P_1 - 1}(m) \right] \left[ \mathbf{e}^f_{P_1 - 1}(m) - \mathbf{a}_{P_1}(P_1)\, \mathbf{e}^b_{P_1 - 1}(m) \right]^T$$
$$+ \left[ \mathbf{e}^b_{P_1 - 1}(m) - \mathbf{J}\, \mathbf{a}^T_{P_1}(P_1)\, \mathbf{J}\, \mathbf{e}^f_{P_1 - 1}(m) \right] \left[ \mathbf{e}^b_{P_1 - 1}(m) - \mathbf{J}\, \mathbf{a}^T_{P_1}(P_1)\, \mathbf{J}\, \mathbf{e}^f_{P_1 - 1}(m) \right]^T \Bigr\}$$

$$= \mathbf{E}^f_{P_1 - 1} + \mathbf{E}^b_{P_1 - 1} - \mathbf{E}^{fb}_{P_1 - 1}\, \mathbf{a}^T_{P_1}(P_1) - \mathbf{a}_{P_1}(P_1)\, \mathbf{E}^{fb\,T}_{P_1 - 1} + \mathbf{a}_{P_1}(P_1)\, \mathbf{E}^b_{P_1 - 1}\, \mathbf{a}^T_{P_1}(P_1)$$
$$- \mathbf{E}^{fb\,T}_{P_1 - 1}\, \mathbf{J}\, \mathbf{a}^T_{P_1}(P_1)\, \mathbf{J} - \mathbf{J}\, \mathbf{a}_{P_1}(P_1)\, \mathbf{J}\, \mathbf{E}^{fb}_{P_1 - 1} + \mathbf{J}\, \mathbf{a}_{P_1}(P_1)\, \mathbf{J}\, \mathbf{E}^f_{P_1 - 1}\, \mathbf{J}\, \mathbf{a}^T_{P_1}(P_1)\, \mathbf{J}.$$

The 2D Burg algorithm, for the purpose of image compression, consists of determining the reflection coefficient matrix a_{P₁}(P₁) that minimizes the trace of the error covariance matrix E_{P₁}. This is achieved by differentiating the expression above with respect to a_{P₁}(P₁) and equating the result to the null matrix, which yields a linear matrix equation for a_{P₁}(P₁) in terms of E^f_{P₁-1}, E^b_{P₁-1}, E^{fb}_{P₁-1}, and the exchange matrix J. If the 2D autocorrelation matrices Φ_f(r) are symmetric, the matrix a_{P₁}(P₁) will also be symmetric, which reduces the equation to

$$\mathbf{E}^b_{P_1 - 1}\, \mathbf{a}_{P_1}(P_1) + \mathbf{a}_{P_1}(P_1)\, \mathbf{E}^f_{P_1 - 1} = \mathbf{E}^{fb}_{P_1 - 1} + \mathbf{E}^{fb\,T}_{P_1 - 1}.$$

When the recursive procedure reaches the desired order P₁, the multichannel-equivalent 2D prediction error image e(m,n) is obtained by mapping the elements of the forward prediction error vectors back to their corresponding pixel locations (row within the strip of channels, column within the image). The resulting relationship is in the form of the normal equations for 1D LP. This suggests that the 1D Burg algorithm may be applied to the multichannel-equivalent 2D prediction error image to obtain the final prediction error image, in a recursive manner, as follows.
tion error image in a recursive manner as follows
Compute the sum of the squared forward and backward prediction errors
as
f
N
X
mP
M
X
n
je
P
mnj
b
N
X
mP
M
X
n
jc
P
mnj
and
fb
N
X
mP
M
X
n
e
P
mn c
P
mn
where c
P
mn is theMN backward prediction error array initialized
as
c
mn e
mn m M n N
Compute the coecient a P
known as the re!ection coecient
as
a P
fb
f
b
Obtain the prediction errors at higher orders as
e
P
mn e
P
mn a P
c
P
mn
and
c
P
mn a P
e
P
mn e
P
mn
When the desired order P
is reached the prediction errors e
P
mn are
encoded using a method such as the Human code The re!ection coecient
matrices a
P
P
and the D re!ection coecients a P
are also
encoded and transmitted as overhead information
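The 1D Burg-type recursion in the steps above can be sketched as follows. This is a minimal illustration on a single 1D signal; the harmonic-mean form of the reflection coefficient and the one-sample lag in the backward error follow the standard 1D Burg algorithm, and are assumptions where the exact indexing of the original could not be recovered.

```python
import numpy as np

def burg_1d(signal, order):
    """Standard 1D Burg recursion: returns reflection coefficients and the final forward error."""
    e = signal.astype(float).copy()      # forward prediction error
    c = signal.astype(float).copy()      # backward prediction error
    ks = []
    for _ in range(order):
        ef, cb = e[1:], c[:-1]           # align forward error with the lagged backward error
        k = 2.0 * np.sum(ef * cb) / (np.sum(ef ** 2) + np.sum(cb ** 2))
        e_new = ef - k * cb
        c_new = cb - k * ef
        e, c = e_new, c_new
        ks.append(k)
    return ks, e

rng = np.random.default_rng(5)
x = np.cumsum(rng.normal(size=2048))     # strongly correlated test signal
ks, err = burg_1d(x, order=2)
print(ks, err.var() / x.var())           # reflection coefficient near 1; greatly reduced error variance
```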
Error-free reconstruction of the image from the forward and backward prediction errors: In order to reconstruct the original image at the decoder without any error, the prediction coefficients need to be recomputed from the reflection coefficients. The prediction coefficients a(p), p = 1, 2, ..., P₃, may be computed recursively, using the Burg algorithm, as

$$a_q(p) = a_{q-1}(p) - a_q(q)\, a_{q-1}(q-p), \qquad p = 1, 2, \ldots, q-1, \quad q = 1, 2, \ldots, P_3,$$

with a_q(q) given by the reflection coefficients. The multichannel-equivalent 2D prediction error image is then recovered as

$$e(m,n) = \sum_{p=1}^{P_3} a_{P_3}(p)\, e(m-p,\, n) + e_{P_3}(m,n), \qquad n = 0, 1, \ldots, M-1, \quad m = P_3, P_3 + 1, \ldots, N-1.$$

The multichannel prediction error vectors are related to the error data defined above through the same row-to-channel index mapping that was used to form the multichannel-equivalent error image. The multichannel signal vectors may be reconstructed from the error vectors via multichannel prediction as

$$\mathbf{f}(m) = \sum_{p=1}^{P_1} \mathbf{a}(p)\, \mathbf{f}(m-p) + \mathbf{e}^f_{P_1}(m), \qquad m = P_1, P_1 + 1, \ldots, N-1.$$

Finally, the original image is recovered from the multichannel signal vectors by reversing the mapping of image rows to channels, with rounding of the results to integers. In a practical implementation, for values of e_{P₃}(m,n) exceeding a preset limit, the true image pixel values would be transmitted and made available directly at the decoder.
Results of application to medical images: Kuduvalli and Rangayyan applied the 2D Levinson and Burg algorithms described above to the high-resolution digitized medical images listed in Table. The average bit rates obtained with lossless compression of the test images using the 2D block-wise LP method described in the preceding section, the 2D Levinson algorithm, and the 2D Burg algorithm were all substantially lower than the bit rate of the original images (see also Table). The multichannel LP algorithms, in particular the 2D Burg algorithm, provided better compression than the other methods described in the preceding sections of this chapter. The LP models described in this section are related to AR modeling for spectral estimation; Kuduvalli and Rangayyan found the 2D Burg algorithm to provide good 2D spectral estimates that were comparable to those provided by other AR models.
Adaptive 2D recursive least-squares prediction
The LP model with constant prediction coefficients given by Equation is based on an inherent assumption of stationarity of the image-generating process. The multichannel-based prediction methods described in the preceding section are two-pass methods, where an estimation of the statistical parameters of the image is performed in the first pass (such as, for example, the autocorrelation matrix of the image in the 2D Levinson method), and the parameters are then used to estimate the prediction coefficients in the second pass. Once computed, the same prediction coefficients are used for prediction over all of the image data from which the coefficients were estimated. However, this assumption of stationarity is rarely valid in the case of natural images as well as biomedical images. To overcome this problem in the case of the multichannel-based methods, the approach taken was that of partitioning the image into blocks and computing the prediction coefficients independently for each block. Another possible approach is to adapt the coefficients recursively to the changing statistical characteristics of the image. In this section, the basis for such adaptive algorithms is described, and a 2D recursive least-squares (2D RLS) algorithm for adaptive computation of the LP coefficients is formulated. The procedures are based upon adaptive filter theory in 1D and in multichannel signal filtering.
With reference to the basic 2D LP model given in Equation, several approaches are available for adaptive computation of the coefficients a(p,q) for each pixel being predicted at the location (m,n). The approach based on Wiener filter theory (see Section), leading to the 2D LMS algorithm (see Section), although applicable to image compression, suffers from the fact that the estimation of the coefficients a(p,q) does not make use of all the image data available up to the current location. Adaptive estimation of the coefficients based upon the Kalman filter (see Section), where the prediction coefficients are represented as the state vector describing the current state of the image-generating process, has not been explored much. However, this approach depends upon the statistics of the image represented in terms of ensemble averages; because only estimates of the ensemble averages can be obtained, this approach is likely to be suboptimal.
The approach that is described in this section for adaptive prediction, based upon the work of Kuduvalli, is founded upon the method of least squares. This approach is deterministic in its formulation, and involves the minimization of a weighted sum of prediction errors. In the preceding section, it was observed that the estimation of the prediction coefficients based on the direct minimization of the actual prediction errors (the 2D Burg method) yielded better results in image compression than the method based on the estimation of an ensemble image statistic, the 2D ACF, from the image data (the 2D Levinson method). This result suggests that a deterministic approach could also be appropriate for the adaptive computation of prediction coefficients.
In 2D RLS prediction, the aim is to minimize a weighted sum of the squared prediction errors computed up to the present location, given by

$$\varepsilon^2(m,n) = \mathop{\sum\sum}_{(p,q)} w_{mn}(p,q)\, e^2(p,q),$$

where e(p,q) is the prediction error at (p,q), and w_{mn}(p,q) is a weighting factor chosen to selectively forget the errors from the preceding pixel locations (the past), in order for the prediction coefficients to adapt to the changing statistical nature of the image at the current location. Boutalis et al. used an exponential weighting factor whose magnitude reduces in the direction opposite to the scanning model used in the generation of the image. With this weighting-factor model and special ROSs, Boutalis et al. used the multichannel version of the RLS algorithm directly for adaptive estimation of images; however, their weighting-factor model does not take into account the 2D nature of images: the weight assigned to the error at a location adjacent in the row direction to the current location differs from the weight assigned to the error at a location adjacent in the column direction. Kuduvalli proposed a weighting-factor model that is truly 2D in its formulation. In this method, using a rectangular region spanning the image up to the current location for minimizing the sum of the prediction errors, as shown in Figure, and an exponential weighting factor defined as w_{mn}(p,q) = λ^{(m-p)+(n-q)}, where λ is a forgetting factor, the weighted squared error is defined as

$$\varepsilon^2(m,n) = \sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)}\, e^2(p,q).$$
Let us consider a QP ROS of order (P, Q) for prediction, as shown in Figure, and use the following notation for representing the prediction coefficients and the image data spanning the current ROS as vectors.

FIGURE
ROSs in adaptive LP by the 2D RLS method. While the image is scanned from the position (m, n-1) to (m, n), a column of pixels becomes available as new information that may be used to update the forward and backward predictors. Observe that a part of the column of new information is hidden by the ROS for both forward and backward prediction in the figure.

Let

$$\mathbf{a}(m,n) = \left[ \mathbf{a}_0^T(m,n),\ \mathbf{a}_1^T(m,n),\ \ldots,\ \mathbf{a}_P^T(m,n) \right]^T,$$

where

$$\mathbf{a}_p(m,n) = \left[ a_{mn}(p,0),\ a_{mn}(p,1),\ \ldots,\ a_{mn}(p,Q) \right]^T,$$

with a_{mn}(0,0) = -1, and

$$\mathbf{F}_P(m,n) = \left[ \mathbf{f}_m^T(n),\ \mathbf{f}_{m-1}^T(n),\ \ldots,\ \mathbf{f}_{m-P}^T(n) \right]^T,$$

with

$$\mathbf{f}_{m-p}(n) = \left[ f(m-p,\, n),\ f(m-p,\, n-1),\ \ldots,\ f(m-p,\, n-Q) \right]^T.$$

Here, the subscripts P and (P, Q) represent the order (size) of the matrices and vectors, and the indices (m,n) indicate that the values of the parameters correspond to the pixel location (m,n). Observe that a(m,n) may also be arranged as a (P+1) × (Q+1) matrix, with a_{mn}(p,q) representing its element at (p,q). With this notation, the prediction error may be written as

$$e(m,n) = -\mathbf{a}^T(m,n)\, \mathbf{F}_P(m,n).$$

The 2D RLS normal equations: The coefficients that minimize the weighted sum of the squared prediction errors ε²(m,n) given in Equation are obtained as the solution to the 2D RLS normal equations, which are derived as follows. Let us perform partitioning of the matrices a(m,n) and F_P(m,n) as

$$\mathbf{a}(m,n) = \begin{bmatrix} -1 \\ \tilde{\mathbf{a}}(m,n) \end{bmatrix} \qquad \text{and} \qquad \mathbf{F}_P(m,n) = \begin{bmatrix} f(m,n) \\ \tilde{\mathbf{F}}_P(m,n) \end{bmatrix},$$

where the last element of F_P(m,n) corresponds to f(m-P, n-Q). Observe that the coefficient matrix ã(m,n) and the data matrix F̃_P(m,n) consist of all of the 2D RLS coefficients a_{mn}(p,q) and all of the image pixels f(m-p, n-q) such that (p,q) lies in the QP ROS for a forward predictor. With partitioning as above, the prediction error in Equation may be written as

$$e(m,n) = f(m,n) - \tilde{\mathbf{a}}^T(m,n)\, \tilde{\mathbf{F}}_P(m,n).$$
The sum of the squared prediction errors in Equation may now be expressed as

$$\varepsilon^2(m,n) = \sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)}\, e^2(p,q) = \sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)} \left[ f(p,q) - \tilde{\mathbf{a}}^T(m,n)\, \tilde{\mathbf{F}}_P(p,q) \right]^2$$

$$= \sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)} \left[ f^2(p,q) - 2\, \tilde{\mathbf{a}}^T(m,n)\, \tilde{\mathbf{F}}_P(p,q)\, f(p,q) + \tilde{\mathbf{a}}^T(m,n)\, \tilde{\mathbf{F}}_P(p,q)\, \tilde{\mathbf{F}}_P^T(p,q)\, \tilde{\mathbf{a}}(m,n) \right].$$

In order to determine the coefficients a_{mn}(p,q) that minimize ε²(m,n), we could differentiate the expression above for ε²(m,n) with respect to the coefficient matrix ã(m,n) and equate the result to the null matrix of the same size as ã(m,n), which yields

$$\frac{\partial\, \varepsilon^2(m,n)}{\partial\, \tilde{\mathbf{a}}(m,n)} = -2 \sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)} \left[ \tilde{\mathbf{F}}_P(p,q)\, f(p,q) - \tilde{\mathbf{F}}_P(p,q)\, \tilde{\mathbf{F}}_P^T(p,q)\, \tilde{\mathbf{a}}(m,n) \right] = \mathbf{0}.$$

Equation may be expressed in matrix notation as

$$\sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)}\, \tilde{\mathbf{F}}_P(p,q) \left[ f(p,q) - \tilde{\mathbf{F}}_P^T(p,q)\, \tilde{\mathbf{a}}(m,n) \right] = \mathbf{0}.$$

In addition to the above, using Equation in Equation, we have

$$\varepsilon^2(m,n) = \sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)} \left[ f^2(p,q) - f(p,q)\, \tilde{\mathbf{F}}_P^T(p,q)\, \tilde{\mathbf{a}}(m,n) \right],$$

which may be written in matrix form as

$$\varepsilon^2(m,n) = \sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)}\, f(p,q) \left[ f(p,q) - \tilde{\mathbf{F}}_P^T(p,q)\, \tilde{\mathbf{a}}(m,n) \right].$$

Combining the two preceding equations, we get

$$\sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)} \begin{bmatrix} f(p,q) \\ \tilde{\mathbf{F}}_P(p,q) \end{bmatrix} \left[ f(p,q) - \tilde{\mathbf{F}}_P^T(p,q)\, \tilde{\mathbf{a}}(m,n) \right] = \begin{bmatrix} \varepsilon^2(m,n) \\ \mathbf{0} \end{bmatrix},$$

or

$$-\sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)}\, \mathbf{F}_P(p,q)\, \mathbf{F}_P^T(p,q)\, \mathbf{a}(m,n) = \begin{bmatrix} \varepsilon^2(m,n) \\ \mathbf{0} \end{bmatrix},$$

which may be expressed as

$$\mathbf{\Phi}_P(m,n)\, \mathbf{a}(m,n) = \boldsymbol{\varepsilon}(m,n),$$

where ε(m,n) = [-ε²(m,n), 0, ..., 0]^T, and Φ_P(m,n) is the deterministic autocorrelation matrix of the weighted image, given by

$$\mathbf{\Phi}_P(m,n) = \sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)}\, \mathbf{F}_P(p,q)\, \mathbf{F}_P^T(p,q).$$

Equation represents the 2D RLS normal equations, solving which we can obtain the prediction coefficients a_{mn}(p,q) that adapt to the statistics of the image at the location (m,n).
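The flavor of the recursive least-squares solution can be conveyed with a 1D analogue. The sketch below is only that analogue, not the 2D algorithm of the text: a linear predictor whose coefficients are updated at each sample with an exponential forgetting factor, using the standard RLS update of the inverse of the weighted autocorrelation matrix; the order, forgetting factor, and initialization are arbitrary choices.

```python
import numpy as np

def rls_predictor(signal, order=2, lam=0.98, delta=100.0):
    """1D RLS linear prediction: returns the a priori prediction errors."""
    a = np.zeros(order)                  # prediction coefficients, updated recursively
    Pinv = np.eye(order) * delta         # inverse of the weighted autocorrelation matrix
    errors = np.zeros_like(signal, dtype=float)
    for n in range(order, len(signal)):
        x = signal[n - order:n][::-1]    # past samples (most recent first)
        e = signal[n] - a @ x            # a priori prediction error (what would be encoded)
        k = Pinv @ x / (lam + x @ Pinv @ x)        # gain vector
        a = a + k * e                              # coefficient update
        Pinv = (Pinv - np.outer(k, x @ Pinv)) / lam
        errors[n] = e
    return errors

rng = np.random.default_rng(6)
x = np.cumsum(rng.normal(size=4096))
err = rls_predictor(x)
print(err.var() / x.var())               # the adaptive predictor removes most of the signal variance
```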
Solving the 2D RLS normal equations: Direct inversion of the autocorrelation matrix in Equation gives the desired matrix of prediction coefficients as

$$\mathbf{a}(m,n) = \mathbf{\Phi}_P^{-1}(m,n)\, \boldsymbol{\varepsilon}(m,n).$$

The matrix Φ_P(m,n) is of size (P+1)(Q+1) × (P+1)(Q+1); the inversion of such a matrix at every pixel (m,n) of the image could be computationally intensive. Kuduvalli developed the following procedure to reduce the size of the matrix to be inverted to (Q+1) × (Q+1). The procedure starts with a recursive relationship, expressing the solution of the normal equations at the pixel location (m,n) in terms of that at the preceding location. Consider the expression

$$\mathbf{\Phi}_P(m,n) = \sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)}\, \mathbf{F}_P(p,q)\, \mathbf{F}_P^T(p,q) = \begin{bmatrix} \mathbf{\Phi}^{00}(m,n) & \mathbf{\Phi}^{01}(m,n) & \cdots & \mathbf{\Phi}^{0P}(m,n) \\ \mathbf{\Phi}^{10}(m,n) & \mathbf{\Phi}^{11}(m,n) & \cdots & \mathbf{\Phi}^{1P}(m,n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{\Phi}^{P0}(m,n) & \mathbf{\Phi}^{P1}(m,n) & \cdots & \mathbf{\Phi}^{PP}(m,n) \end{bmatrix},$$

where

$$\mathbf{\Phi}^{rs}(m,n) = \sum_{p=0}^{m} \sum_{q=0}^{n} \lambda^{(m-p)+(n-q)}\, \mathbf{f}_{p-r}(q)\, \mathbf{f}_{p-s}^T(q).$$

Observe that

$$\mathbf{\Phi}^{rs}(m,n) = \mathbf{\Phi}^{sr}(m - r + s,\, n), \qquad r \geq s,$$

which follows from the assumption that the image data have been windowed such that f(m,n) = 0 for m < 0 or n < 0.
The normal equations may now be expressed as

$$\begin{bmatrix} \mathbf{\Phi}^{00}(m,n) & \cdots & \mathbf{\Phi}^{0P}(m,n) \\ \vdots & \ddots & \vdots \\ \mathbf{\Phi}^{P0}(m,n) & \cdots & \mathbf{\Phi}^{PP}(m,n) \end{bmatrix} \begin{bmatrix} \mathbf{a}_0(m,n) \\ \vdots \\ \mathbf{a}_P(m,n) \end{bmatrix} = \begin{bmatrix} \boldsymbol{\varepsilon}_0(m,n) \\ \mathbf{0}_{Q+1} \\ \vdots \\ \mathbf{0}_{Q+1} \end{bmatrix},$$

where 0_{Q+1} is the null matrix of size (Q+1) × 1. Equation may be solved in two steps. First, solve

$$\begin{bmatrix} \mathbf{\Phi}^{00}(m,n) & \cdots & \mathbf{\Phi}^{0P}(m,n) \\ \vdots & \ddots & \vdots \\ \mathbf{\Phi}^{P0}(m,n) & \cdots & \mathbf{\Phi}^{PP}(m,n) \end{bmatrix} \begin{bmatrix} \mathbf{I}_{Q+1} \\ \mathbf{A}_1(m,n) \\ \vdots \\ \mathbf{A}_P(m,n) \end{bmatrix} = \begin{bmatrix} \mathbf{F}_{Q+1}(m,n) \\ \mathbf{0}_{(Q+1) \times (Q+1)} \\ \vdots \\ \mathbf{0}_{(Q+1) \times (Q+1)} \end{bmatrix}$$

for the (Q+1) × (Q+1) matrices A_p(m,n), p = 1, 2, ..., P, and F_{Q+1}(m,n). Here, I_{Q+1} is the identity matrix of size (Q+1) × (Q+1), and 0_{(Q+1)×(Q+1)} is the null matrix of size (Q+1) × (Q+1). Then, obtain the solution to Equation by solving

$$\mathbf{F}_{Q+1}(m,n)\, \mathbf{a}_0(m,n) = \boldsymbol{\varepsilon}_0(m,n),$$

and using the relationship

$$\mathbf{a}_p(m,n) = \mathbf{A}_p(m,n)\, \mathbf{a}_0(m,n), \qquad p = 1, 2, \ldots, P.$$

This approach is similar to the approach taken to solve the 2D Yule-Walker equations by the 2D Levinson method described earlier, and leads to a recursive algorithm that is computationally efficient; the details of the algorithm are given by Kuduvalli.
Results of application to medical images: Kuduvalli conducted preliminary studies on the application of the 2D RLS algorithm to predictive coding and compression of medical images. In the application to coding, the value of the pixel at the current location (m,n) is not available at the decoder before the prediction coefficient matrix a(m,n) is computed; however, the prediction coefficient matrix of the preceding location is available. Thus, for error-free decoding, the a priori prediction error, computed using the prediction coefficient matrix of the preceding location, is encoded. These error values, which have a PDF that is close to a Laplacian PDF, may be efficiently encoded using methods such as the Huffman code. Using a QP ROS of small fixed size and a fixed forgetting factor, Kuduvalli obtained, for two of the images listed in Table, an average bit rate that was only marginally lower than that obtained for the same two images by using the 2D Burg algorithm described earlier. Although the 2D RLS algorithm has the elegance of being a truly 2D algorithm that adapts to the changing statistics of the image on a pixel-by-pixel basis, the method did not yield appreciable advantages in image data compression. Regardless, the method has applications in other areas, such as spectrum estimation and filtering.
Kuduvalli and Rangayyan performed a comparative analysis of several image compression techniques, including direct source coding, transform coding, interpolative coding, and predictive coding, applied to the high-resolution digitized medical images listed in Table. The average bit rates obtained using several coding and compression techniques are listed in Table. It should be observed that decorrelation can provide significant advantages over direct source encoding of the original pixel data. The adaptive predictive coding techniques performed better than the transform and interpolative coding techniques tested.
In a study of the effect of sampling resolution on image data compression, Kuduvalli and Rangayyan prepared low-resolution versions of the images listed in Table by smoothing and downsampling. The results of the application of the 2D Levinson predictive coding algorithm showed that the high-resolution images could be compressed to a greater relative extent than their low-resolution counterparts. This result indicates that high-resolution images possess more redundancy, and hence may be compressed by larger extents than their low-resolution counterparts. Therefore, increasing the resolution of medical images does not increase the amount of the related compressed data in direct proportion to the increase in matrix size, but by a lower factor. This result could be a motivating factor supporting the use of high resolution in medical imaging, without undue concerns related to significant increases in data-handling requirements.
See Aiazzi et al. for a description of other methods for adaptive prediction, and a comparative analysis of several methods for lossless image data compression.
Image Scanning Using the Peano-Hilbert Curve
Peano scanning is a method of scanning an image by following the path described by a space-filling curve.
Giuseppe Peano, an Italian mathematician, described the first space-filling curve in an attempt to map a line into a 2D space. The term Peano scanning is used to refer to such a scanning scheme, irrespective of the space-filling curve used to define the scan path. Peano's curve was modified by Hilbert, and the modified curve came to be known as the Peano-Hilbert curve.

TABLE
Average Bit Rates Obtained in the Lossless Compression of the Medical Images Listed in Table Using Several Image Coding Techniques

Coding method     Bits/pixel
Original
Entropy H
Huffman
Arithmetic
LZW
DCT
Interpolative
2D LP
2D Levinson
2D Burg
2D RLS*

The Huffman code was used to encode the results of the transform, interpolative, and predictive coding methods. *Only two images were compressed with the 2D RLS method.
Moore studied the geometric and analytical interpretation of continuous space-filling curves. Space-filling curves have aided in the development of fractals; see Section for a discussion on fractals. The Peano-Hilbert curve has been applied to display continuous-tone images in order to eliminate deficiencies of the ordered dither technique, such as Moiré fringes. Lempel and Ziv used the Peano-Hilbert curve to scan images and define the lowest bound of compressibility. Zhang et al. explored the statistical characteristics of medical images using Peano scanning.
Provine and Rangayyan studied the application of Peano scanning for image data compression, with an additional step of decorrelation using differentiation, orthogonal transforms, or LP; the following paragraphs describe the basics of the methods involved and the results obtained in their work.
Definition of the Peano-scan path
If a physical scanner that can scan an image by following the Peano curve is not available, Peano scanning may be simulated by selecting pixels from a raster-scanned image by traversing the 2D data along the path described by the Peano-Hilbert curve. The reordered pixel data, so obtained in a 1D stream, may be subjected to decorrelation and encoding operations, as desired. An inverse Peano-scanning operation would be required at the receiving end to reconstruct the original image. A general image compression scheme as above is summarized in Figure.
FIGURE
Image compression using Peano scanning
The Peano-scan operation is recursive in nature, and spans a 2D space encountering a total of 2^i x 2^i points, where i is an integer. From the perspective of processing a 2D array containing the pixel values of an image, this would require that the dimensions of the array be an integral power of two. In the scanning procedure, the given image of size 2^n x 2^n is divided into four quadrants, each of them forming a subimage (see Figure). Each of the subimages is further divided into four quadrants, and the procedure continues. The original image is divided into a total of T_i = 4^(n-i) subimages, each of size 2^i x 2^i, where i = 1, 2, ..., n-1. In the following discussion, the subimages formed by the recursive subdivision procedure as above will be referred to as s_i(k), k = 1, 2, ..., T_i, where k increases along the direction of the scan path; the entire image will be referred to as s_n. Thus, each of the four quadrants formed by partitioning a subimage s_i (for any i) is of size 2^(i-1) x 2^(i-1). The division of a given image into subimages is shown in Figure; the recursive division of subimages is performed until the smallest (2 x 2) subimages s_1 are formed. The four pixels within the smallest subimage are denoted as p1, p2, p3, and p4, respectively, in the order of being scanned. As the scan path builds recursively, the path definition is based on the basic definitions for a 2 x 2 subimage, as well as the recursive definitions for subimages of larger size, until the entire image is scanned. The four basic definitions of the Peano-scanning operation are given in Figure.
The recursive definitions of the Peano-scanning operation are given in Figure; they inherently use the basic definitions shown in Figure to obtain further pixels from the subimages. The definitions go down recursively from i = n to i = 1, that is, from the image s_n down to the subimages s_1; the basic definitions are used to obtain the pixels from the s_1 subimages. The scan-path definition for an image or a subimage depends on i. At the highest level, that is, for s_i with i = n, the recursive definition is as follows:
If i is odd, the recursive definition is R (see Figure).
If i is even, the recursive definition is D.
From the recursive definitions shown in Figure, the definitions of the scan pattern in each of the subimages are obtained as follows:
If s_i(k) has the recursive definition R, the first of its four quadrant subimages (in the order of the scan) will follow the path given by D, the second and the third the path R, and the fourth the path U.
If s_i(k) has the recursive definition D, the first quadrant subimage will follow the path given by R, the second and the third the path D, and the fourth the path L.
If s_i(k) has the recursive definition L, the first quadrant subimage will follow the path given by U, the second and the third the path L, and the fourth the path D.
If s_i(k) has the recursive definition U, the first quadrant subimage will follow the path given by L, the second and the third the path U, and the fourth the path R.
FIGURE
Division of an image into subimages during Peano scanning. Reproduced with permission from J.A. Provine and R.M. Rangayyan, Lossless compression of Peano-scanned images, Journal of Electronic Imaging, © SPIE and IS&T.

FIGURE
Basic definitions of the Peano-scanning operation. The points marked p1 to p4 represent the four pixels in a 2 x 2 subimage. Each scan pattern shown visits four pixels in the order indicated by the arrows: R = right, L = left, D = down, U = up. Reproduced with permission from J.A. Provine and R.M. Rangayyan, Lossless compression of Peano-scanned images, Journal of Electronic Imaging, © SPIE and IS&T.

FIGURE
Recursive definitions of the Peano-scanning operation. Reproduced with permission from J.A. Provine and R.M. Rangayyan, Lossless compression of Peano-scanned images, Journal of Electronic Imaging, © SPIE and IS&T.
The index $k$ can take any value in the range $1 \leq k \leq T_i$, for $1 \leq i \leq n-1$. From the recursive definitions, it is evident that $k$ cannot continuously increase horizontally or vertically, due to the nature of the scan. Furthermore, because the scan path takes its course recursively, except for $i = n$, the recursive definitions L and U (see Figure) are also possible for lower values of $i$. All the subimages are divided and defined recursively until all the subimages $s_1$ are defined. The basic definitions of the Peano scan are then followed for each of the $s_1$ subimages. The Peano-scan pattern for an image is illustrated in Figure, where the heavy dots indicate the positions of the first pixels scanned. Understanding the scan pattern is facilitated by viewing Figure along with Figure.
FIGURE
Peano-scan pattern for an image. The positions of the pixels on the scan pattern are shown by heavy dots in the first subimage. Reproduced with permission from J.A. Provine and R.M. Rangayyan, Lossless compression of Peano-scanned images, Journal of Electronic Imaging, © SPIE and IS&T.
Implementation of Peano scanning in software could use the recursive nature of the scan path efficiently to obtain the pixel stream. Recursive functions may call themselves within their body as the image is divided progressively into subimages until the $s_1$ subimages are formed, and the pixels are obtained recursively as the function builds back from the $s_1$ subimages to the full image $s_n$. Thus, the 2D image is unwrapped into a 1D data stream by following a continuous scan path.
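As a concrete illustration of such an unwrapping, the sketch below (Python, numpy assumed) maps each pixel position of a $2^n \times 2^n$ array to its location along the Hilbert curve and uses that mapping to produce the 1D stream and to invert it. This is a minimal sketch: it follows the widely used xy2d index computation for the Hilbert curve rather than the R/D/L/U recursive definitions of the figures, so the orientation of the path may differ from that illustrated, but the scan remains continuous and fully reversible. The function names are illustrative, not part of any published implementation.

    import numpy as np

    def hilbert_index(n, x, y):
        """Position d of pixel (x, y) along the Hilbert curve filling an
        n x n grid, where n is a power of two (classic xy2d computation)."""
        d = 0
        s = n // 2
        while s > 0:
            rx = 1 if (x & s) else 0
            ry = 1 if (y & s) else 0
            d += s * s * ((3 * rx) ^ ry)
            # Rotate/reflect the quadrant so the same rule applies at every level.
            if ry == 0:
                if rx == 1:
                    x = n - 1 - x
                    y = n - 1 - y
                x, y = y, x
            s //= 2
        return d

    def peano_scan(image):
        """Unwrap a square power-of-two image into a 1D stream along the curve."""
        n = image.shape[0]
        stream = np.empty(n * n, dtype=image.dtype)
        for y in range(n):
            for x in range(n):
                stream[hilbert_index(n, x, y)] = image[y, x]
        return stream

    def inverse_peano_scan(stream, n):
        """Refill the 2D array from the 1D stream (the inverse scanning operation)."""
        image = np.empty((n, n), dtype=stream.dtype)
        for y in range(n):
            for x in range(n):
                image[y, x] = stream[hilbert_index(n, x, y)]
        return image

Because the index computation is deterministic, the same mapping serves both the scanning and the inverse-scanning operations, so no information is lost.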
The inverse Peano-scanning operation accomplishes the task of filling up the 2D array with the 1D data stream. This operation corresponds to the original works of Peano and Hilbert, where the continuous mapping of a straight line into a 2D plane was described. Because the Peano-scanning operation is reversible, no loss of information is incurred.
Properties of the Peano-Hilbert curve
The Peano-Hilbert curve has several interesting and useful properties. The curve is continuous but not differentiable; it does not have a tangent at any point. Moore gave an explanation of the Peano-Hilbert curve adhering to this property. This property motivated the development of several other curves with the same property, which are used in the domain of fractals.
The Peano-Hilbert curve fills the 2D space continuously without passing through any point more than once. This feature enables the mapping of a 2D array into a 1D data stream. The recursive nature of the curve is useful in efficient implementation of the path of the curve. These two properties aid in scanning an image recursively, quadrant by quadrant, leaving each quadrant only after having obtained every pixel within that quadrant, with each pixel visited only once in the process (see Figures). Preservation of the local 2D context in the scanning path could be expected to increase the correlation between successive elements in the 1D data stream. This aspect could facilitate improved image data compression.
Two other aspects of the Peano-Hilbert curve have proven to be useful in the bilevel display of continuous-tone images. Linearizing a 2D array along the path described by the Peano-Hilbert curve reduces the error between the sum of the bilevel values and the sum of the continuous-tone values of the original image, because 2D locality is maintained by the scan path, unlike the 1D vector formed by concatenating the horizontal raster-scan lines of the image. The problem of long sections of scan lines running adjacent to one another is eliminated by following the Peano-scan path instead of the raster scan. Thus, Moiré patterns can be eliminated in regions of uniform intensity when presenting gray-level images on a bilevel display.
Implementation of Peano scanning
A practical problem that could arise in implementing the Peano-scanning operation on large images is the difficulty in allocating memory for the long linear array used within the body of the recursive function for storing the scanned data. Provine suggested the following approach to address this problem, by using a symmetrical pattern exhibited by the Peano-Hilbert curve.
The Peano-Hilbert curve exhibits a symmetrical pattern, which may be described as follows. For any subimage $s_i(k)$, the scan paths for two of its quadrant subimages $s_{i-1}$ are the mirror reflections of the paths for the other two quadrant subimages, respectively. Two types of symmetry exist in the Peano-scan paths for a $2^i \times 2^i$ subimage, depending on whether $i$ is odd or even. If $i$ is odd, the pattern for the upper half of the 2D space is reflected in the lower half; if $i$ is even, the pattern in the left-hand half of the 2D space is reflected in the right-hand half (see Figure).
A symmetrical scan pattern exists for any subimage formed by the recursive division process. Hence, for the smallest subimage, the symmetrical pattern suggests that the basic definitions effectively obtain only two pixels, one after the other, either horizontally or vertically. In other words, the basic scan pattern effectively obtains only two pixels, $p_1$ and $p_2$, out of the four pixels in a $2 \times 2$ subimage (see Figure). The manner in which the remaining two pixels, $p_3$ and $p_4$, are obtained follows the symmetry property stated above, with the corresponding values of $i$ and $k$ substituted. The sequence in which the two pixels $p_3$ and $p_4$ are obtained is shown in Figure in dashed lines.
In scanning large images, for any subimage $s_i(k)$, the pattern of the Peano-scan path through the first half of the subimage (its first two quadrants along the scan path) is the same as that through the second half (its last two quadrants) traced in the reverse direction. Because the Peano-scan path does not leave any quadrant without visiting all the pixels within the quadrant, two equal sections of the image can be unwrapped independently, without affecting each other, into individual linear arrays by following the same scan path but in opposite directions. The resulting linear arrays, when concatenated appropriately, give the required 1D data stream. For the case illustrated in Figure (a), the pixels can be obtained as a linear sequence by tracing the Peano-scan path on the pixels in the upper half, followed by the pixels in the lower half. Thus, several 1D arrays of a reasonable size may be used to hold the pixels obtained from different sections of the image. After all the subimages have been scanned (in parallel, if desired), the arrays may be concatenated appropriately to form the long sequence containing the entire image data.
Decorrelation of Peano-scanned data
The Peano-scanning operation scans the given picture recursively, quadrant by quadrant. Therefore, we could expect the 2D local statistics to be preserved in the resulting 1D data stream. Furthermore, we could also expect a higher correlation between pixels for larger lags in the Peano-scanned data than between the pixels obtained by concatenating the raster-scanned lines into a 1D array of pixels.
FIGURE
Symmetrical patterns exhibited by the Peano-Hilbert curve for the two subimage sizes shown in (a) and (b). The line AB indicates the axis of symmetry in each case. The pixels in the image in (a) are labeled in the order of being scanned: the pixels in the upper half followed by the pixels in the lower half of the subimage. Figure courtesy of J.A. Provine.
FIGURE
Symmetrical patterns in the basic definitions of the Peano scan. Figure courtesy of J.A. Provine.
Figure shows the ACFs obtained for the Lenna test image and a mammogram, for raster-scanned and Peano-scanned pixel streams. As expected, Peano scanning has maintained higher interpixel correlation than raster scanning. Similar observations have been made by Zhang et al. in their study on the stochastic properties of medical images and data compression with Peano scanning.
The simplest method for decorrelating pixel data is to produce a data sequence containing the differences between successive pixels. As shown earlier in Section and Figure, the differentiated data may be expected to have a Laplacian PDF, which is useful in compressing the data. Provine and Rangayyan applied a simple first-difference operation to decorrelate Peano-scanned image data, and encoded the resulting values using the Huffman, arithmetic, and LZW coding schemes. In addition, they applied the 1D DCT and LP modeling procedures to raster-scanned and Peano-scanned data streams, as well as the 2D DCT and LP modeling procedures to the original image data. Some of the results obtained by Provine and Rangayyan are summarized in Tables and. The application of either the Huffman or the arithmetic code to the differentiated Peano-scanned data stream resulted in the lowest average bit rate in the study.
Image Coding and Compression Standards
Two highly recognized international standards for the compression of still images are the Joint Bi-level Image Experts Group (JBIG) standard and the Joint Photographic Experts Group (JPEG) standard. JBIG and JPEG are sanctioned by the International Organization for Standardization (ISO) and the Comité Consultatif International Téléphonique et Télégraphique (CCITT). Although JBIG was initially proposed for bi-level image compression, it may also be applied to continuous-
FIGURE
ACF of raster-scanned and Peano-scanned pixels plotted as a function of the distance (lag) between the scanned pixels: (a) for the Lenna image and (b) for a mammogram. Figure courtesy of J.A. Provine.
TABLE
Average Bit Rate with the Application of the Huffman, Arithmetic, and LZW Coding Schemes to Raster-scanned and Peano-scanned Data Obtained from Eight Test Images
Entropy Average number of bits/pixel
Image Size H0 LZW
(b/pixel) (pixels) (bits) Huffman Arith Raster Peano
Airplane
Baboon
Cameraman
Lenna
Lenna
Peppers
Sailboat
Tiffany
Mean
SD
See also Tables and. Note: Arith = arithmetic coding.
tone images by treating bit planes as independent bi-level images. (Note: The term continuous-tone images is used to represent gray-level images, color images, and multicomponent images, whereas some authors use the term m-ary images for the same purpose; the former is preferred as it is used by JPEG.) The efficiency of such an application depends upon preprocessing for bit-plane decorrelation. The Moving Picture Experts Group (MPEG) standard applies to the compression of video images. In the context of medical image data handling and PACS, the ACR and the US National Electrical Manufacturers Association (NEMA) proposed standards known as the ACR-NEMA and DICOM (Digital Imaging and Communications in Medicine) standards. The following sections provide brief reviews of the standards mentioned above.
TABLE
Average Bit Rate with Differentiated Peano-scanned Data (PD) Compared with the Results of 1D and 2D DPCM Encoding of Raster-scanned Data from Eight Test Images
Average number of bits/pixel
Huffman Arithmetic LZW
DPCM DPCM DPCM
Image PD 1D 2D PD 1D 2D PD 1D 2D
Airplane
Baboon
Cameraman
Lenna
Lenna
Peppers
Sailboat
Tiffany
Mean
SD
See also Tables and
The JBIG Standard
JBIG is a standard for progressive coding of bi-level images that supports three coding modes: progressive coding, compatible progressive/sequential coding, and single-layer coding. A review of the single-layer coding mode is presented in the following paragraphs.
For a bi-level image $b(m,n)$, $m = 0, 1, \ldots, M-1$, $n = 0, 1, \ldots, N-1$, a typical JBIG coding scheme includes four main functional blocks, as shown in Figure. The following items form important steps in the JBIG coding procedure.
The typical prediction step is a line-skipping algorithm. A given line is marked as typical if it is identical to the preceding line. The encoder adds a special label to the encoded data stream for each typical line, instead of encoding the line. The decoder generates the pixels of typical lines by line duplication.
TABLE
Average bit rates for eight test images with several decorrelation and encoding methods
Average bit rate (bits/pixel)
Direct D linear D
arith PD PD predictive DCT
Peano Huffman arith coding coding
Image or raster encoding encoding raster raster
Airplane
Baboon
Cameraman
Lenna
Lenna
Peppers
Sailboat
Tiffany
Mean
SD
See also Tables and. Note: arith = arithmetic coding; PD = differentiated Peano-scanned data.
The adaptive templates block provides substantial coding gain by looking for horizontal periodicity in the bi-level image. When a periodic template is changed, the encoder multiplexes a control sequence into the output data stream.
The model templates block is a context arithmetic coder. The context is determined by ten particular neighboring pixels that are defined by two model templates, the three-line template and the two-line template, as shown in Figure, labeled as X and A, where A is the adaptive pixel whose position could be varied during the coding process.
The adaptive arithmetic encoder is an entropy coder that determines the necessity of coding a given pixel based upon the outputs of the typical prediction block and the model templates block. If necessary, the encoder notes the context and uses its internal probability estimator to estimate the conditional probability that the current pixel will be of a given value.
It should be noted that the JBIG coding algorithm includes at least three decorrelation steps for a given bi-level image or bit plane.
FIGURE
The single-layer JBIG encoder. Reproduced with permission from L. Shen and R.M. Rangayyan, Lossless compression of continuous-tone images by combined inter-bit-plane decorrelation and JBIG coding, Journal of Electronic Imaging, © SPIE and IS&T.
FIGURE
Two context-model templates used in the single-layer JBIG encoder. The pixel being encoded, the pixels in the context model (X), and the adaptive pixel (A) are indicated. Reproduced with permission from L. Shen and R.M. Rangayyan, Lossless compression of continuous-tone images by combined inter-bit-plane decorrelation and JBIG coding, Journal of Electronic Imaging, © SPIE and IS&T.
In a 1D form of JBIG coding, run-length coding is used to encode each line in the image by using a variable-length coding scheme; see Gonzalez and Woods for further details of this and other JBIG procedures. See Sections and for discussions on the results of application of JBIG.
The JPEG Standard
JPEG is a continuous-tone image compression standard that supports both lossless and lossy coding. The standard includes three systems:
A lossy baseline coding system based upon blockwise application of the DCT.
An extended coding system for greater compression, higher precision, and progressive transmission and recovery.
A lossless independent coding system for reversible compression.
In the baseline coding system, the pixel data are limited to a precision of 8 b. The image is broken into 8 x 8 blocks, shifted in gray level, and transformed using the DCT. The transform coefficients are quantized, with variable-length code assignment up to a maximum number of bits. Due to the presence of block artifacts and other errors in lossy compression, this mode of JPEG would not be suitable for the compression of medical images.
The JPEG 2000 standard is based upon the wavelet transform. JPEG 2000 offers the option of progressive coding from lossy toward lossless, as well as the option of coding ROIs with higher quality than that for the other regions in the given image, which may be of interest in some applications.
The lossless mode of JPEG uses a form of predictive (DPCM) coding. A linear combination of each pixel's neighbors at the left, upper, and upper-left positions is employed to predict the pixel's value, and then the difference between the true value of the pixel and its predicted value is coded through an entropy coder, such as the Huffman or arithmetic coder. Lossless JPEG defines seven linear combinations, known as prediction selection values (PSV).
For a continuous-tone image $f(m,n)$, $m = 0, 1, \ldots, M-1$, $n = 0, 1, \ldots, N-1$, the predictors used for an interior pixel $f(m,n)$, $1 \leq m \leq M-1$, $1 \leq n \leq N-1$, in lossless JPEG coding are as follows:
PSV 0: no prediction, or $\hat{f}(m,n) = 0$, which indicates entropy encoding of the original image directly;
PSV 1: $\hat{f}(m,n) = f(m, n-1)$;
PSV 2: $\hat{f}(m,n) = f(m-1, n)$;
PSV 3: $\hat{f}(m,n) = f(m-1, n-1)$;
PSV 4: $\hat{f}(m,n) = f(m, n-1) + f(m-1, n) - f(m-1, n-1)$;
PSV 5: $\hat{f}(m,n) = f(m, n-1) + [f(m-1, n) - f(m-1, n-1)]/2$;
PSV 6: $\hat{f}(m,n) = f(m-1, n) + [f(m, n-1) - f(m-1, n-1)]/2$;
PSV 7: $\hat{f}(m,n) = [f(m, n-1) + f(m-1, n)]/2$;
where $\hat{f}(m,n)$ is the predicted value for the pixel $f(m,n)$. The boundary pixels could be treated in various possible ways, which will not make a significant difference in the final compression ratio due to their small population. A typical method is to use PSV 1 for the first row and PSV 2 for the first column, with special treatment of the first pixel at position $(0,0)$ using the value of $2^{K-1}$, where $K$ is the number of precision bits of the pixels.
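The sketch below computes the PSV prediction error for the interior of an image, following the definitions above; it is a minimal illustration in Python (numpy assumed), with integer division standing in for the rounding conventions of the standard and with boundary handling omitted.

    import numpy as np

    def psv_predict(f, psv):
        """Prediction and error images for one of the lossless-JPEG PSVs (1-7),
        computed only for interior pixels (m >= 1, n >= 1)."""
        f = f.astype(np.int64)
        a = f[1:, :-1]   # left neighbour      f(m, n-1)
        b = f[:-1, 1:]   # upper neighbour     f(m-1, n)
        c = f[:-1, :-1]  # upper-left neighbour f(m-1, n-1)
        if psv == 1:
            pred = a
        elif psv == 2:
            pred = b
        elif psv == 3:
            pred = c
        elif psv == 4:
            pred = a + b - c
        elif psv == 5:
            pred = a + (b - c) // 2
        elif psv == 6:
            pred = b + (a - c) // 2
        elif psv == 7:
            pred = (a + b) // 2
        else:
            raise ValueError("psv must be in 1..7")
        error = f[1:, 1:] - pred
        return pred, error

Sweeping psv over 1 to 7 and measuring the zeroth-order entropy of each error image is a simple way to see why automatic selection of the best predictor, as mentioned for the lossless JPEG package used in the studies cited later, is worthwhile.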
Sung et al. evaluated the application of JPEG for the compression of mammograms and suggested that high compression ratios were possible without visual loss, preserving significant medical information at a high level of confidence. It was also suggested that even higher compression could be achieved without affecting clinical diagnostic performance. See Sections and for discussions on the results of application of lossless JPEG to several test images.
The MPEG Standard
The MPEG standard includes several schemes for the compression of video images for various applications, based upon combinations of the DCT and DPCM, including motion compensation. The techniques exploit data redundancy and correlation within each frame as well as between frames; furthermore, they take advantage of certain psychovisual properties of the human visual system. Recent versions of MPEG include special features suitable for videoconferencing, multimedia, streaming media, and videogame systems. Most of such special features are not of relevance in lossless compression of biomedical image data.
The ACR-NEMA and DICOM Standards
The proliferation of medical imaging technology and devices of several types in the 1970s and 1980s led to a situation where, due to the lack of standards, interconnection and communication of data between imaging and computing devices was not possible. In order to rectify this situation, the ACR and NEMA established the ACR-NEMA standard on digital imaging and communication, specifying the desired hardware interface, a minimum set of software commands, and a consistent set of data formats to facilitate communication between imaging devices and computers across networks. This was followed by another standard on data compression, the ACR-NEMA PS, specifying the manner in which header data were to be provided such that a recipient of the compressed data could identify the data compression method and parameters used, and reconstruct the image data. The standard permits the use of several image decorrelation and data compression techniques, including transform (DCT), predictive (DPCM), Huffman, and Lempel-Ziv coding techniques. The DICOM standard includes a number of enhancements to the ACR-NEMA standard, including conformance levels and applicability to a networked environment.
Segmentation-based Adaptive Scanning
Shen and Rangayyan proposed a segmentation-based lossless image coding (SLIC) method based on a simple but efficient region-growing procedure. An embedded region-growing procedure was used to produce an adaptive scanning pattern for the given image, with the help of a discontinuity-index map that required a small number of bits for encoding. The JBIG method was used for encoding both the error-image data and the discontinuity-index map data. The details of the SLIC method and the results obtained are described in the following paragraphs.
Segmentation-based coding
Kunt et al. proposed a contour-texture approach to picture coding; they called such approaches second-generation image coding techniques. The main idea behind such techniques is to first segment the image into nearly homogeneous regions surrounded by contours, such that the contours correspond as much as possible to those of the objects in the image, and then to encode the contour and texture information separately. Because contours can be represented as 1D signals and the pixels within a region are highly correlated, such methods are expected to lead to high compression ratios. Although the idea appears to be promising, its implementation meets with a series of difficulties. A major problem exists at its very important first step, segmentation, which determines the final performance of the segmentation-based coding method. It is well recognized that there are no satisfactory segmentation algorithms for application to a wide variety of general images. Most of the available segmentation algorithms are sophisticated, and give good performance only for specific types of images.
In order to overcome the problem mentioned above in relation to segmentation, Shen and Rangayyan proposed a simple region-growing method. In this procedure, instead of generating a contour set, a discontinuity map is obtained during the region-growing procedure. Concurrently, the method also produces a corresponding error image, based upon the difference between each pixel and its corresponding center pixel. The discontinuity map and the error image are then encoded separately.
Region-growing criteria
The aim of segmentation in image compression is not the identification of objects or the analysis of features; instead, the aim is to group spatially connected pixels lying within a small gray-level dynamic range. The region-growing procedure in SLIC starts with a single pixel, called the seed pixel (see Figure). Each of the seed's connected neighboring pixels, in the order shown in Figure, is checked with a region-growing (or inclusion) condition. If the condition is satisfied, the neighboring pixel is included in the region. The four neighbors of the newly added neighboring pixel are then checked for inclusion in the region. This recursive procedure is continued until no spatially connected pixel meets the growing condition. A new region-growing procedure is then started with the next pixel in the image that is not already a member of a region; the procedure ends when every pixel in the image has been included in one of the regions grown.
FIGURE
Demonstration of a seed pixel and its connected neighbors in region growing. Reproduced with permission from L. Shen and R.M. Rangayyan, A segmentation-based lossless image coding method for high-resolution medical image compression, IEEE Transactions on Medical Imaging, © IEEE.
The region-growing conditions used in SLIC are the following:
The neighboring pixel is not a member of any of the regions already grown.
The absolute difference between the neighboring pixel and the corresponding center pixel is less than the limit error-level (to be defined later).
Figure demonstrates the relationship between a neighboring pixel and its center pixel. At the specific stage illustrated in the figure, pixel A has become a member of the region being grown, and its four neighbors, namely B, C, D, and E, are being checked for inclusion; E is already a member of the region. Under this circumstance, pixel A is the center pixel of the neighboring pixels B, C, D, and E. When a new neighboring pixel is included in the region being grown, its error-level-shifted-up difference with respect to its center pixel is stored as the pixel's error value. If only the first of the two region-growing conditions is met, the discontinuity index of the pixel is incremented. By this process, after region growing, a discontinuity-index image data part and an error-image data part will be obtained. The maximum value of the discontinuity index is bounded by the number of neighbors of a pixel. Most of the segmentation-based coding algorithms reported in the literature include contour coding and region coding; instead of these steps, the SLIC method uses a discontinuity-index map and an error-image data part.
The error-level in SLIC is determined by a preselected parameter, error-bits: the error-level is a power of two set by error-bits. With a given setting of error-bits, the allowed difference between a neighboring pixel and its center pixel thus lies in the range (-error-level, error-level). The error value of the seed pixel of each region is defined as the value of its lower error-bits bits; the value of the higher (N - error-bits) bits of the pixel is stored in a high-bits seed-data part, where N is the number of bits per pixel in the original image data.
The three data parts described above are used to fully recover the original image during the decoding process. The region-growing conditions during decoding are that the neighboring pixel under consideration for inclusion be not in any of the previously grown regions, and that its discontinuity index equal zero. When the conditions are met for a pixel, its value is restored as the sum of its error-level-shifted-down error value and its center pixel value, except for the seed pixels of every region. If only the first of the two conditions is satisfied, the discontinuity index of that pixel is decremented. Thus, the discontinuity index generated during segmentation is used to guide region growing during decoding. The high-bits seed-data part is combined with the error-image data part to recover the seed pixel value of each region.
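A minimal sketch of the encoding-side region growing is given below (Python, numpy assumed). It is an illustration of the decomposition into the three data parts, not a reproduction of the published implementation: the traversal is breadth-first rather than recursive, the relation error_level = 2**error_bits is an assumption, and the function and variable names are hypothetical.

    import numpy as np
    from collections import deque

    def slic_region_growing(image, error_bits=3):
        """Decompose an image into three SLIC-style data parts: a
        discontinuity-index map, an error image, and high-bits seed data."""
        error_level = 1 << error_bits          # assumed relation to error_bits
        rows, cols = image.shape
        member = np.zeros((rows, cols), dtype=bool)      # already in a region?
        disc_index = np.zeros((rows, cols), dtype=np.uint8)
        error_img = np.zeros((rows, cols), dtype=np.int32)
        seeds_high = []                                  # high-bits seed-data part
        for sr in range(rows):
            for sc in range(cols):
                if member[sr, sc]:
                    continue
                # Start a new region: split the seed pixel into low and high bits.
                member[sr, sc] = True
                error_img[sr, sc] = int(image[sr, sc]) & (error_level - 1)
                seeds_high.append(int(image[sr, sc]) >> error_bits)
                queue = deque([(sr, sc)])
                while queue:
                    cr, cc = queue.popleft()             # current centre pixel
                    for dr, dc in ((-1, 0), (0, -1), (0, 1), (1, 0)):
                        nr, nc = cr + dr, cc + dc
                        if not (0 <= nr < rows and 0 <= nc < cols) or member[nr, nc]:
                            continue
                        diff = int(image[nr, nc]) - int(image[cr, cc])
                        if abs(diff) < error_level:
                            member[nr, nc] = True
                            error_img[nr, nc] = diff + error_level  # shifted-up difference
                            queue.append((nr, nc))
                        else:
                            disc_index[nr, nc] += 1      # growth blocked at this pixel
        return disc_index, error_img, seeds_high

The decoder would replay the same traversal, including a pixel when its stored discontinuity index has been decremented to zero and restoring its value from the shifted error and the center pixel, as described above.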
Figure provides a simple example for illustration of the region-growing procedure and its result. The image in the example, shown in Figure
FIGURE
Demonstration of a neighboring pixel (B, C, D, or E) being checked for inclusion against the current center pixel A during region growing. Reproduced with permission from L. Shen and R.M. Rangayyan, A segmentation-based lossless image coding method for high-resolution medical image compression, IEEE Transactions on Medical Imaging, © IEEE.
(a), is a section of an eye of the Lenna image shown in Figure (a). A fixed value of error-bits was used for this example. Figure (b) shows the result of region growing. The corresponding three data parts, namely the discontinuity-index image data, the error-image data, and the high-bits seed data, are shown in Figure (c), (d), and (e), respectively. The full Lenna image and the corresponding discontinuity-index image data (scaled) and error-image data (scaled) are shown in Figure.
The SLIC procedure
The complete SLIC procedure is illustrated in Figure. At the encoding end, the original image is transformed into three parts, discontinuity-index image data, error-image data, and high-bits seed data, by the region-growing procedure. The first two data parts are encoded using the Gray code (see Table), broken down into bit planes, and finally encoded using JBIG. The last data part is stored or transmitted as is; it needs only (N - error-bits) bits per region.
At the decoding end, the JBIG-coded data files are JBIG-decoded first, and then the Gray-coded bit planes are composed back to binary code. Finally, the three parts are combined together by the same region-growing procedures as before to recover the original image.
A lossless compression method basically includes two major stages: one is image transformation with the purpose of data decorrelation; the other is encoding of the transformed data. However, in the SLIC procedure, image transformation is achieved in both the region-growing procedure and later in the JBIG procedure, whereas encoding is accomplished within the JBIG procedure.
Results of image data compression with SLIC
Five mammograms and five chest radiographs were used to evaluate the performance of SLIC. The procedure was tested using higher-precision and lower-precision versions of the images, obtained by direct mapping and by discarding the two least-significant bits, respectively, from the original images. The method was also tested with commonly used 8 b nonmedical test images. The performance of SLIC was compared with that of JBIG, JPEG, adaptive Lempel-Ziv (ALZ) coding, HINT, and 2D LP coding. In using JBIG for direct encoding of the image, or for encoding the error-image data part and the discontinuity-index data part, parameters were selected so as to use three lines of the image (NLPS = 3) in the underlying model, with D = 0, that is, no progressive spatial-resolution buildup. The lossless JPEG package used in the study includes features of automatic determination of the best prediction pattern and optimal Huffman table generation. The UNIX utility compress was used for ALZ compression. For the higher-precision test images, the 2D Burg LP algorithm (see Section), followed by
FIGURE
A simple example of the region-growing procedure and its result, with error-bits set to a fixed value. (a) Original image, a section of the Lenna image shown in Figure. (b) The result of region growing: the seed pixel of every region grown is identified; a few regions include only the corresponding seed pixels. (c) The discontinuity-index image data part. (d) The error-image data part. (e) The high-bits seed-data part, from left to right. Reproduced with permission from L. Shen and R.M. Rangayyan, A segmentation-based lossless image coding method for high-resolution medical image compression, IEEE Transactions on Medical Imaging, © IEEE.
FIGURE
The Lenna image and the results obtained with SLIC. (a) Original image. (b) The discontinuity-index image data part (scaled). (c) The error-image data part (scaled). See also Figure. Reproduced with permission from L. Shen and R.M. Rangayyan, A segmentation-based lossless image coding method for high-resolution medical image compression, IEEE Transactions on Medical Imaging, © IEEE.
FIGURE
Illustration of the segmentationbased lossless image coding SLIC proce
dure Reproduced with permission from L Shen and RM Rangayyan A
segmentationbased lossless image coding method for highresolution medical
image compression IEEE Transactions on Medical Imaging
c
IEEE
Huffman error coding (2D-Burg-Huffman) was utilized instead of the JPEG package. The 2D-Burg-Huffman program was designed specifically for the higher-precision images; the JPEG program permits only lower-precision images.
The SLIC method has a tunable parameter, which is error-bits. The data compression achieved was found to be almost the same for most of the medical images over a small range of values of error-bits; subsequently, a fixed value of error-bits was used in the remaining experiments with the lower-precision versions of the medical images. The performance of the SLIC method is summarized in Table, along with the results of ALZ, JBIG, JPEG, and HINT compression. It is seen that SLIC has outperformed all of the other methods studied with the test-image set used, except in the case of one mammogram and one chest radiograph, for which JBIG gave negligibly better results. On the average, SLIC improved the bit rate as compared with JBIG, JPEG, and HINT.
TABLE
Average Bits Per Pixel with SLIC (error-bits as specified), ALZ, JBIG, JPEG (Best Mode), and HINT, Using Five Mammograms and Five Chest Radiographs
Image Entropy
(b/pixel) H0 ALZ JBIG JPEG HINT SLIC
m
m
m
m
c
c
c
c
m
c
Average
The lowest bit rate in each case is highlighted Reproduced with permission
from L Shen and RM Rangayyan A segmentationbased lossless image
coding method for highresolution medical image compression IEEE Trans
actions on Medical Imaging
c
IEEE
Whereas the SLIC procedure performed well with high-resolution medical images, its performance with low-resolution general images was comparable to the performance of JBIG and JPEG, as shown in Table. When SLIC used an optimized JBIG algorithm with the maximum number of lines, that is, NLPS equal to the number of rows of the image (the last column of Table), in the inherent model, instead of only three lines (NLPS = 3, as in Table), its performance was better than that of JBIG and comparable to that of JPEG.
TABLE
Average Bits Per Pixel with SLIC errorbits ALZ JBIG and
JPEG Best Mode Using Eight Commonly Used b Images
SLIC
Image Entropy NLPS
bpixel order ALZ JBIG JPEG row
Airplane
Baboon
Cameraman
Lenna
Lenna
Peppers
Sailboat
Tiany
Average
Reproduced with permission from L Shen and RM Rangayyan A
segmentationbased lossless image coding method for highresolution medi
cal image compression IEEE Transactions on Medical Imaging
c
IEEE
In the studies of Shen and Rangayyan, a particular setting of error-bits was observed to be a good choice for the compression of the higher-precision versions of the medical images. Table lists the results of compression with the ALZ, JBIG, 2D-Burg-Huffman, HINT, and SLIC methods. The SLIC technique has provided lower bit rates than the other methods; the average bit rate obtained with SLIC is lower than the average bit rates of HINT, JBIG, and 2D-Burg-Huffman.
TABLE
Average Bits Per Pixel with SLIC (error-bits as specified), ALZ, JBIG, 2D-Burg-Huffman (DBH), and HINT, Using Five Mammograms and Five Chest Radiographs
Image Entropy
(b/pixel) (order) ALZ JBIG DBH HINT SLIC
m
m
m
m
c
c
c
c
m
c
Average
The lowest bit rate in each case is highlighted Reproduced with permission
from L Shen and RM Rangayyan A segmentationbased lossless image
coding method for highresolution medical image compression IEEE Trans
actions on Medical Imaging
c
IEEE
Most of the previously reported segmentation-based coding techniques involve three procedures: segmentation, contour coding, and texture coding. For segmentation, a sophisticated procedure is generally employed for the extraction of closed contours. In the SLIC method, a simple single-scan neighbor-pixel checking algorithm is used. The major problem after image segmentation with the other methods lies in the variance of pixel intensity within the regions, which has resulted in the application of such methods to lossy coding instead of lossless coding. In order to overcome this problem, the SLIC method uses the most-correlated neighboring pixels to generate a low-dynamic-range error image during segmentation. Contour coding is replaced by a coding step applied to a discontinuity-index map with a small number of levels, and texture coding is turned into the encoding of a low-dynamic-range error image. Further improvement of the performance of SLIC may be possible by the application of more efficient coding methods to the error-image and discontinuity-index data parts, and by modifying the region-growing procedure.
The SLIC method was extended to the compression of 3D biomedical images by Lopes and Rangayyan. The performance of the method varied depending upon the nature of the image. However, the advantage of implementation of SLIC in 3D, versus 2D on a slice-by-slice basis, was not significant. For example, both 3D and 2D SLIC, with the compression utility bzip2 as the encoding scheme after decomposition of the image by the segmentation-based procedure, reduced the data in a 3D CT head examination from the original number of bits per voxel to a substantially lower rate, comparable to the zeroth-order entropy of the image. The inclusion of the SLIC procedure also improved the performance of the compression utilities gzip and compress.
Acha et al. extended the SLIC method to the compression of color images in the application of diagnosis of burn injuries. Color images in the RGB format were compressed to a rate lower than that obtained by application of the JPEG lossless method. In a further study, Serrano et al. converted the same images as in the study of Acha et al. from the RGB format to the YIQ (luminance, in-phase, and quadrature) system in a lossless manner, and showed that the bit rate could be further reduced by the use of SLIC.
Enhanced JBIG Coding
The SLIC procedure demonstrates a successful example of the application of the JBIG method for coding the discontinuity-index map and error-data parts. We may, therefore, expect the incorporation of decorrelation procedures, such as predictive algorithms, into JBIG coding to provide better performance. Shen and Rangayyan proposed a combination of multiple decorrelation procedures, including a lossless-JPEG-based predictor, a transform-based inter-bit-plane decorrelator, and a JBIG-based intra-bit-plane decorrelator; the details of their procedure are described in the following paragraphs.
Although JBIG includes an efficient intra-bit-plane decorrelation procedure, it needs an efficient preprocessing algorithm for inter-bit-plane decorrelation in order to achieve good compression efficiency with continuous-tone images. There are several ways in which bit planes may be created from a continuous-tone image. One common choice is to use the bits of a folded-binary or Gray representation of intensity. The Gray code is the most common alternative to the binary code for representing digital numbers (see Table). The major advantage of the Gray code is that only one bit changes between each pair of successive code words, which is useful to provide a good bit-plane representation for original image pixels: most neighboring pixels have highly correlated and close values, and thus most neighboring bits within each Gray-coded bit plane may be expected to have the same value. It has been shown that, by using the Gray representation, JBIG can obtain compression ratios at least comparable to those of lossless JPEG.
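The binary-to-Gray mapping and its inverse are simple bit manipulations; the short sketch below (Python, hypothetical helper names) illustrates the one-bit-change property that makes the Gray representation attractive for bit-plane coding.

    def binary_to_gray(v):
        """Gray-code word corresponding to a non-negative integer v."""
        return v ^ (v >> 1)

    def gray_to_binary(g):
        """Recover the integer value from its Gray-code word."""
        v = 0
        while g:
            v ^= g
            g >>= 1
        return v

    # Successive integers differ in exactly one bit of their Gray codes,
    # for example binary_to_gray(11) = 0b1110 and binary_to_gray(12) = 0b1010.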
Coding schemes other than the Gray code may be derived based on specific requirements. For instance, for a $K$-bit image $f(m,n)$, the prediction error $e(m,n)$ could be represented as
$$e(m,n) = f(m,n) - \hat{f}(m,n),$$
where $\hat{f}(m,n)$ is the predicted value of the original pixel $f(m,n)$. In general, up to $K+1$ bits could be required to represent the difference between two $K$-bit numbers. However, because the major concern in compression is to retrieve $f(m,n)$ from $e(m,n)$ and $\hat{f}(m,n)$, we could make use of the following binary arithmetic operation:
$$\bar{e}(m,n) = e(m,n) \;\&\; \{\underbrace{11\cdots1}_{K\ \mathrm{bits}}\}_b,$$
where $\&$ is the bitwise AND operation and the subscript $b$ indicates binary representation. Then, only $K$ bits are necessary to represent the prediction error $\bar{e}(m,n)$. The original pixel value may be retrieved as
$$f(m,n) = \left[\bar{e}(m,n) + \hat{f}(m,n)\right] \;\&\; \{\underbrace{11\cdots1}_{K\ \mathrm{bits}}\}_b.$$
With the transformation as above, the value $2^K - v$ for $\bar{e}(m,n)$ denotes a prediction error $e(m,n)$ of either $2^K - v$ or $-v$. In general, the lower and the higher ends of the range of values of the error $\bar{e}(m,n)$ appear more frequently than midrange values.
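In code, the wrap and its inversion reduce to masking with $K$ ones; a minimal sketch (Python, hypothetical function names) is shown below.

    def wrap_error(e, k):
        """Keep only the K least-significant bits of the prediction error."""
        return e & ((1 << k) - 1)

    def recover_pixel(e_wrapped, predicted, k):
        """Recover the original K-bit pixel from the wrapped error and the prediction."""
        return (e_wrapped + predicted) & ((1 << k) - 1)

For example, with k = 8, a pixel value of 3 predicted as 250 gives an error of -247, which wraps to 9; adding 9 to 250 and masking with 255 restores the value 3.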
The $F_1$ transformation could make bit-plane coding more efficient by increasing the run length in the most-significant bits:
$$v_1 = F_1(v) = \begin{cases} 2v, & 0 \leq v < 2^{K-1}, \\ 2\,(2^K - v) - 1, & 2^{K-1} \leq v \leq 2^K - 1, \end{cases}$$
with the inverse transform given by
$$v = F_1^{-1}(v_1) = \begin{cases} v_1/2, & v_1 = p,\ p\ \mathrm{even}, \\ 2^K - (v_1 + 1)/2, & v_1 = q,\ q\ \mathrm{odd}. \end{cases}$$
The $F_2$ transformation has a similar function, but through the reversal of the higher half of the value range:
$$v_2 = F_2(v) = \begin{cases} v, & v < 2^{K-1}, \\ v \oplus \{\underbrace{011\cdots1}_{K\ \mathrm{bits}}\}_b, & v \geq 2^{K-1}, \end{cases}$$
where $\oplus$ is the bitwise exclusive-OR operation; its inverse transform is $v = F_2^{-1}(v_2) = F_2(v_2)$.
Shen and Rangayyan investigated the combined use of the PSV system in JPEG (see Section) and bit-plane coding using JBIG, along with one of the Gray, $F_1$, and $F_2$ transforms. In using JBIG for direct coding of the original image, or for coding the prediction-error image, the method was parameterized to use the three-line model template and a stripe size equal to the number of rows of the image, with no progressive spatial-resolution buildup (see Section). The lossless JPEG scheme was set to generate the optimal Huffman table for each PSV value.
The methods were tested with a commonly used set of eight images (see Table). A comparison of the compression efficiencies of lossless JPEG, JBIG, and PSV-incorporated JBIG with one of the Gray, $F_1$, or $F_2$ transforms is shown in Figure for one of the test images. Each group of vertical bars consists of five bars, corresponding to the entropy of the prediction-error image, followed by the actual bit rates with lossless JPEG coding and PSV-incorporated JBIG bit-plane coding with the Gray, $F_1$, and $F_2$ transformations, respectively. In the figure, there are three horizontal lines, representing the zeroth-order entropy of the original image, the bit rate obtained by direct JBIG coding with the Gray transformation, and the best bit rate among all of the methods tested. The best performance for each test image was achieved with PSV-incorporated JBIG bit-plane coding with the F transformation and a particular PSV, except for the Lenna image, for which the best rate was given with a different PSV and JBIG bit-plane coding of the prediction error after the F transformation. The $F_1$ and $F_2$ transforms provided similar performance, and performed better than the Gray transform, with the F transform giving slightly lower bit rates in most of the cases.
In the results obtained by Shen and Rangayyan, it was observed that the zeroth-order entropies of the prediction-error images were much lower than those of the original images; that the bit rates obtained by lossless JPEG compression were always higher than the zeroth-order entropies of the prediction-error images; and that PSV-incorporated JBIG bit-plane coding provided bit rates lower than the zeroth-order entropies of the prediction-error images, with the exceptions being the Baboon and Sailboat images. These observations show that a simple prediction procedure, such as the PSV scheme employed in lossless JPEG, is useful for decorrelation. In particular, prediction with the selected PSV, followed by the F transform and JBIG bit-plane coding, achieves an average bit rate that is lower than those achieved with direct Gray-coded JBIG compression and the optimal mode of lossless JPEG. This also indicates that the F transform is a better inter-bit-plane decorrelator than the Gray code for prediction-error images, and that the intra-bit-plane decorrelation steps within the JBIG algorithm are not redundant with prior decorrelation by the PSV system in JPEG.
FIGURE
Comparison of image compression efficiency with the enhanced JBIG scheme. The five bars in each case represent the zeroth-order entropy of the prediction-error image and the actual bit rates with lossless JPEG and PSV-incorporated JBIG with the Gray, F1, and F2 transformations, respectively, from left to right. Reproduced with permission from L. Shen and R.M. Rangayyan, Lossless compression of continuous-tone images by combined inter-bit-plane decorrelation and JBIG coding, Journal of Electronic Imaging, © SPIE and IS&T.
Shen and Rangayyan applied the best mode of the PSV-incorporated JBIG bit-plane coding scheme (the selected PSV predictor followed by JBIG bit-plane coding of the F-transformed prediction error, denoted as PSV-F-JBIG) to the JPEG standard set of continuous-tone test images; the results are shown in Table. The table also shows the zeroth-order entropies of the prediction-error images, the results of the best mode of lossless JPEG coding (for 8 b component images only), the results of direct Gray-transformed JBIG coding, and the results of the best mode of the CREW technique (Compression with Reversible Embedded Wavelets), one of the methods proposed to the Committee of Next Generation Lossless Compression of Continuous-tone Still Pictures.
The results in Table demonstrate that the enhanced JBIG bit-plane coding scheme for continuous-tone images performs the best among the four algorithms tested. In terms of the average bit rate, the scheme outperforms direct JBIG coding and the best mode of CREW, and achieves much lower bit rates than the zeroth-order entropies of the prediction-error images for the entire test set of images. If only the 8 b component images are considered, the enhanced JBIG technique provides better compression performance, in terms of average bit rates, when compared with lossless JPEG coding, direct Gray-transform JBIG coding, the zeroth-order entropy of the prediction-error image, and the best mode of lossless CREW.
For comparison with SLIC in the context of radiographic images, the enhanced JBIG (PSV-F-JBIG) procedure was applied for compression of the five mammograms and five chest radiographs listed in Tables and. The bit rates achieved for the two bit-depth versions of the images are listed in Tables and, respectively. The results indicate that the enhanced JBIG procedure provides an additional improvement over SLIC on both versions of the image set, lowering the average bit rates below those of SLIC.
Lower-limit Analysis of Lossless Data Compression
There is, as yet, no practical technique available for the determination of the lowest limit of the bit rate in reversible compression of a given image, although such a number should exist based upon information theory. It is, therefore, difficult to judge how good a compression algorithm is, other than by comparing its performance with those of other published methods or with the zeroth-order entropy of the decorrelated data, if available; the latter approach is commonly used by researchers when different test-image sets are involved and when other compression programs are not available. Either of the two approaches mentioned above can only analyze, in relative terms, how well a compression algorithm performs in comparison with the other techniques available or the zeroth-order entropy. It should be apparent from the results presented in the preceding sections that the usefulness of comparative analysis is limited. For example, SLIC provided the best compression results among the methods
TABLE
Compression of the JPEG Test Image Set Using Enhanced JBIG
EJBIG Lossless JPEG b Component Images Only Direct
Graytransformed JBIG Coding and CREW Coding BitsComponent
with the Best Bit Rate Highlighted in Each Case
Image Best Direct PSV Best
colsrowscompbits JPEG JBIG H
e
EJBIG CREW
hotel
gold
bike
woman
cafe
tools
bike
water
cats
aerial
aerial
cmpnd
cmpnd
nger
x ray
cr
ct
us
mri
faxball
graphic
chart
chart s
Average all
Average b comp only
Note H
e
zerothorder entropy of the PSV prediction error comp num
ber of components cols number of columns Reproduced with permission
from L Shen and RM Rangayyan Lossless compression of continuustone
images by combined interbitplane decorrelation and JBIG coding Journal
of Electronic Imaging
c
SPIE and IS%T
TABLE
Comparison of Enhanced JBIG PSVFJBIG or EJBIG with
JBIG JPEG Best Lossless Mode HINT and SLIC
errorbits Using Five b Mammograms m to m and m
and Five b Chest Radiographs c to c and c by
BitsPixel
Image bpixel H
JBIG JPEG HINT SLIC EJBIG
m
m
m
m
c
c
c
c
m
c
Average
The lowest bit rate is highlighted in each case See also Table Note
H
zerothorder entropy
TABLE
Comparison of Enhanced JBIG PSVFJBIG or EJBIG with
JBIG DBurgHuman DBH HINT and SLIC errorbits
Using Five b Mammograms m to m and m and Five b
Chest Radiographs c to c and c by BitsPixel
Image bpixel H
JBIG DBH HINT SLIC EJBIG
m
m
m
m
c
c
c
c
m
c
Average
The lowest bit rate is highlighted in each case See also Table Note
H
zerothorder entropy
tested in Section; however, it is seen in Section that the enhanced JBIG scheme performs better than SLIC.
Even if it were not practical to achieve the lowest possible bit rate in the lossless compression of a given image, it is of interest to estimate an achievable lower-bound bit rate for an image. Information theory indicates that the lossless compression efficiency is bounded by high-order entropy values. However, the accuracy of estimating high-order statistics is limited by the length of the data (the number of samples available), the number of intensity levels, and the order. The highest possible order of entropy that can be estimated with high accuracy is limited due to the finite length of the data available. In spite of this limitation, high-order entropy values can provide a better estimate of the lower-bound bit rate than the commonly used zeroth-order entropy. Shen and Rangayyan proposed methods for the estimation of high-order entropy and the lower-bound bit rate, which are described in the following paragraphs.
Memoryless entropy
A memoryless source is the simplest form of an information source, in which successive source symbols are statistically independent. Such a source is completely specified by its source alphabet $A = \{a_1, a_2, a_3, \ldots, a_n\}$ and the associated probabilities of occurrence $\{p(a_1), p(a_2), p(a_3), \ldots, p(a_n)\}$. The memoryless entropy $H(A)$, which is also known as the average amount of information per source symbol, is defined as
$$H(A) = -\sum_{i=1}^{n} p(a_i)\, \log_2 p(a_i).$$
The $m$th-order extension entropy is defined as
$$H_m(A) = H_m(a_{i_1}, a_{i_2}, a_{i_3}, \ldots, a_{i_m}) = -\sum_{A^m} p(a_{i_1}, a_{i_2}, a_{i_3}, \ldots, a_{i_m})\, \log_2 p(a_{i_1}, a_{i_2}, a_{i_3}, \ldots, a_{i_m}),$$
where $p(a_{i_1}, a_{i_2}, a_{i_3}, \ldots, a_{i_m})$ is the probability of a symbol string from the $m$th-order extension of the memoryless source, and $A^m$ represents the set of all possible strings with $m$ symbols. For a memoryless source, using the property
$$p(a_{i_1}, a_{i_2}, a_{i_3}, \ldots, a_{i_m}) = p(a_{i_1})\, p(a_{i_2})\, p(a_{i_3}) \cdots p(a_{i_m}),$$
we get
$$H(A) = \frac{H_m(A)}{m}.$$
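The memoryless (zeroth-order) entropy is straightforward to estimate from symbol frequencies; a minimal sketch (Python, numpy assumed) is given below.

    import numpy as np

    def memoryless_entropy(symbols):
        """H(A) = -sum_i p(a_i) log2 p(a_i), with the probabilities estimated
        from the relative frequencies of the symbols in the data provided."""
        _, counts = np.unique(np.asarray(symbols).ravel(), return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

Applied to the pixels of an image, this yields zeroth-order entropy values of the kind quoted in the tables of this chapter.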
Markov entropy
A memoryless source model could be restrictive in many applications, due to the fact that successive source symbols can be significantly interdependent, which means that the source has memory. Image sources are such examples, in which there always exists some statistical dependence among neighboring pixels, even after the source symbol stream has been decorrelated. A source possessing dependence or memory as above may be modeled as a Markov source, in which the probability of occurrence of a source symbol $a_i$ depends upon the probabilities of occurrence of a finite number $m$ of the preceding symbols. The corresponding $m$th-order Markov entropy $H(a_i \mid a_{i-1}, a_{i-2}, \ldots, a_{i-m})$ may be computed as
$$H(a_i \mid a_{i-1}, a_{i-2}, \ldots, a_{i-m}) = -\sum_{A^{m+1}} p(a_i, a_{i-1}, a_{i-2}, \ldots, a_{i-m})\, \log_2 p(a_i \mid a_{i-1}, a_{i-2}, \ldots, a_{i-m}),$$
where $p(a_i, a_{i-1}, a_{i-2}, \ldots, a_{i-m})$ is the probability of a particular state, and $p(a_i \mid a_{i-1}, a_{i-2}, \ldots, a_{i-m})$ is the PDF of $a_i$ conditioned upon the occurrence of the string $\{a_{i-1}, a_{i-2}, \ldots, a_{i-m}\}$. It may be shown that
$$H(a_i \mid a_{i-1}, a_{i-2}, \ldots, a_{i-m}) \leq H(a_i \mid a_{i-1}, a_{i-2}, \ldots, a_{i-m+1}) \leq \cdots \leq H(a_i \mid a_{i-1}) \leq H(a_i) = H(A),$$
where the equality is satisfied if and only if the source symbols are statistically independent.
Estimation of the true source entropy
Although a given image or its prediction-error image could be modeled with a Markov source, the order of the model and the conditional PDFs will usually be unknown. However, it is seen from Equation that the higher the order of the conditional probability, the lower is the resulting entropy, which is closer to the true source entropy. In order to maintain a reasonable level of accuracy in the estimation of the conditional probability, larger numbers of data strings are needed for higher-order functions; the estimation error is given by
$$\epsilon = \frac{2^{mK}}{N \ln 2}$$
for an $m$th-order Markov source model with $2^K$ intensity levels and $N$ data samples. Thus, for a specific image with known $K$ and $N$, the highest order of conditional Markov entropy that could be calculated within a practical estimation error $\epsilon$ of the conditional probability is bounded by
$$\max(m) = \left\lfloor \frac{\log_2 (\epsilon\, N \ln 2)}{K} \right\rfloor,$$
where $\lfloor x \rfloor$ is the floor function that returns the largest integer less than or equal to the argument $x$. Therefore, given an error limit, the only way to derive a higher-order parameter is to decrease $K$ by splitting data bytes, because the data length $N$ cannot be extended.
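The sketch below estimates the $m$th-order conditional (Markov) entropy of a 1D symbol stream from relative frequencies, and evaluates the bound on the estimable order as reconstructed above (the exact form of that bound is an assumption); the function names are hypothetical.

    from collections import Counter
    from math import floor, log, log2

    def markov_entropy(symbols, m):
        """Estimate H(a_i | a_{i-1}, ..., a_{i-m}) in bits/symbol from a 1D
        symbol stream; m = 0 gives the memoryless (zeroth-order) entropy."""
        symbols = list(symbols)
        total = len(symbols) - m
        joint = Counter(tuple(symbols[i - m:i + 1]) for i in range(m, len(symbols)))
        context = Counter(tuple(symbols[i - m:i]) for i in range(m, len(symbols)))
        h = 0.0
        for state, count in joint.items():
            p_joint = count / total
            p_cond = count / context[state[:-1]]
            h -= p_joint * log2(p_cond)
        return h

    def max_estimable_order(n_samples, k_bits, error_limit):
        """Highest Markov order within the given probability estimation error,
        following the reconstructed bound max(m) = floor(log2(eps N ln2) / K)."""
        return floor(log2(error_limit * n_samples * log(2)) / k_bits)

Splitting each pixel into smaller data parts (for example, pairs of bits) reduces k_bits and therefore raises the order that can be estimated, which is the strategy examined in the experiments described next.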
For example, the highest orders of Markov entropy that may be calculated with a probability estimation error below a specified limit for a given image increase as the data are split into more parts: from the original data (one data part), to split data with two data parts, four data parts, and eight data parts, respectively. Figure shows the Markov entropy values up to the maximum possible order, max(m), for four forms of splitting for the test image Airplane. It is obvious that the entropy values become larger with splitting into more data parts, due to the high correlation present among the data parts, although the maximum order could go higher after splitting; the entropy values decrease with increasing order for each form of splitting (see also Table). This indicates that decorrelation of the data bits is needed before splitting in order to get a good estimate of the entropy, because the source entropy is not changed by any reversible transformation.
From the results of the enhanced JBIG algorithm, it is seen that the Gray, F1, and F2 transformations provide good decorrelation among bit planes after PSV-based prediction. This is demonstrated with four plots of the binary, Gray, F1, and F2 representations of the PSV prediction-error data of the Airplane image in Figure, as well as in Table. The binary representation is seen to lead to poor decorrelation among the bits of the prediction-error data; it actually makes the maximum-order entropy increase, relative to the zeroth-order entropies of the original error data and the original image, when the error data are split into eight data parts. It is also seen that the highest-order entropy values that could be estimated within the error limit specified increase with an increasing number of data parts when using the binary representation. On the other hand, with the Gray, F1, or F2 transformation, the situation is different. It is seen that the highest-order entropy values with splitting become smaller than the entropy of the original data when one of the three transformations is used, which shows their efficient decorrelation effect among the bit planes of the prediction-error image. Finally, it is seen that the F transform is the best among the four representation schemes, with the lowest estimated entropy value achieved when the prediction error is split into four data parts.
The lowest estimated entropy value is not guaranteed to occur when the prediction error is split into four data parts. The F transform was observed to always provide the best or near-best performance. Table lists the lowest estimated Markov entropy values with PSV prediction for the test-image set used, together with their bit rates obtained with the enhanced JBIG scheme (PSV-F-JBIG) and the zeroth-order entropies of the PSV prediction-error images. It is seen that the higher-order entropy values
FIGURE
Markov entropy values up to the maximum order possible, within the specified error limit, for four forms of splitting of the Airplane image (see also Table). Note: Order 0 indicates the memoryless entropy. Reproduced with permission from L. Shen and R.M. Rangayyan, Lossless compression of continuous-tone images by combined inter-bit-plane decorrelation and JBIG coding, Journal of Electronic Imaging, © SPIE and IS&T.
TABLE
Estimated Values of the Highest Possible Order of Markov Entropy
bpixel for the Airplane Image
Data Code One part Two parts Four parts Eight parts
transform b bpart bpart bpart
Original Binary
Airplane Gray
image
F
F
PSV Binary
prediction Gray
error of F
Airplane F
Values are shown with and without prediction combined with four dierent
code representation transformation schemes and with error limit
Reproduced with permission from L Shen and RM Rangayyan Lossless
compression of continuustone images by combined interbitplane decorrela
tion and JBIG coding Journal of Electronic Imaging
c
SPIE and IS%T
provide lower estimates of the bit-rate limit than the zeroth-order entropies. The average entropy value decreases with higher-order entropy estimation even while using the binary representation; by using the F transform instead of the binary representation of the error data, the average Markov entropy value is reduced further.
The disadvantage of using the zeroth-order entropy to measure the performance of a data compression algorithm is clearly shown in Table: the enhanced JBIG (PSV-F-JBIG) coding scheme achieves an average bit rate that is lower than the average zeroth-order entropy of the prediction-error images. Considering the higher-order entropy values shown, it appears that the compression efficiency of the enhanced JBIG technique could be further improved. An important application of high-order entropy estimation could be to provide a potentially achievable lower bound on the bit rate for an original or decorrelated image, if the high-order entropy is estimated with adequate accuracy.
FIGURE
Plots of the Markov entropy values up to the maximum order possible, within the specified error limit, with four forms of splitting, for the PSV prediction error of the Airplane image, with (a) binary representation, (b) Gray representation, (c) F1 transformation, and (d) F2 transformation. Note: Order 0 indicates the memoryless entropy. Reproduced with permission from L. Shen and R.M. Rangayyan, Lossless compression of continuous-tone images by combined inter-bit-plane decorrelation and JBIG coding, Journal of Electronic Imaging, © SPIE and IS&T.
TABLE
Lowest Estimated Markov Entropy Values with PSV Prediction
for Eight b Test Images
JPEG with PSV
b image Bit rate Lowest entropy
columns rows H
e
EJBIG F
Binary
Airplane
Baboon
Cameraman
Lenna
Lenna
Peppers
Sailboat
Tiany
Average
Also shown are bit rates via enhanced JBIG bitplane coding of F
transformed PSV prediction error PSVFJBIG or EJBIG and the
zerothorder entropies H
e
of the PSV prediction error images in
bpixel Reproduced with permission from L Shen and RM Rangayyan
Lossless compression of continuustone images by combined interbitplane
decorrelation and JBIG coding Journal of Electronic Imaging
c
SPIE and IS%T
Application: Teleradiology
Teleradiology is commonly defined as the practice of radiology at a distance. Teleradiology offers a technological approach to the problem of eliminating the delay in securing the consultation of a radiologist for patients in rural and remote areas. The timely availability of radiological diagnosis via telecommunication could potentially reduce the morbidity, mortality, and costs of transportation to tertiary healthcare centers of patients in remotely situated areas, and, in certain situations, in developing countries as well.
In the military environment, a study indicated that a large proportion of the medical facilities with radiographic equipment had no radiologists assigned to them, and an additional fraction had only one radiologist. In such cases, teleradiology could be a vehicle for redistributing the image-reading workload from understaffed sites to more adequately staffed central locations.
According to a study conducted in the province of Alberta, Canada, only a small fraction of the healthcare centers with radiological imaging facilities had resident radiologists. Sixty-one of the other centers depended upon visiting radiologists. The remaining centers used to send their radiographs to other centers for interpretation, with a delay of several days in receiving the results. The situation was comparable in the neighboring provinces of Saskatchewan and Manitoba, and it was observed that the three provinces could benefit significantly from teleradiology. Even in the case of areas served by contract radiologists, teleradiology can permit evaluation and consultation by other radiologists at tertiary healthcare centers in emergency situations, as well as in complicated cases.
Early attempts at teleradiology systems consisted of analog transmission of slow-scan TV signals over existing telephone lines, ultra-high-frequency (UHF) radio links, and other such analog channels. Analog transmission and the concomitant slow transmission rates were satisfactory for low-resolution images such as nuclear medicine images. However, the transmission times were prohibitively high for high-resolution images such as chest radiographs. Furthermore, the quality of the images received via analog transmission is a function of the distance, which could result in unpredictable performance of radiologists with the received images. Thus, the natural progression of teleradiology systems was toward digital transmission. The initial choice of the transmission medium was the ordinary telephone line operating at bps (bits per second). Several commercial teleradiology systems were based upon the use of telephone lines for data transmission. Improvements in modem technology allowing transmission speeds of up to Kbps over standard telephone lines, and the establishment of a number of Kbps lines for commercial use by telephone companies, made this medium viable for low-resolution images.
The major reason for users' reluctance in accepting early teleradiology systems was the inability to meet the resolution of the original film. Spatial resolution of even pixels was found to be inadequate to capture the submillimeter features found in chest radiographs and mammograms. It was recommended that spatial resolution of the order of pixels, with at least shades of gray, would be required to capture accurately the diagnostic information on radiographic images of the chest and breast. This demand led to the development of high-resolution laser digitizers, capable of digitizing X-ray films to images of the order of pixels with b/pixel, by the mid s. Imaging equipment capable of direct digital acquisition of radiographic images to the same resolution as above was also developed in the late s. Teleradiology system designers were then faced with the problem of dealing with the immense amount of data involved in such high-resolution digital images. The transmission of such large amounts of data over ordinary telephone lines involved large delays, which could be overcome to some extent by using parallel lines for increased data transfer rates. The use of satellite channels was also an option to speed up image data transmission, but problems associated with image data management and archival hindered the anticipated widespread acceptance of high-resolution teleradiology systems. Such difficulties motivated advanced research into image data compression and encoding techniques.
The development of PACS and teleradiology systems shares some historical common ground. Although delayed beyond initial predictions, both PACS and teleradiology established their presence and value in clinical practice by the late s. The following paragraphs provide a historical review of teleradiology.
Analog teleradiology
The first instance of transmitting pictorial information for medical diagnosis dates back to when Gershon-Cohen and Cooley used telephone lines and a facsimile system, adapted to convert medical images into video signals, for transmitting images between two hospitals km apart in Philadelphia, PA. In a pioneering project, Jutras conducted what is perhaps the first teleradiology trial by interlinking two hospitals km apart in Montréal, Québec, Canada, using a coaxial cable to transmit telefluoroscopy examinations. The potential of teleradiology in the provision of the services of a radiologist to remotely situated areas, and in the redistribution of radiologists' workload from understaffed centers to more adequately staffed centers, was immediately recognized, and a number of clinical evaluation projects were conducted. Most of the early attempts consisted of analog transmission of medical images via standard telephone lines, dedicated coaxial cables, UHF radio, microwave, and satellite channels, with display on TV monitors at the receiving terminal. James et al. give a review of the results of the early experiments. Andrus and Bird describe the concept of a teleradiology system in which the radiologist, stationed at a medical center, controls a video camera to zoom in on selected areas of interest of an image at another site located far away, and observes the results in real time on a TV screen. Steckel conducted experiments with a system using an -line closed-circuit TV system for transmitting radiographic images within a hospital for educational purposes, and concluded that the system's utility far outweighed disadvantages such as the inability to view a sequence of images belonging to a single study.
Webber and Corbus used existing telephone lines and slow-scan TV for transmitting radiographs and nuclear medicine images. The resolution achieved was satisfactory for nuclear medicine images, but both the spatial resolution and the grayscale dynamic range (radiometric resolution) were found to be inadequate for radiographs. A similar experiment using telephone lines and slow-scan TV by Jelasco et al. resulted in correct interpretation of radiographs. Other experiments with slow-scan TV over telephone lines demonstrated the inadequacy of this medium, and also that the diagnostic accuracy with such systems varied with the nature of the images.
Webber et al. used UHF radio transmission for transmitting nuclear medicine images and radiographs. While the system worked satisfactorily for nuclear medicine images, evaluation of chest X-ray images needed zoom and contrast manipulation of the TV monitor. Murphy et al. used a microwave link for the transmission of images of chest radiographs, acquired with a remotely controlled video camera, over a distance of about km, and indicated that it would be an acceptable method for providing health care to people in remote areas.
Andrus et al. transmitted X-ray images of the abdomen, chest, bone, and skull over a km round loop using a MHz -line TV channel, including three repeater stations. The TV camera was remotely controlled using push buttons and a joystick to vary the zoom, aperture, focus, and direction of the camera. It was concluded that the TV interpretations were of acceptable accuracy. Such real-time operation calls for special skills on the part of the radiologist, requires coordination between the operator at the image acquisition site and the radiologist at the receiving center, and could take up a considerable amount of the radiologist's valuable time. Moreover, practical microwave links exist only between and within major cities, and cannot serve the communication needs of teleradiology terminals in rural and remote areas. In addition, the operating costs over the duration of interactive manipulations could reach high levels and render such a scheme uneconomical.
Lester et al. used a satellite (ATS) for analog transmission of videotaped radiologic information, and concluded that satisfactory radiographic transmission is possible if a satisfactory sensor of radiographic images were constructed. Carey et al. reported on the results of an analog teleradiology experiment using the Hermes spacecraft; they reported the effectiveness of TV fluoroscopy to be of that with conventional procedures. Page et al. used a two-way analog TV network with the Canadian satellite ANIK-B to transmit radiographic images from northern Québec to Montréal, and reported an initial accuracy in TV interpretation of with respect to film reading. The accuracy rose to after a period of training of the participating radiologists in the use of the TV system.
The noise associated with analog transmission, the low resolution of the TV monitors used, and the requirement on the part of the radiologists to participate in real-time control of the image-acquisition cameras made the concept of TV transmission of radiographic images unacceptable. Furthermore, the noise associated with analog transmission is dependent upon the distance. Not surprisingly, James et al. reported that their teleradiology system, transmitting emergency department radiographs via a satellite channel from a local TV studio, was unacceptable due to a decrease in the accuracy of image interpretation to about with respect to that with standard protocols.
Digital teleradiology
Given the advantages of digital communication over analog methods, the natural progression of teleradiology was toward the use of digital data transmission techniques. The advent of a number of digital medical imaging modalities facilitated this trend. Digital imaging also allowed for image processing, enhancement, contrast scaling, and flexible manipulation of images on the display monitors after acquisition. Many of the initial attempts at digital teleradiology were based upon microcomputers, and used low-resolution digitization, display, and printers. The resolution was of the order of to pixels with shades of gray, mostly because of the unavailability of high-resolution equipment. Gayler et al. described a laboratory evaluation of such a microcomputer-based teleradiology system, based upon a b format for image acquisition and display, and evaluated radiologists' performance with routine radiographs. They found the diagnostic performance to be significantly worse than that using the original film radiographs. Nevertheless, they concluded that microcomputer-based teleradiology systems warrant further evaluation in a clinical environment.
Rasmussen et al. compared the performance of radiologists with images transmitted by analog and digital means and with light-box viewing of the original films. The resolution of digitization used was pixels with b/pixel. The digital images were converted to analog signals for analog transmission. It was concluded that the resolution used would provide satisfactory radiographic images for gross pathological disorders, but that subtle features would require higher resolution.
Gitlin, Curtis et al., and Skinner et al. followed the laboratory evaluation of Gayler et al. with field trials using standard telephone lines at bps for the transmission of b images from five medical-care facilities to a central hospital in Maryland. A relative accuracy of with video-image readings was reported, as compared to standard film interpretation. This was a substantially higher accuracy than that obtained in a preceding laboratory study; the improvement was attributed to the larger percentage of normal images used in the field trial, and to the greater experience of the analysts in clinical radiology.
In a field trial, Gitlin used a matrix of pixels, bps telephone lines, and lossy data compression to bring down the transmission times. A relative accuracy with video interpretation of with respect to standard film interpretation was observed. The relative accuracy was observed to be dependent upon the type of data compression used, among other factors.
Gordon et al. presented an analysis of a number of scenarios and tradeoffs for practical implementation of digital teleradiology. In related papers, Rangayyan and Gordon and Rangaraj and Gordon discussed the potential for providing advanced imaging services, such as CT, through teleradiology.
DiSantis et al. digitized excretory urographs to matrices and transmitted the images, after data compression, over standard telephone lines to a receiving unit approximately km away. A panel of three radiologists interpreted the images on video monitors, and the results were compared with the original film readings performed about a week earlier. An agreement of was found between the film and video readings in the diagnosis of obstructions. However, only of the urethral calculi detected with the original radiographs were also detected with the video images. This result demonstrated clearly that, whereas a resolution of pixels could be adequate for certain types of diagnosis, higher resolution is required for capturing all of the diagnostic information that could be present on the original film.
Kagetsu et al. reported on the performance of a commercially available teleradiology system using b images and transmission over bps standard telephone lines after data compression. Experiments were conducted with a wide variety of radiographs over a four-month period. An overall relative accuracy of was reported between the received images on video display and the original films. Based on these results, Kagetsu et al. recommended a review of the original films at some later date because of the superior spatial and contrast resolution of film.
Several commercial systems were released for digital teleradiology in the late s. Although such systems were adequate for handling low-resolution images in CT, MRI, and nuclear medicine, they were not suitable for handling large-format images such as chest radiographs and mammograms. Experiments with such systems demonstrated the inadequacy of low-resolution digital teleradiology systems as an alternative to the physical transportation of the films or the patients to centers with radiological diagnostic facilities. Although higher resolution was required in the digitized images, the substantial increase in the related volume of data and the associated difficulties remained a serious concern. Furthermore, the use of lossy data compression schemes to remain within the data-rate limitation of telephone lines and other low-speed communication channels was observed to be unacceptable.
High-resolution digital teleradiology
The development of high-resolution image digitizers and display equipment, and the routine utilization of high-data-rate communication media, paved the way for high-resolution digital teleradiology. Carey et al. reported on the performance of the DTR teleradiology system from DuPont, consisting of a pixel laser digitizer with quantization levels, a T satellite transmission channel at Mbps, and a DuPont laser film recorder with possible shades of gray. A nonlinear mapping was performed from the original quantization levels to levels on the film copy, to make use of the fact that the eye is more sensitive to contrast variations at lower density. With this mapping, at the lower end of the gray scale, small differences in gray values correspond to larger differences in optical density than at the higher end of the gray scale; thus, the overall optical density range of the film is much larger than can be obtained by linear mapping. Carey et al. transmitted radiographic and ultrasonographic images over the system from Seaforth to London in Ontario, Canada, and reported a relative accuracy of in reading the laser-sensitive film as compared to the original film. It was concluded that the laser-sensitive film clearly duplicated the original film findings. However, they also reported contouring on the laser-sensitive film, which might have been due to the nonlinear mapping of the original gray levels to levels on the film: certain portions of the original gray scale with rapidly changing gray levels could have been mapped into the same optical density on the film, giving rise to contouring artifacts.
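The exact mapping used in the system described above is not specified here; the following is a minimal, hypothetical sketch of the general idea, with an assumed 12-bit input, an 8-bit output, and a power-law (gamma) curve chosen only for illustration. The nonlinear curve spends more of the available output levels on the darker end of the scale, where the eye is more sensitive to contrast.
import numpy as np

IN_LEVELS, OUT_LEVELS = 4096, 256          # assumed 12-bit input and 8-bit output
GAMMA = 0.5                                # illustrative power-law exponent (< 1)

x = np.arange(IN_LEVELS) / (IN_LEVELS - 1)                        # normalized input gray levels
lut = np.round((OUT_LEVELS - 1) * x ** GAMMA).astype(np.uint8)    # nonlinear look-up table

# More distinct output levels are assigned to the lower (darker) half of the input range:
print("output levels used by the lower half of the input range:",
      np.unique(lut[: IN_LEVELS // 2]).size)
print("output levels used by the upper half of the input range:",
      np.unique(lut[IN_LEVELS // 2:]).size)
Because the mapping is many-to-one, ranges of the input scale where the curve is nearly flat can collapse to a single output level, which is one way contouring artifacts of the kind noted above can arise.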
Barnes et al. suggested that the challenge of integrating the increasing number of medical imaging technologies could be met by networked multimodality imaging workstations. Cox et al. compared images digitized to b and displayed on monitors with b pixels, digital laser film prints, and conventional film. They reported significant differences in the performance of the three display formats: digital hard copy performed as well as or better than conventional film, whereas the interactive display failed to match the performance of the other two. They suggested that, although the differences could be eliminated by training the personnel in reading from displays and by using image enhancement techniques, it was premature to conclude either way.
Batnitzky et al. conducted an assessment of the then-available technologies for film digitization, display, generation of hard copy, and data communication for application in teleradiology systems. They concluded that b laser digitizers, displays with scan lines at b/pixel, hard copiers that interpolate matrices to larger matrices, and the merger of computer and communication technologies resulting in flexible wide-area networks had paved the way for the acceptance of final-interpretation teleradiology, completely eliminating the need to go back to the original films. Gillespy et al. described the installation of a DuPont Clinical Review System, consisting of a laser film digitizer with b pixels and a b display unit, and reported that clinicians were generally satisfied with the unit. Several studies on the contrast and resolution of high-resolution digitizers demonstrated that the resolution of the original film was maintained in the digitized images.
Several systems are now available for digital teleradiology, including high-resolution laser digitizers that can provide images of the order of b pixels with a spatial resolution of µm or better; high-luminance monitors that can display up to pixels at b/pixel with noninterlaced refresh rates of fps; and laser-film recorders with a spatial resolution of µm that can print images of size b pixels. Satellite, cable, or fiber-optic transmission equipment and channels may be leased with transmission rates of several Mbps. However, the large amount of data related to high-resolution images can create huge demands in data transmission and archival capacity. Lossless data compression techniques can bring down the amount of data, and have a significant impact on the practical implementation of teleradiology and related technologies.
The introduction of data compression, encoding, and decoding in digital teleradiology systems raises questions regarding the overall throughput of the system in the transmission and reception, storage, and retrieval of image data, as well as patient confidentiality and information security. The compression of image data removes the inherent redundancy in images and makes the data more sensitive to errors. In dedicated communication links, appropriate error control should be provided for detecting and correcting such errors. In the case of packet-switched communication links, the removal of redundancy by data compression could result in increased retransmission overhead. However, with sophisticated digital communication links operating at typical bit-error rates of the order of in, and channel utilization (throughput) efficiency of about, using high-level packet-switched protocols, the advantages of data compression far outweigh the overheads due to the reasons mentioned above.
High-resolution digital teleradiology is now feasible without any sacrifice in image quality, and can serve as an alternative to transporting patients or films. Distance should no longer be a limitation in providing reliable diagnostic service by city-based expert radiologists to patients in remote or rural areas.
Remarks
Lossless data compression is desirable in medical image archival and transmission. In this chapter, we studied several lossless data compression techniques. A lossless compression scheme generally consists of two steps: decorrelation and encoding. The success of a lossless compression method is based mainly upon the efficiency of the decorrelation procedure used. In practice, a decorrelation procedure could include several cascaded decorrelation blocks, each of which could accomplish a different type of decorrelation and facilitate further decorrelation by the subsequent blocks. Some of the methods described in this chapter illustrate creative ways of combining multiple decorrelators with different characteristics for achieving better compression efficiency.
Several information-theoretic concepts and criteria, as applicable to data compression, were also discussed in this chapter. A practical method for the estimation of high-order entropy was presented, which could aid in the lower-limit analysis of lossless data compression. High-order entropy estimation could aid in the design, analysis, and evaluation of cascaded decorrelators.
A historical review of selected works in the development of PACS and teleradiology systems was presented in the concluding section in order to demonstrate the need for image compression and data transmission in a practical medical application. PACS, teleradiology, and telemedicine are now well-established areas that are providing advanced technology for improved health care.
Study Questions and Problems
Selected data files related to some of the problems and exercises are available at the site
www.enel.ucalgary.ca/People/Ranga/enel
The probabilities of occurrence of eight symbols are given. Derive the Huffman code for the source.
For a source with a given number of symbols, derive all possible sets of the Huffman code; the exact PDF of the source is not relevant. Create a few strings of symbols and generate the corresponding Huffman codes. Verify that the result satisfies the basic requirements of a code, including unique and instantaneous decodability.
For the image shown below, design a Huffman coding scheme. Show all steps of your design. Compute the entropy of the image and the average bit rates using direct binary coding and Huffman coding.
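The symbol probabilities for the problems above are not reproduced here; as a minimal illustrative sketch (with a hypothetical five-symbol PDF), the following Python code builds a Huffman code using a binary heap and reports the average codeword length, which may be compared against the source entropy.
import heapq
from itertools import count

def huffman_code(probabilities):
    # probabilities: dict of symbol -> probability; returns dict of symbol -> codeword
    tiebreak = count()                      # avoids comparing dicts when probabilities tie
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)     # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}   # hypothetical PDF
code = huffman_code(probs)
avg = sum(probs[s] * len(w) for s, w in code.items())
print(code, "; average codeword length =", avg, "bits/symbol")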
Discuss the similarities and differences between the Karhunen-Loève, discrete Fourier, and Walsh-Hadamard transforms. Discuss the particular aspects of each transform that are of importance in transform-domain coding for image data compression.
For the image shown below, compute the bit rate using
(a) direct binary coding,
(b) horizontal run-length coding, and
(c) predictive coding or DPCM using the model f(m, n) ≈ f(m, n − 1).
Show and explain all steps. State your assumptions, if any, and explain your procedures.
For the image shown below, prepare the bit planes using the direct binary and Gray codes. Examine the bit planes for the application of run-length coding.
Which code can provide better compression? Explain your observations and results.
Laboratory Exercises and Projects
Write a program in C, C++, or MATLAB to compute the histogram and the zeroth-order entropy of a given image. Apply the program to a few images in your collection. Study the nature of the histograms, and relate their characteristics, as well as the entropy, to the visual features present in the corresponding images.
Write a program to compute the zeroth-order and first-order entropy of an image, considering pairs of gray-level values. Apply the program to a few images in your collection, and analyze the trends in the zeroth-order and first-order entropy values.
What are the considerations, complexities, and limitations involved in computing entropy of higher orders?
For the image in the file RajREye.dat, with b/pixel, create bit planes using (a) the binary code and (b) the Gray code. Compute the entropy of each bit plane. Compute the average entropy over all of the bit planes for each code. What is the expected trend?
Do your results meet your expectations? Explain.
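As a starting point (not a solution key), the following minimal Python sketch forms the eight bit planes of an 8-bit image under the natural binary code and the Gray code, and computes the zeroth-order entropy of each plane; the exercise above suggests C, C++, or MATLAB, and the random test array used here is only a placeholder for a real image such as the one named above.
import numpy as np

def bit_planes(img):
    # planes[b] holds bit b (0 = least significant) of every pixel
    return [(img >> b) & 1 for b in range(8)]

def plane_entropy(plane):
    # zeroth-order entropy of a binary bit plane
    p1 = plane.mean()
    if p1 in (0.0, 1.0):
        return 0.0
    return -(p1 * np.log2(p1) + (1 - p1) * np.log2(1 - p1))

img = np.random.default_rng(1).integers(0, 256, (128, 128), dtype=np.uint8)
gray = img ^ (img >> 1)                    # binary-to-Gray-code conversion
for label, x in (("binary", img), ("Gray  ", gray)):
    H = [plane_entropy(p) for p in bit_planes(x)]
    print(label, "bit-plane entropies:", [round(h, 3) for h in H],
          "average:", round(sum(H) / 8, 3))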
Develop a program to compute the DFT of an image. Write steps to compute the energy contained in concentric circles (or squares) of several sizes spanning the full spectrum of the image, and to plot the results.
Apply the program to a few images in your collection. Relate the spectral energy distribution to the visual characteristics of the corresponding images. Discuss the relevance of your findings in data compression.
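A minimal Python sketch of the measurement described above is given below (a random array stands in for a real image); it computes the centered 2D DFT and reports the fraction of the total spectral energy contained within concentric circles of increasing radius.
import numpy as np

img = np.random.default_rng(3).integers(0, 256, (128, 128)).astype(float)
F = np.fft.fftshift(np.fft.fft2(img))      # move the zero-frequency term to the center
power = np.abs(F) ** 2
rows, cols = img.shape
u, v = np.meshgrid(np.arange(cols) - cols // 2, np.arange(rows) - rows // 2)
radius = np.sqrt(u ** 2 + v ** 2)          # distance from the spectrum center
total = power.sum()
for r in (4, 8, 16, 32, 64):
    frac = power[radius <= r].sum() / total
    print(f"fraction of spectral energy within radius {r:3d}: {frac:.3f}")
For a natural image, most of the energy is concentrated at low frequencies (small radii), which is the property exploited by transform-domain coding; a random image shows no such concentration.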
Develop a program to compute the error of prediction based upon a few simple predictors, such as
(a) f(m, n) ≈ f(m − 1, n),
(b) f(m, n) ≈ f(m, n − 1), and
(c) a predictor based upon the neighbors f(m − 1, n), f(m, n − 1), and f(m − 1, n − 1).
Derive the histograms and the entropies of the original image and the error of prediction for a few images with each of the predictors listed above. Evaluate the results and comment upon the relevance of your findings in image coding and data compression.
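A minimal Python sketch of the comparison asked for above is given below; predictor (c) is taken here, as an assumption, to be the planar form f(m − 1, n) + f(m, n − 1) − f(m − 1, n − 1), and a random array stands in for a real image. With a real, spatially correlated image, the prediction-error entropy is typically well below that of the original; the random stand-in will not show that benefit.
import numpy as np

def entropy(values):
    # zeroth-order entropy of an integer-valued array
    counts = np.bincount(values.ravel() - values.min())
    p = counts[counts > 0] / values.size
    return -np.sum(p * np.log2(p))

f = np.random.default_rng(2).integers(0, 256, (128, 128)).astype(int)
target = f[1:, 1:]                                   # interior pixels f(m, n)
pred_a = f[:-1, 1:]                                  # f(m - 1, n)
pred_b = f[1:, :-1]                                  # f(m, n - 1)
pred_c = f[:-1, 1:] + f[1:, :-1] - f[:-1, :-1]       # assumed planar form of predictor (c)
print("original image entropy:", round(entropy(f), 3), "bits/pixel")
for name, pred in (("a", pred_a), ("b", pred_b), ("c", pred_c)):
    e = target - pred
    print("predictor", name, ": error entropy =", round(entropy(e), 3), "bits/pixel")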