ABSTRACT

DNA is an extremely long chain of nucleotides that contains all the
information necessary for the life functions of a cell. By
representing DNA as a character string, computational methods can
be used to study DNA sequences. In this chapter, we discuss the
problem of identifying protein-coding regions, called exons, in a
DNA sequence. Various computational measures have been used to
study the biased distribution of base composition in exons, such as
position asymmetry and the three-base periodicity revealed by the
discrete Fourier transform. These statistical features can then be
fed into machine learning methods, such as k-nearest neighbor and
neural networks, to identify exons reliably.
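
As a rough illustration of the three-periodicity measure mentioned above, the following Python sketch computes the discrete Fourier transform of a DNA string at the frequency that corresponds to a period of three bases. It assumes the common binary indicator encoding (one 0/1 sequence per base) and sums the squared magnitudes over the four bases. The function name period3_power and this particular encoding are assumptions made for illustration and are not necessarily the formulation used in the chapter.

```python
import cmath


def period3_power(seq):
    """Total spectral power at period 3 for a DNA string.

    Assumption: each base b in "ACGT" is encoded as a binary indicator
    sequence u_b (1 where the base occurs, 0 elsewhere). The value
    returned is the sum over b of |U_b|^2, where U_b is the Fourier
    sum of u_b evaluated at angular frequency 2*pi/3 per position.
    """
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        # Only positions where the indicator is 1 contribute to the sum.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * pos / 3)
            for pos, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total


# A sequence built from one repeated codon has strong period-3 structure,
# while a homopolymer of the same length has essentially none.
codon_repeat = "ATG" * 30
homopolymer = "A" * 90
print(period3_power(codon_repeat) > period3_power(homopolymer))
```

In practice a value like this would typically be computed over a sliding window and combined with other measures, such as position asymmetry, to form the feature vector passed to a classifier.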