ABSTRACT
DNA is an extremely long chain of nucleotides that contains all
the information necessary for the life functions of a cell.
Representing DNA as a character string allows computational methods
to be used to study DNA sequences. In this chapter, we discuss
the problem of identifying protein-coding regions, called exons, in a
DNA sequence. Various computational measures have been used to
study the biased distribution of base composition in exons, such as
position asymmetry and three-base periodicity, the latter
characterized through the discrete Fourier transform. These
statistical features can then be fed into machine learning methods
such as k-nearest neighbors and neural networks to identify exons
reliably.
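
As a rough illustration of the three-base periodicity measure, the sketch below maps a DNA string to four binary indicator sequences (one per base), sums their discrete Fourier transform power at frequency index N/3, and compares it with the average power over the other nonzero frequencies. The indicator-sequence formulation, the function names spectral_power and period3_ratio, and the normalization by mean power are assumptions made for this example; they are not necessarily the exact definitions used in the chapter.

```python
import cmath

BASES = "ACGT"


def spectral_power(seq, k):
    """Total DFT power at frequency index k, summed over the four
    binary indicator sequences (one per base) of seq."""
    n = len(seq)
    total = 0.0
    for base in BASES:
        # DFT coefficient of the indicator sequence for this base:
        # only positions where the base occurs contribute a term.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k * pos / n)
            for pos, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total


def period3_ratio(seq):
    """Power at k = N/3 divided by the mean power over k = 1..N-1.

    Values well above 1 suggest a period-3 signal. This is an
    illustrative score (an assumption of this sketch), computed with a
    direct O(N^2) DFT that is only suitable for short windows.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # trim so that N/3 is an integer
    if n < 3:
        raise ValueError("sequence must contain at least 3 bases")
    seq = seq[:n]
    peak = spectral_power(seq, n // 3)
    mean = sum(spectral_power(seq, k) for k in range(1, n)) / (n - 1)
    if mean < 1e-9:  # no spectral content away from k = 0
        return 0.0
    return peak / mean


if __name__ == "__main__":
    print(period3_ratio("ATG" * 20))   # strongly period-3 toy sequence
    print(period3_ratio("ACGT" * 15))  # period-4 toy sequence
```

In practice such a score would be computed over a sliding window along the sequence, and the resulting values used as one of the features passed to a classifier.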