ABSTRACT

Working with character data is fundamental to many tasks in computational biology, but it is a less common problem in statistical applications. The tools available in R are oriented more toward processing data into a form suitable for statistical analysis, or toward formatting output for publication. Awareness of different languages and file encodings has increased, along with the capabilities for handling them, but we touch on this subject only briefly. In this chapter we review the built-in capabilities in R and then turn our attention to some problems that are more fundamental to biological applications.