ABSTRACT

The purpose of this project is to present a set of algorithms, and their efficiency, for Multiple Sequence Alignment (MSA) and clustering problems, including solutions for distributed environments with Hadoop. The strength, adaptability, and effectiveness of genetic algorithms (GAs) for both problems are pointed out. MSA is among the most important tasks in computational biology. In biological sequence comparison, emphasis is given to the simultaneous alignment of several sequences. GAs are stochastic approaches for efficient and robust search that can play a significant role in MSA and clustering. The divide-and-conquer principle preserves consistency when the sequences are segmented vertically.

CONTENTS
4.1 Introduction
4.2 CDM
4.3 PEA
4.4 Divide and Conquer
4.5 GAs
4.6 DCGA
4.7 K-Means
4.8 Clustering Genetic Algorithm with the SSE Criterion
4.9 MapReduce Section
4.10 Simulation
4.11 Conclusion
References