The Analysis of Molecular Sequences in Large Data Sets: Where Should We Put Our Effort?
Nucleotide homology determination and tree search are intertwined, complex problems in phylogenetic reconstruction, and both are NP-hard optimisation problems. One-step and two-step heuristic procedures are reviewed and compared through the analysis of example data sets, using multiple sequence alignment followed by tree search (two-step) and direct optimisation (one-step). The examples here show that extraordinary effort on the tree search side cannot overcome the shortcomings of poor sequence homology heuristics. Direct optimisation, even with the simplest heuristics, can yield solutions with optimality scores 30% better in larger data sets.
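As a rough illustration of the kind of homology cost that both approaches try to minimise, the following is a minimal sketch of a pairwise sequence edit cost computed by dynamic programming. It is not the procedure used in the analyses: the function name `alignment_cost` and the unit substitution and gap costs are assumptions chosen for illustration only.

```python
def alignment_cost(a: str, b: str, sub: int = 1, gap: int = 1) -> int:
    """Minimum edit cost between two sequences (illustrative sketch).

    `sub` and `gap` are hypothetical cost parameters, not values
    taken from the paper.
    """
    # prev[j] holds the cost of aligning the current prefix of `a`
    # with the first j characters of `b`; start with the empty prefix.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning i characters of `a` against nothing costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Pair a[i-1] with b[j-1]: free if identical, else a substitution.
            paired = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub)
            # Otherwise place a gap opposite one of the two characters.
            curr.append(min(paired, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(alignment_cost("ACGT", "AGT"))
```

In a two-step analysis, costs of this kind are fixed once by the multiple sequence alignment before any tree is searched; in direct optimisation they are re-evaluated on each candidate tree, so the homology statements and the tree are scored together.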