ABSTRACT

In recent years, biologists have been generating data on a massive scale thanks to technologies such as microarrays and high-throughput sequencing. With biological data of this size, a large number of hypotheses are tested simultaneously, often hundreds of thousands or even millions. Consequently, the multiple comparisons problem has become of great importance: the more tests are performed simultaneously, the more likely it is that false positives will occur. Many approaches exist for correcting for multiple testing. This review first informally outlines the principal multiple testing correction strategies, their utility under different research scenarios and the available free software. A map is provided for choosing the most suitable correction strategy depending on the research aim (exploratory or confirmatory), the assumptions that can be made, the availability of prior information and the kind of statistical test. Finally, the review gives a more formal and technical description, including algorithms for programming the different methods described and a brief sketch of new approaches.