ABSTRACT

Genomic selection (GS) is a tool in plant and animal breeding that uses machine learning approaches to make predictions for un-phenotyped individuals, which then inform selection decisions. Constructing genomic prediction models requires genome-wide marker data together with phenotypic data to build a training population set (TRS). The choice of the TRS is critical for the success of GS because predictions are based on marker or individual effects estimated from the TRS. Here, we review the different criteria proposed in the literature for designing a TRS. In addition, we provide a practical overview of the statistical analysis needed to optimize the TRS using R. The statistical procedure is performed with the R package TrainSel, and issues associated with the analysis are addressed alongside the R code. The ultimate aim of this chapter is to provide a practical guideline for performing TRS optimization analysis in R, rather than to describe the theory in depth.