ABSTRACT

GILLES BOUTET, SUSETE ALVES CARVALHO, MATTHIEU FALQUE, PIERRE PETERLONGO, EMELINE LHUILLIER, OLIVIER BOUCHEZ, CLÉMENT LAVAUD, MARIE-LAURE PILET-NAYEL, NATHALIE RIVIÈRE, and ALAIN BARANGER

7.1 INTRODUCTION

SNPs (Single Nucleotide Polymorphisms) are genetic markers of choice for both linkage and association mapping and for population structure and evolution analysis. They are virtually unlimited, evenly distributed along the genome, bi-allelic and co-dominant. Massive SNP discovery was first limited to the few species with an available reference genome. More recently, as next-generation sequencing (NGS) technologies became cheaper, multiple accessions within a species could be sequenced, even for species with complex genomes [1]. The challenge of sequencing large genomes with high levels of repeated sequences first led to the development of novel approaches for reducing genome complexity [2]. cDNA sequencing, which specifically addresses the expressed genic fraction, has been widely developed and is reviewed in Duarte et al. [3]. Restriction-site Associated DNA (RAD) tags have been applied to a wide range of organisms such as Drosophila melanogaster [4], fish and fungi [5]. In plants, RAD-Seq has been applied to several species, including barley [6] and ryegrass [7], for both large-scale SNP discovery and the mapping of SNP subsets. In legume species, Deokar et al. [8] first reported the use of RAD-Seq in chickpea to discover 29,000 SNPs and subsequently map 604 recombination bins. Restriction enzyme digestion to reduce genome complexity, followed by direct Genotyping-by-Sequencing (GBS), was reported for maize RILs and barley doubled haploid lines [9], where 2,382 markers were eventually mapped on the barley genetic map. In legume species, Sonah et al. [10] first used GBS in soybean to develop 10,120 high-quality SNPs. Thus, all of these studies relied on genome complexity reduction and a variety of assembly tools.