ABSTRACT

Illumina sequencers perform image analysis and base calling, producing binary base call (*.bcl) files as their primary output. Newer Illumina models such as the HiSeq produce compressed BCL files (*.bcl.gz). The FASTA format is one of the most widely used formats in bioinformatics. The FASTQ format, used for sharing sequencing read data, extends FASTA by storing a numeric quality score (Phred score) for each nucleotide alongside the sequence itself. To store the Phred scores compactly, each score is converted to a single ASCII character. ASCII (American Standard Code for Information Interchange) is an early character encoding standard, first published in 1963, for representing characters in computers and other devices. Illumina uses a standard naming scheme for its FASTQ files, and it is useful to understand how that scheme is structured.
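The conversion between Phred scores and ASCII characters can be illustrated with a minimal sketch. It assumes the Phred+33 offset (the Sanger convention, also used by Illumina 1.8 and later), which the text above does not specify; older Illumina pipelines used an offset of 64. The function names and the PHRED_OFFSET constant are hypothetical and chosen only for illustration.

```python
# Assumption: Phred+33 encoding (Sanger / Illumina 1.8+).
# Older Illumina pipelines used an offset of 64 instead.
PHRED_OFFSET = 33

# Highest score that still maps to a printable ASCII character ('~' is 126).
MAX_PHRED = 126 - PHRED_OFFSET


def encode_quality(scores, offset=PHRED_OFFSET):
    """Convert a list of integer Phred scores to a FASTQ quality string."""
    chars = []
    for q in scores:
        if q < 0 or q + offset > 126:
            raise ValueError(f"Phred score {q} cannot be encoded with offset {offset}")
        chars.append(chr(q + offset))
    return "".join(chars)


def decode_quality(quality_string, offset=PHRED_OFFSET):
    """Convert a FASTQ quality string back to a list of integer Phred scores."""
    return [ord(c) - offset for c in quality_string]


def error_probability(q):
    """Probability that a base call is wrong, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    print(encode_quality([0, 30, 40]))   # !?I
    print(decode_quality("!?I"))         # [0, 30, 40]
```

Under this encoding a Phred score of 30 is stored as the character "?" and corresponds to an estimated error probability of 1 in 1000 for that base call.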