ABSTRACT

It was recognized early on that newly determined protein and DNA sequences needed to be stored in a public repository, much as books and other written works are kept in libraries; therefore GenBank, the European Molecular Biology Laboratory (EMBL) Data Library, and the Japan International Protein Information Database (JIPID) were founded (Benson et al., 1993; Kneale and Kennard, 1984). As the Internet did not yet exist as a commodity in these early days, the data were initially distributed on tapes and floppy disks for a fee. Smith and Waterman created the original exhaustive search algorithm for sequence data in 1984 (Lipman et al., 1984). With the computer infrastructure of the time, this algorithm was too slow to be useful; therefore the first "workable" tools to emerge in bioinformatics were the heuristic database search algorithms FASTA (Pearson and Lipman, 1988) and, soon thereafter, BLAST (Altschul et al., 1990). These could be used to screen a new sequence against the sequences in the databanks. Specialized search algorithms emerged that were capable of identifying motifs within protein sequences (such as Prosearch) (Kolakowski et al., 1992) or finding restriction sites within DNA sequences (such as REBASE) (Roberts and Macelis, 1993). Tools for analyzing the biophysical parameters of protein sequences were also created. For instance, it became very popular to publish hydrophobicity analyses (Bigelow, 1967) of newly sequenced proteins in the scientific literature. Hydrophobicity plots can also be considered one of the first graphical visualizations used in bioinformatics.