Protein Sequence Analysis

doi:10.1201/9781584888116-16

ABSTRACT

SECTION 1 PRIMARY STRUCTURE ANALYSIS

Part I Introduction
1. What Is Primary Structure Analysis?
2. What Is Involved in Primary Structure Analysis?

Part II Step-By-Step Tutorial
1. Selecting Protein Sequences
2. Compute pI/Mw
3. RADAR (Rapid Automatic Detection and Alignment of Repeats in Protein Sequences)
4. PlotScale
5. Hydrophobic Cluster Analysis (HCA) of Proteins

Part III Sample Data
1. Human Vaspin Amino Acid Sequence in the FASTA Format
2. Mouse Galectin-9 Amino Acid Sequence in the FASTA Format
3. Human Collectrin Amino Acid Sequence in the FASTA Format
4. Human α1-Antitrypsin Amino Acid Sequence in the FASTA Format

SECTION 2 SECONDARY AND TERTIARY STRUCTURE ANALYSIS

Part I Introduction
1. What Is Secondary and Tertiary Structure Analysis?
2. What Is Involved in Secondary and Tertiary Structure Analysis?

Part II Step-By-Step Tutorial
1. Analyze the Sequence with the RCSB PDB
2. Prediction of Secondary and Tertiary Structure by Using PredictProtein Metaserver
3. Prediction of 3D Structure by the Metaserver in BioInfoBank

Part III Sample Data
1. Amino Acid Sequence of Rat Betacellulin (BTC) in the FASTA Format Fetched from Genbank
2. A Partial Sequence Corresponding to EGF-Like Domain of Rat BTC
3. A Partial Sequence Corresponding to Cytoplasmic Domain of Rat BTC

SECTION 3 PATTERN AND PROFILE SEARCH

Part I Introduction
1. Why are Pattern and Profile Search Needed?
2. What Is Pattern and Profile Search?
3. Integrated Analyzing System Developed for the Databases

Part II Step-By-Step Tutorial

Part III Sample Data
1. Human Betacellulin Precursor Amino Acid Sequence in FASTA Format
2. The Amino-Terminal Half Amino Acid Sequence of Human Betacellulin
3. The Carboxyl-Terminal Half Amino Acid Sequence of Human Betacellulin

SECTION 1 PRIMARY STRUCTURE ANALYSIS

Part I Introduction

1. What Is Primary Structure Analysis?

Most natural polypeptides contain between 50 and 2000 amino acid residues and are commonly referred to as proteins. The mean molecular weight of an amino acid residue is about 110 Da, so the molecular weights of most proteins fall between 5,500 and 220,000 Da (50 × 110 and 2000 × 110, respectively). Each protein has a unique, precisely defined amino acid sequence, which is often referred to as its primary structure. Analyzing the amino acid sequence with a primary structure analysis program is the initial step in predicting the function and three-dimensional structure of a protein.

A classical method for determining an amino acid sequence is Edman degradation, in which amino acid residues are removed stepwise from the N-terminus by reaction with phenylisothiocyanate. The method is named after Pehr Victor Edman (1916-1977), a Swedish protein chemist, who described it in 1950. Frederick Sanger, an English biochemist and two-time Nobel laureate, determined the complete amino acid sequence of insulin in 1955, which earned him his first Nobel Prize in Chemistry in 1958.

This section does not describe direct amino acid sequencing. Instead, it covers the bioinformatic analysis of primary structure, including (1) computation of the theoretical pI (isoelectric point) and Mw (molecular weight) of proteins, (2) de novo repeat detection in protein sequences, (3) hydropathy plots for proteins, and (4) hydrophobic cluster analysis of proteins.
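To make the molecular weight arithmetic and the idea of a hydropathy plot concrete, the following minimal Python sketch may help. It is not the implementation used by the web tools discussed in this chapter. It assumes the standard average residue masses and the Kyte-Doolittle hydropathy scale; the function names, the window size of 9, and the example peptide are hypothetical choices made only for illustration.

```python
# Average residue masses in Da (free amino acid minus one water molecule).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # one water is added back for the free N- and C-termini

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def rough_mw(n_residues, mean_residue_mass=110.0):
    """Rule-of-thumb molecular weight: residue count times about 110 Da."""
    return n_residues * mean_residue_mass


def average_mw(sequence):
    """Sum the average residue masses and add one water for the chain termini."""
    seq = sequence.upper()
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def hydropathy_profile(sequence, window=9):
    """Mean Kyte-Doolittle score in each sliding window along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    peptide = "MKTLLLTLVVVTIVCLDLGYT"  # hypothetical peptide, for illustration only
    print("Rule-of-thumb Mw:", rough_mw(len(peptide)))
    print("Summed-mass Mw:  ", round(average_mw(peptide), 2))
    print("Hydropathy:      ", [round(x, 2) for x in hydropathy_profile(peptide)])
```

The rule of thumb is useful for a quick size estimate, whereas summing residue masses reflects the actual composition. Plotting the hydropathy profile against residue position gives the kind of hydropathy plot referred to in item (3).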