ABSTRACT

SECTION 1 PRIMARY STRUCTURE ANALYSIS

Part I Introduction
1. What Is Primary Structure Analysis?
2. What Is Involved in Primary Structure Analysis?

Part II Step-By-Step Tutorial
1. Selecting Protein Sequences
2. Compute pI/Mw
3. RADAR (Rapid Automatic Detection and Alignment of Repeats in Protein Sequences)
4. PlotScale
5. Hydrophobic Cluster Analysis (HCA) of Proteins

Part III Sample Data
1. Human Vaspin Amino Acid Sequence in the FASTA Format
2. Mouse Galectin-9 Amino Acid Sequence in the FASTA Format
3. Human Collectrin Amino Acid Sequence in the FASTA Format
4. Human α1-Antitrypsin Amino Acid Sequence in the FASTA Format

SECTION 2 SECONDARY AND TERTIARY STRUCTURE ANALYSIS

Part I Introduction
1. What Is Secondary and Tertiary Structure Analysis?
2. What Is Involved in Secondary and Tertiary Structure Analysis?

Part II Step-By-Step Tutorial
1. Analyze the Sequence with the RCSB PDB
2. Prediction of Secondary and Tertiary Structure by Using PredictProtein Metaserver
3. Prediction of 3D Structure by the Metaserver in BioInfoBank

Part III Sample Data
1. Amino Acid Sequence of Rat Betacellulin (BTC) in the FASTA Format Fetched from GenBank
2. A Partial Sequence Corresponding to EGF-Like Domain of Rat BTC
3. A Partial Sequence Corresponding to Cytoplasmic Domain of Rat BTC

SECTION 3 PATTERN AND PROFILE SEARCH

Part I Introduction
1. Why Are Pattern and Profile Search Needed?
2. What Is Pattern and Profile Search?
3. Integrated Analyzing System Developed for the Databases

Part II Step-By-Step Tutorial

Part III Sample Data
1. Human Betacellulin Precursor Amino Acid Sequence in FASTA Format
2. The Amino-Terminal Half Amino Acid Sequence of Human Betacellulin
3. The Carboxyl-Terminal Half Amino Acid Sequence of Human Betacellulin

SECTION 1 PRIMARY STRUCTURE ANALYSIS

Part I Introduction

1. What Is Primary Structure Analysis?

Most natural polypeptides contain between 50 and 2,000 amino acid residues and are commonly referred to as proteins. The mean molecular weight of an amino acid residue is about 110 daltons (Da), so the molecular weights of most proteins fall between 5,500 and 220,000 Da (50 × 110 and 2,000 × 110, respectively). Each protein has a unique, precisely defined amino acid sequence, which is referred to as its primary structure. Analyzing the amino acid sequence with a primary structure analysis program is the first step toward predicting the function and three-dimensional structure of a protein.

A classical method for determining amino acid sequence is Edman degradation, in which amino acid residues are removed stepwise from the N-terminus by reaction with phenylisothiocyanate. The method is named after Pehr Victor Edman (1916-1977), a Swedish protein chemist, who described it in 1950. Frederick Sanger, an English biochemist and two-time Nobel laureate, determined the complete amino acid sequence of bovine insulin in 1955, work that earned him his first Nobel Prize in Chemistry in 1958. This section does not describe direct amino acid sequencing; instead, it covers the bioinformatic analysis of primary structure, including (1) computation of the theoretical pI (isoelectric point) and Mw (molecular weight) of proteins, (2) de novo detection of repeats in protein sequences, (3) hydropathy plots of proteins, and (4) hydrophobic cluster analysis of proteins.
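The 110 Da rule of thumb can be checked against a composition-based calculation with a short Python sketch. It contrasts the length-only estimate with a sum of per-residue average masses plus one water molecule (for the free N- and C-termini), which is the general idea behind a theoretical Mw calculation. The mass table is an assumption based on commonly published average residue masses, not the exact table of any particular tool. The function names `rough_mw` and `average_mw` and the peptide `MKTAYIAKQR` are hypothetical illustrations, not one of the chapter's sample sequences.

```python
"""Sketch: estimate a protein's average molecular weight two ways."""

# Approximate average residue masses in daltons (amino acid minus one H2O).
# Assumed values from commonly published tables; sources differ in the last decimals.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524           # one H2O restored for the free N- and C-termini
MEAN_RESIDUE_MASS = 110.0  # rule of thumb quoted in the text


def rough_mw(length: int) -> float:
    """Length-only estimate: number of residues x ~110 Da."""
    return length * MEAN_RESIDUE_MASS


def average_mw(sequence: str) -> float:
    """Sum of average residue masses plus one water (unmodified chain)."""
    seq = sequence.upper()
    unknown = sorted(set(seq) - set(RESIDUE_MASS))
    if unknown:
        raise ValueError(f"unsupported residue codes: {unknown}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    # The length range quoted in the text maps onto the quoted Mw range.
    print(rough_mw(50), rough_mw(2000))  # 5500.0 220000.0
    # Hypothetical toy peptide used only for illustration.
    peptide = "MKTAYIAKQR"
    print(round(average_mw(peptide), 2), rough_mw(len(peptide)))  # 1209.47 1100.0
```

The length-only estimate ignores composition, so it can differ noticeably from the summed value for short or compositionally biased sequences; dedicated tools such as those covered in the tutorial also account for details this sketch leaves out, such as post-translational modifications.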