CONTENTS

9.1 Introduction
9.2 Overview of KEGG
    9.2.1 Knowledge Representation
    9.2.2 Mapping Procedures
    9.2.3 KEGG Orthology System
    9.2.4 Disease and Drug Information Resources
    9.2.5 Neurodegenerative Diseases in KEGG
9.3 Network Analysis of Neurodegenerative Diseases
    9.3.1 Protein-Protein Interaction Dataset from Literature
    9.3.2 Common Proteins Linking Disease and Normal Pathways
    9.3.3 Large-Scale Protein-Protein Interaction Dataset
    9.3.4 Extended Protein Interaction Network
    9.3.5 Protein Domain Analysis
    9.3.6 Domain-Based Similarity of Neurodegenerative Diseases
9.4 Concluding Remarks
References

ABSTRACT

The large-scale datasets generated by gene sequencing, proteomics, and other high-throughput experimental technologies are the basis for understanding life as a molecular system and for developing medical, industrial, and other practical applications. To facilitate bioinformatics analysis of such large-scale datasets, it is essential to organize our knowledge of higher-level systemic functions in a computable form, so that it can be used as a reference for inferring molecular systems from the information contained in the building blocks. To this end, we have been developing the KEGG (Kyoto Encyclopedia of Genes and Genomes) database (https://www.genome.jp/kegg/), an integrated resource of about 20 databases (1). The main component is the KEGG PATHWAY database, which consists of manually drawn graphical diagrams of molecular networks, called pathway maps, representing various cellular processes and organism behaviors. KEGG PATHWAY is a reference database for pathway mapping, the process of matching, for example, the genomic or transcriptomic content of genes against KEGG reference pathway maps to infer systemic functions of the cell or the organism.
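The idea of pathway mapping can be illustrated with a minimal sketch. It is not the KEGG implementation: the pathway names, gene identifiers, and the function `map_to_pathways` are hypothetical placeholders, and the coverage score is an assumption chosen only to make the matching step concrete. Each reference pathway is modeled as a set of gene identifiers, and a query gene list is matched against every set.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical reference pathways (placeholder names and gene identifiers,
# not real KEGG entries): pathway name -> set of member genes.
REFERENCE_PATHWAYS: Dict[str, Set[str]] = {
    "pathway_A": {"g1", "g2", "g3", "g4"},
    "pathway_B": {"g3", "g5"},
    "pathway_C": {"g6", "g7"},
}


def map_to_pathways(
    genes: Iterable[str], reference: Dict[str, Set[str]]
) -> List[Tuple[str, List[str], float]]:
    """Match a query gene list against reference pathways.

    Returns (pathway name, matched genes, coverage) for every pathway with
    at least one match, where coverage is the fraction of the pathway's
    genes found in the query. Results are ordered by decreasing coverage,
    then by pathway name.
    """
    query = set(genes)
    hits = []
    for name, members in reference.items():
        matched = query & members
        if matched:
            hits.append((name, sorted(matched), len(matched) / len(members)))
    hits.sort(key=lambda hit: (-hit[2], hit[0]))
    return hits


# Example query: a gene list such as might come from a genome or transcriptome.
result = map_to_pathways(["g1", "g3", "g5", "g9"], REFERENCE_PATHWAYS)
for name, matched, coverage in result:
    print(f"{name}: matched {matched}, coverage {coverage:.2f}")
```

In this toy setting, a pathway whose genes are all present in the query is a candidate for a function the cell or organism can carry out, while partial coverage suggests an incomplete or only partly observed pathway.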