Chapter 8
40 Pages

Machine Learning in Structural Biology: Interpreting 3D Protein Images

This chapter discusses an important problem that arises in structural biology: given an electron density map – a three-dimensional “image” of a protein produced from crystallography – identify the chain of protein atoms contained within the image. Traditionally, a human performs this interpretation, perhaps aided by a graphics terminal. However, over the past 15 years, a number of research groups have used machine learning to automate density map interpretation. Early methods were very successful, saving thousands of crystallographer-hours, but required extremely high-quality density maps to work. Newer methods aim to automatically interpret maps of progressively poorer quality, using state-of-the-art machine learning and computer vision algorithms.

This chapter begins with a brief introduction to structural biology and X-ray crystallography. The introduction describes the problem of density map interpretation in detail, provides background on the algorithms used in automatic interpretation, and gives a high-level overview of automated map interpretation. The chapter then describes four methods in detail, presenting them in chronological order of development. We apply each algorithm to an example density map, illustrating its intermediate steps and the resulting interpretation. Each algorithm’s section presents pseudocode and program flow diagrams. The chapter concludes with a discussion of the advantages and shortcomings of each method, as well as future research directions.