ABSTRACT

Learning goal: You can search and fetch database records from NCBI via Biopython.

20.1 IN THIS CHAPTER YOU WILL LEARN

• How to read sequence files from the web

• How to submit PubMed queries

• How to submit queries to the NCBI nucleotide database

• How to retrieve Uniprot records and write them to a file

20.2.1 Problem Description

In the previous chapter, you used Biopython to manipulate local sequence files (e.g., FASTA and GenBank files). In this chapter, you will use Biopython to access online NCBI databases, such as PubMed and GenBank, and Expasy resources, such as Uniprot, and to retrieve and parse their contents. The following Python session shows how to find publications about PyCogent, a Python library complementary to Biopython. First, PubMed entries containing the keyword “PyCogent” need to be found and retrieved, and the resulting records need to be parsed. Since PubMed is one of the NCBI databases (www.ncbi.nlm.nih.gov/), it is connected to the Entrez data retrieval system (www.ncbi.nlm.nih.gov/Entrez). See Box 20.1 for sample queries to the NCBI server. The Biopython module for accessing NCBI web services is called Entrez as well; you need it to search and download NCBI database records. To further parse publication records, you need a specialized parser from the Bio.Medline module.
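The search, retrieve, and parse steps described above can be sketched roughly as follows. This is a minimal illustration, not the chapter's own session: the function names `fetch_pubmed_records` and `describe`, the `retmax` default, and the e-mail address in the usage note are assumptions made for the example. It relies on `Entrez.esearch`, `Entrez.read`, `Entrez.efetch`, and `Medline.parse` from Biopython, and on the MEDLINE field tags `PMID`, `TI` (title), and `AU` (author list).

```python
def fetch_pubmed_records(term, email, retmax=20):
    """Search PubMed for `term` and return the parsed MEDLINE records.

    Hypothetical helper: the name and the `retmax` default are choices
    made for this sketch. Calling it requires network access.
    """
    # Imported here so that `describe` below also works without Biopython.
    from Bio import Entrez, Medline

    Entrez.email = email  # NCBI asks clients to identify themselves

    # Step 1: find the PubMed IDs of entries matching the keyword.
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    search_result = Entrez.read(handle)
    handle.close()
    pmids = list(search_result["IdList"])
    if not pmids:
        return []

    # Step 2: download the matching entries in MEDLINE text format.
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids),
                           rettype="medline", retmode="text")

    # Step 3: parse the download into dictionary-like records.
    records = list(Medline.parse(handle))
    handle.close()
    return records


def describe(record):
    """Return a one-line summary built from MEDLINE field tags."""
    pmid = record.get("PMID", "?")
    title = record.get("TI", "(no title)")
    authors = record.get("AU", [])
    first_author = authors[0] if authors else "(no author)"
    return f"{pmid}: {title} [{first_author}]"
```

A call such as `for rec in fetch_pubmed_records("PyCogent", "you@example.org"): print(describe(rec))` would then list the matching publications; the address is a placeholder to be replaced with your own. Because MEDLINE records behave like dictionaries, `describe` can use `get` to tolerate entries that lack a title or an author field.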