ABSTRACT

One solution to the multitude of file formats for molecular structures is to provide a common program that reads and writes each file type.1 Good examples are Babel and OpenBabel.2 A common data structure, internal to the program, serves as a hub for storing and processing the molecular structure, and components can be added to read and write new file formats. This approach shares some features with the RDBMS approach. In the hub approach, each molecular file format is an external representation of the molecular structure, and the program's data structure is the internal representation. In the RDBMS approach, the file formats are likewise external representations, but the common data structure is a schema whose tables hold the molecular structure information. The purpose of this chapter is to propose ways to move away from file formats entirely, preserving only the ability to read file formats for legacy data. A later section of this chapter will show how client programs can effectively use molecule tables in an RDBMS instead of molecular structure files.
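
The idea of a schema acting as the common hub can be illustrated with a minimal sketch. It is not the chapter's actual schema: the table names (molecule, atom), the column names, and the helper functions load_xyz and dump_xyz are all hypothetical, SQLite stands in for any RDBMS, and the plain XYZ coordinate format stands in for a legacy file format.

```python
import sqlite3

# Hypothetical two-table schema; a real molecule schema would also hold bonds,
# charges, properties, and so on.
SCHEMA = """
CREATE TABLE molecule (
    mol_id INTEGER PRIMARY KEY,
    name   TEXT
);
CREATE TABLE atom (
    mol_id   INTEGER REFERENCES molecule(mol_id),
    atom_idx INTEGER,
    element  TEXT,
    x REAL, y REAL, z REAL,
    PRIMARY KEY (mol_id, atom_idx)
);
"""


def load_xyz(conn, text):
    """Read one molecule in XYZ format (external) into the tables (internal)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])          # line 1: atom count
    name = lines[1].strip()          # line 2: comment, used here as the name
    cur = conn.execute("INSERT INTO molecule (name) VALUES (?)", (name,))
    mol_id = cur.lastrowid
    for idx, line in enumerate(lines[2:2 + n_atoms], start=1):
        element, x, y, z = line.split()[:4]
        conn.execute(
            "INSERT INTO atom VALUES (?, ?, ?, ?, ?, ?)",
            (mol_id, idx, element, float(x), float(y), float(z)),
        )
    return mol_id


def dump_xyz(conn, mol_id):
    """Write one molecule from the tables back out as XYZ text."""
    (name,) = conn.execute(
        "SELECT name FROM molecule WHERE mol_id = ?", (mol_id,)
    ).fetchone()
    atoms = conn.execute(
        "SELECT element, x, y, z FROM atom WHERE mol_id = ? ORDER BY atom_idx",
        (mol_id,),
    ).fetchall()
    out = [str(len(atoms)), name]
    out += [f"{e} {x:.4f} {y:.4f} {z:.4f}" for e, x, y, z in atoms]
    return "\n".join(out)


WATER_XYZ = """3
water
O 0.0000 0.0000 0.1173
H 0.0000 0.7572 -0.4692
H 0.0000 -0.7572 -0.4692
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
water_id = load_xyz(conn, WATER_XYZ)

# A client program can query the tables directly instead of parsing a file.
n_h = conn.execute(
    "SELECT COUNT(*) FROM atom WHERE mol_id = ? AND element = 'H'",
    (water_id,),
).fetchone()[0]
```

In this sketch the reader is needed only to import legacy data; once the molecule is in the tables, a client asks its question in SQL and never touches a structure file.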