ABSTRACT

Bioinformatics databases, like most databases, suffer from data quality problems. In particular, they contain “dirty” data: data that do not conform to the original intent of the database. Because of the nature of biological data, these problems are usually complex enough to rule out the more traditional methods for cleaning “dirty” data. Conceptually, a framework for cleaning and integrating biological data would need to address both the schema issues and the data issues that cause data quality problems within a bioinformatics database [302].