ABSTRACT

The quality, completeness, and consistency of master data in electronic product catalogs are key factors affecting the correctness of most supply chain processes. Despite growing awareness of the impact of data quality on business, and more recently on e-Business in particular, the data collected in ICT systems is still imperfect. This is, on the one hand, a consequence of the failure to apply common standards and, on the other hand, of the lack of clear guidelines and standardization for certain attributes, especially textual ones. It is further aggravated by the absence of effective validation mechanisms in IT systems. Another problem stems from the aggregation of data from various unverified sources, driven by the lack of reliable data, and from the absence of data quality control at scale (databases of many millions of products). Companies must face the fact that as the business grows, the data problem grows with it, and long-term negligence in this area translates into a costly and lengthy improvement process. A far better approach is to take care of product data from the very beginning and to control the process of populating data sets on a continuous basis. The proposal described here is a concept for improving and verifying data using appropriate normalization and validation rules which, when implemented in an IT system, support data management and minimize its maintenance costs. It can be particularly useful for large databases aggregated from different sources, or for databases where data entry and update processes are decentralized, that is, where the data is created by a group of companies or an interested community. The proposed solutions have been verified against a product database of Polish manufacturers containing more than 41 million products.