ABSTRACT

Data warehousing technologies are now mature enough to store and process very large data sets efficiently. This has shifted the central data warehousing challenge from increasing data processing capacity to enriching data resources so that they better support decision making. Some organizations reportedly intend to bring Web data into their data warehouse systems to meet this challenge, because the sheer volume of information available online has made the Internet the largest external database available to any organization. However, no systematic guideline exists to support such efforts. To fill this void, we introduce Web integration as a strategy for merging data warehouses and the Web, with an emphasis on acquiring Web data into data warehouses effectively and efficiently. We also argue that the critical step in Web integration is acquiring genuinely valuable business data from the Web, and we offer a framework for determining the business value of Web data to facilitate Web integration efforts.