ABSTRACT

Data mining is generally defined as the analysis of (large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner (Hand et al. 2001, Giudici 2003, Witten and Frank 2005, Han and Kamber 2006). It is a young and interdisciplinary field, drawing from disciplines such as database systems, data warehousing, machine learning, statistics, signal analysis, data visualization, information retrieval, and high-performance computing. And rather than comprising a clear-cut set of methods, the term “data mining” refers to an eclectic approach where choices are led by pragmatic considerations concerning the problem at hand. Data mining has been successfully applied in diverse areas such as marketing, finance, engineering, security, games, and science. It is often applied in the context of knowledge discovery in databases (KDD) (Fayyad et al. 1996a,b), which, loosely defined, is a process with four stages: (i) selecting the relevant data sources to address the knowledge discovery questions at hand, (ii) preprocessing (integrating, cleaning, filtering, and, if necessary, transforming) data from these sources, (iii) applying data mining techniques to extract potentially interesting structures from the data, and (iv) interpreting, validating, and appraising the discovered structures, and presenting them to end users.