
CONTENTS

27.1 Introduction
27.2 Parallel Algorithms for Data Mining
    27.2.1 Parallel Data Clustering Algorithms
    27.2.2 Parallel Data Classification Algorithms
    27.2.3 MapReduce Framework
27.3 Distributed Algorithms for Data Mining
    27.3.1 Distributed Clustering
    27.3.2 Distributed Classification
    27.3.3 Data Mining in P2P Environments
        27.3.3.1 Approximate Algorithms for P2P Data Mining
        27.3.3.2 Exact Algorithms for P2P Data Mining
    27.3.4 Message Passing Interface
27.4 Data Mining in GRID Environments
27.5 DDM for Astronomy
27.6 Conclusion
Acknowledgment
References

27.1 INTRODUCTION

Due to advances in data collection, storage, and computing technologies, astronomy has become a data-rich discipline. Over the last few years, numerous surveys have systematically observed the entire sky at different wavelengths of light using sophisticated instruments. The data generated by many of these surveys exceed several terabytes and often reach petabyte scales. Brunner et al. [18] present an overview of some of the massive astronomy datasets available to researchers. All of these data are meaningless unless they are analyzed for new and interesting astronomical discoveries.