ABSTRACT

In this chapter, you will learn the basics of Python for data science, including data structures and data processing, through complete hands-on activities. The chapter gives a clear understanding of NumPy and pandas. pandas is a free, open-source Python library for data analysis and data handling that provides a variety of high-performance, convenient data structures along with operations for data manipulation. We will use Python libraries to load, manipulate, analyze, and visualize several datasets, and we will use machine learning (ML) algorithms from these libraries to build models and make predictions. The chapter focuses on exploratory data analysis (EDA), an approach that examines datasets to summarize their main characteristics, often with visualization. A dataset can be summarized with the pandas describe function; we then preprocess and clean the data and, finally, look for relationships between variables. Three common clustering methods are covered: k-means clustering, agglomerative clustering, and density-based spatial clustering of applications with noise (DBSCAN). Several ML algorithms, such as k-nearest neighbors (KNN), decision tree, random forest, and support vector machine (SVM), are also discussed. The chapter ends with an overview of integrated development environments (IDEs) that can be used for Python in data science.
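
As a preview of the EDA steps named above (describe, clean, find relationships), here is a minimal sketch. The toy data and the column names height_cm and weight_kg are hypothetical and used only for illustration. Dropping incomplete rows is just one possible cleaning choice, not necessarily the one used later in the chapter.

```python
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
df = pd.DataFrame(
    {
        "height_cm": [150.0, 160.0, None, 180.0, 190.0],
        "weight_kg": [50.0, 60.0, 70.0, 80.0, 90.0],
    }
)

# 1. Describe: summary statistics (count, mean, std, quartiles) per numeric column.
summary = df.describe()

# 2. Clean: drop rows that contain missing values (one simple strategy among many).
clean = df.dropna()

# 3. Relate: pairwise Pearson correlation between the remaining columns.
corr = clean.corr()

print(summary)
print(corr)
```

The count row of the summary is a quick way to spot missing values, since describe counts only non-missing entries in each column.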