ABSTRACT

Feature engineering plays a vital role in big data analytics. Machine learning and data mining algorithms cannot work without data. Little can be achieved if there are few features to represent the underlying data objects, and the quality of results of those algorithms largely depends on the quality of the available features. Feature Engineering for Machine Learning and Data Analytics provides a comprehensive introduction to feature engineering, including feature generation, feature extraction, feature transformation, feature selection, and feature analysis and evaluation.

The book presents key concepts, methods, examples, and applications, as well as chapters on feature engineering for major data types such as texts, images, sequences, time series, graphs, streaming data, software engineering data, Twitter data, and social media data. It also contains generic feature generation approaches, as well as methods for generating tried-and-tested, hand-crafted, domain-specific features.

The first chapter defines the concepts of features and feature engineering, offers an overview of the book, and provides pointers to topics not covered in this book. The next six chapters are devoted to feature engineering, including feature generation, for specific data types. The subsequent four chapters cover generic approaches for feature engineering, namely feature selection, feature-transformation-based feature engineering, deep-learning-based feature engineering, and pattern-based feature generation and engineering. The last three chapters discuss feature engineering for social bot detection, software management, and Twitter-based applications, respectively.

This book can be used as a reference for data analysts, big data scientists, data preprocessing workers, project managers, project developers, prediction modelers, professors, researchers, graduate students, and upper-level undergraduate students. It can also be used as the primary text for courses on feature engineering, or as a supplement for courses on machine learning, data mining, and big data analytics.

chapter 1|12 pages

Preliminaries and Overview

By Guozhu Dong, Huan Liu

part I|176 pages

Feature Engineering for Various Data Types

chapter 2|40 pages

Feature Engineering for Text Data

By Chase Geigle, Qiaozhu Mei, ChengXiang Zhai

chapter 3|31 pages

Feature Extraction and Learning for Visual Data

By Parag S. Chandakkar, Ragav Venkatesan, Baoxin Li

chapter 4|30 pages

Feature-Based Time-Series Analysis

By Ben D. Fulcher

chapter 5|27 pages

Feature Engineering for Data Streams

By Yao Ma, Jiliang Tang, Charu Aggarwal

chapter 6|22 pages

Feature Generation and Feature Engineering for Sequences

By Guozhu Dong, Lei Duan, Jyrki Nummenmaa, Peng Zhang

chapter 7|22 pages

Feature Generation for Graphs and Networks

By Yuan Yao, Hanghang Tong, Feng Xu, Jian Lu

part II|119 pages

General Feature Engineering Techniques

chapter 8|30 pages

Feature Selection and Evaluation

By Yun Li, Tao Li

chapter 10|33 pages

Pattern-Based Feature Generation

By Yunzhe Jia, James Bailey, Ramamohanarao Kotagiri, Christopher Leckie

chapter 11|29 pages

Deep Learning for Feature Representation

By Suhang Wang, Huan Liu

part III|85 pages

Feature Engineering in Special Applications

chapter 12|24 pages

Feature Engineering for Social Bot Detection

By Onur Varol, Clayton A. Davis, Filippo Menczer, Alessandro Flammini

chapter 13|24 pages

Feature Generation and Engineering for Software Analytics

By Xin Xia, David Lo

chapter 14|35 pages

Feature Engineering for Twitter-Based Applications

By Sanjaya Wijeratne, Amit Sheth, Shreyansh Bhatt, Lakshika Balasuriya, Hussein S. Al-Olimat, Manas Gaur, Amir Hossein Yazdavar, Krishnaprasad Thirunarayan