Feature Engineering for Data Streams

doi:10.1201/9781315181080-5

ABSTRACT

This chapter provides an overview of feature engineering for data streams, focusing on two of its steps: feature construction and feature selection. It summarizes the typical streaming settings and their formal definitions, considering both the setting of streaming instances with fixed features and the setting of streaming features with fixed instances. It reviews automated feature construction algorithms, including linear and non-linear methods. Principal component analysis, which aims to maximally preserve the data variance, has been widely used in real-world applications such as dimension reduction, signal denoising, and correlation analysis. Linear discriminant analysis is powerful for supervised dimensionality reduction and has been applied successfully in areas including machine learning, data mining, and bioinformatics. The chapter also gives an overview of feature selection algorithms under the different streaming settings, and it discusses some open questions and possible research directions in feature engineering for data streams.