ABSTRACT

Geospatial stream processing refers to a class of software systems that process high-volume geospatial data streams with very low latency, that is, in near real time. Because database management systems (DBMSs) have limitations in this setting, data stream management systems (DSMSs) are oriented toward processing large data streams in near real time. Despite the differences between these two classes of management systems, DSMSs resemble DBMSs: they process data streams using SQL, SQL-like expressions, and operators defined by relational algebra.

Geospatial data streams exhibit at least two core features of Big Data: volume and velocity. An increasingly dominant approach to handling such streams is to leverage in-memory computing over a cluster of commodity hardware. Existing distributed in-memory query engines and their processing models are predominantly based on relational paradigms and continuous operator models, without explicit support for spatiotemporal queries.

Data management systems and technologies have drastically improved the availability of data analysis capabilities. Two major reasons for the widespread use of database systems are data independence, which separates the physical representation and storage of data from the actual information, and declarative languages, which separate the program specification from its execution environment.

In the era of Big Data, it is important to ensure that well-established declarative language concepts make their way into the advanced analytics of geospatial data streams.

This chapter offers insight into geospatial stream processing at a conceptual level, that is, from the user's perspective, and presents a framework for efficient real-time analytics of big geospatial streams based on distributed processing on large clusters and a declarative, SQL-based approach.