ABSTRACT

Data Flow Distribution Templates
11.3.2 Stream Processes in SCSQ
11.4 Streaming Function Approximation for Scientific Simulations
11.4.1 Survey of Existing Projects
11.4.2 Technology Description
11.4.2.1 Local Models
11.4.2.2 The ISAT Algorithm
11.4.2.3 Indexing Problem
11.4.2.4 An Example: A Binary Tree Index
11.4.3 Deployment Examples
11.4.4 Future Challenges
Acknowledgement
References

Modern scientific instruments such as satellites, on-ground antennas, and simulators produce large volumes of data. For example, instruments monitoring the environment emit streams of sensor readings, particle colliders produce streams of particle collision data, and software telescopes such as LOFAR [33] produce very voluminous digitized radio signals. The measurement data is normally produced as streams rather than as data stored in conventional database tables. A stream has two defining properties: its data is ordered in time, and its volume is potentially unlimited. Scientists perform a wide range of online analyses over such data streams. The conventional approach to data management, using a relational database management system (DBMS), has the disadvantage that streaming data has to be loaded into a database before it can be queried and analyzed. If the data rate of a stream is too high, the DBMS cannot load the streaming data fast enough. This creates backlogs of unanalyzed data, and the volume of data produced by scientific instruments can even be too large to store and process [2]. Furthermore, offline data processing prevents timely analysis of interesting natural events as they occur.