Data Convergence for High-Performance Cloud

doi:10.1201/9781003176664-9

Chapter

ABSTRACT

The distributed computing landscape is undergoing radical change: high-performance computing (HPC) applications are moving to the cloud as a way of simplifying development, deployment, and migration across computing systems. Meanwhile, cloud applications are becoming increasingly complex and computationally intensive with the advent of high-performance data analytics pipelines – highly distributed workflows that process enormous and diverse datasets and rely heavily on virtualization and containers – which would benefit from technologies used in “traditional” HPC. This foreseeable convergence demands new abstractions to cope with the increased heterogeneity, so that diverse workload classes can coexist seamlessly on the same infrastructure. In this chapter, we propose a unified storage layer that enables cloud-native applications to transparently access a wide spectrum of storage solutions, ranging from high-performance filesystems to cloud-based object stores and key–value databases. Allowing software to make the best use of the available storage services, with no manual intervention from users or programmers, simplifies the development and execution of workflows and boosts overall productivity. This work is motivated by the requirements of real-world, industry-driven applications running on Kubernetes, the industry standard for cloud infrastructure orchestration.