ABSTRACT

Containers are widely used in the cloud because they provide compatibility and portability for application execution environments by encapsulating programs together with their dependencies in isolated environments. Artificial Intelligence (AI) applications often rely on a complex stack of software packages and can therefore benefit immensely from containerization. Big data analytics is driving the development of AI applications, making them more computation-intensive or data-intensive. There is growing interest in executing AI applications on high-performance computing (HPC) clusters, which are conventionally used for large-scale engineering, scientific, and financial simulations. This chapter presents a hybrid architecture consisting of a cloud cluster and an HPC cluster, which was proposed in the EU-funded research project CYBELE. A login node bridges the two clusters and provides a unified interface for job submission. More specifically, via the login node, long-running service programs are scheduled to be hosted on the cloud cluster. In contrast, AI applications are scheduled to run on the HPC cluster, where their performance can be significantly improved. Furthermore, methods for parallelizing and deploying containerized AI applications on HPC systems are described.