ABSTRACT

Artificial intelligence (AI) and machine learning (ML) technologies have spread across nearly every field of business and research over the past few years, and their reach continues to grow. Most industries recognize that failing to adopt AI/ML poses an existential threat to their competitiveness. A major obstacle to adopting and improving the performance of AI/ML techniques in Industry 4.0 and research domains is the requirement for high-end infrastructure, namely high-performance computing (HPC) infrastructure.

HPC infrastructures are heterogeneous, combining central processing units (CPUs) and graphics processing units (GPUs) to maximize performance. In this chapter, we use modern hybrid (CPU + GPU) HPC infrastructure to address this performance problem and achieve optimal resource utilization.

The main objective of this chapter is to improve the performance of different types of AI/ML techniques on hybrid HPC infrastructure. The proposed approach achieves this goal through optimal workload distribution across CPUs and GPUs.

To this end, we evaluate different AI/ML techniques, such as fully connected neural networks, convolutional neural networks, and recurrent neural networks, using operator and network parallelization techniques. We then develop workload partitioning algorithms to improve the performance of these application-specific AI/ML techniques on hybrid HPC infrastructure, as illustrated by the sketch below.
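The following is a minimal, illustrative sketch of the idea of static CPU + GPU workload partitioning, not the chapter's actual algorithm: it splits a batch between CPU and GPU replicas of the same fully connected layer according to a hypothetical split ratio (`gpu_share`), assuming a PyTorch environment with CUDA available.

```python
# Illustrative sketch (assumption): split one forward pass of a fully
# connected layer across a CPU replica and a GPU replica of the model.
import torch
import torch.nn as nn


def partitioned_forward(model_cpu, model_gpu, batch, gpu_share=0.8):
    """Send gpu_share of the batch to the GPU and the rest to the CPU."""
    n_gpu = int(len(batch) * gpu_share)          # rows assigned to the GPU
    gpu_part, cpu_part = batch[:n_gpu], batch[n_gpu:]

    # CUDA kernel launches are asynchronous, so the CPU share below
    # overlaps with GPU execution until .cpu() forces a synchronization.
    out_gpu = model_gpu(gpu_part.to("cuda"))
    out_cpu = model_cpu(cpu_part)

    return torch.cat([out_gpu.cpu(), out_cpu], dim=0)


if __name__ == "__main__":
    # Two replicas of the same layer, one per device, sharing weights.
    model_cpu = nn.Linear(512, 256)
    model_gpu = nn.Linear(512, 256)
    model_gpu.load_state_dict(model_cpu.state_dict())

    if torch.cuda.is_available():
        model_gpu = model_gpu.to("cuda")
        x = torch.randn(1024, 512)
        y = partitioned_forward(model_cpu, model_gpu, x)
        print(y.shape)  # torch.Size([1024, 256])
```

In practice, the split ratio would be tuned (or computed by a partitioning algorithm) from the measured throughput of each device rather than fixed by hand.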