ABSTRACT

Three decades of research into leader–follower autonomy (LFA) – in which the movement of one autonomous “leader” is followed by one or more autonomous “followers” – have mostly focused on systems in which vehicles follow other vehicles. In 2022, we demonstrated a human-led LFA system, in which a vehicle autonomously followed a human operator. The operator controls the vehicle through body language, which is recognized by a gesture-recognition system. We developed a modular pipeline that combines pre-built and custom-built components to achieve high performance and high maintainability. The pipeline translates camera frames into pose data, which is then interpreted as a vehicle message; the message is sent to the vehicle, which performs the associated actions. In this work, we present improvements to our pipeline. First, we implemented a bipartite mapping algorithm for more consistent global target persistence, making the system more robust against overlapping detections, such as when a person walks in front of or behind the current operator. This improves operational safety. Second, we migrated the object detection module from YOLOv3 to YOLOv5 Nano, which affords substantially higher performance at much lower hardware requirements. Third, we replaced the two-dimensional pose estimation network with a three-dimensional pose estimation network, which improves the precision of gesture detection and thereby increases detection accuracy. Our results show that these new components improve the overall performance of the human–vehicle leader–follower system.
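To illustrate the kind of bipartite mapping referred to above, the following minimal sketch (not the implementation described in this work) assigns previously tracked targets to new detections with the Hungarian algorithm via SciPy's linear_sum_assignment; the function name, centroid-distance cost, and max_dist threshold are illustrative assumptions.

```python
# Minimal sketch of bipartite matching for target persistence: previously
# tracked people are matched to new detections by minimizing total centroid
# distance, so the operator keeps a stable track ID even when detections
# overlap. Names and thresholds are illustrative, not from the paper.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks_to_detections(track_centroids, detection_centroids, max_dist=75.0):
    """Return {track_index: detection_index} for pairs closer than max_dist (pixels)."""
    if len(track_centroids) == 0 or len(detection_centroids) == 0:
        return {}
    tracks = np.asarray(track_centroids, dtype=float)    # shape (T, 2)
    dets = np.asarray(detection_centroids, dtype=float)  # shape (D, 2)
    # Pairwise Euclidean distances between every track and every detection.
    cost = np.linalg.norm(tracks[:, None, :] - dets[None, :, :], axis=2)
    row_idx, col_idx = linear_sum_assignment(cost)        # Hungarian algorithm
    # Discard assignments too far apart to plausibly be the same person.
    return {t: d for t, d in zip(row_idx, col_idx) if cost[t, d] <= max_dist}

# Example: the operator (track 0) keeps its ID even after two people cross.
prev = [(320, 240), (400, 250)]   # last known centroid per track ID
new = [(405, 252), (318, 238)]    # detections in the current frame
print(match_tracks_to_detections(prev, new))  # -> {0: 1, 1: 0}
```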