ABSTRACT
AI systems have grown more competent and general as the field of deep learning has matured. Reasoning about the behavior and internal structure of such systems can be challenging, especially since some failure modes arise only once an AI system is sufficiently sophisticated. This chapter discusses some of the fundamental technical challenges around monitoring, robustness, and control of AI systems. Current AI systems lack transparency and can exhibit surprising emergent capabilities. They are vulnerable to adversarial examples, Trojans, and other attacks. These challenges may in turn make it hard to control AI systems and to prevent undesirable behaviors such as deception. When conducting research to advance AI safety, it is important to consider the risk of inadvertently accelerating AI capabilities, which would undermine the overall goal of better understanding and controlling AI systems.
