ABSTRACT

The aim of this chapter is not simply to ease fears about artificial general intelligence (AGI) but to offer a research program that can actually guarantee that AGIs pose no existential threat. This will be a one-sided bound, staving off some of the worst consequences of Bostrom's (2012) orthogonality thesis that a high level of intelligence can be in the service of any goal, good or bad alike, and it will say nothing about the good, possibly spectacularly good, impacts that AGI may have on the future of humanity. In Section 9.2, we argue that no physical interlock or other safety mechanism can be devised to restrain AGIs, and that the guarantees we seek are necessarily of a mathematical (deductive, as opposed to algorithmic) nature. This requires some shift in focus, because in the current literature it is not a logical deductive system that is viewed as the primary descriptor of AGI behavior but rather a utility function whose maximization is the goal of the AGI. Yet, as we shall argue, deductive bounds are still possible: for example, consider an AGI whose goal is to square the circle with ruler and compass. We know in advance that, no matter what (static or dynamically changing) weights its utility function has, and no matter what algorithmic tricks it has up its sleeve, including self-modifying code and reliance on probabilistic, quantum, or other hypercomputing methods (Ord 2002), it simply cannot reach this goal.
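The classical reason this particular goal is unattainable, stated here only to make the example concrete, is that ruler-and-compass constructibility is an algebraic property, whereas $\pi$ is transcendental (Lindemann 1882):
\[
\alpha \ \text{constructible} \;\Rightarrow\; [\mathbb{Q}(\alpha):\mathbb{Q}] = 2^{k} \ \text{for some } k \geq 0,
\qquad
\pi \ \text{transcendental} \;\Rightarrow\; \sqrt{\pi} \ \text{not constructible}.
\]
Squaring the unit circle would require constructing a segment of length $\sqrt{\pi}$, so no sequence of ruler-and-compass steps, however intelligently chosen, can ever terminate in success.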