ABSTRACT

Social choice is the study of aggregating information from individuals to form a group decision. Value alignment is the task of aligning the behaviors of artificial intelligence (AI) with the values of humans and possibly other sentient entities. Given that there is no universal agreement on ethical values even among humans, social choice is a necessary tool for addressing the value alignment problem. This chapter reviews important results in social choice, including Condorcet's paradox, Arrow's theorem, May's theorem, Condorcet's jury theorem, the Rae–Taylor theorem, and Sen's paradox. Additionally, this chapter uses approval voting and the four-stage sequence of John Rawls and Kenneth Arrow to provide a framework for using social choice for value alignment. Furthermore, this chapter examines the work of Nicolas de Condorcet, the eighteenth-century French mathematician and philosopher who was arguably the first to mathematically model an intelligence explosion. Examination of his mathematical and political philosophy writings provides a starting point for a discussion of what background conditions should exist when voters vote to align the behavior of an AI. Specifically, his work suggested that there should be institutions that maximize the number of voters who are honest, altruistic, knowledgeable, and independent-thinking.