ABSTRACT

To create AI systems that either make autonomous decisions or support human decision makers, we must ensure that the systems themselves are aware of the ethical principles involved in those decisions. Humans make decisions based on their preferences, and the CP-net formalism is a convenient and expressive way to model subjective preferences over decisions with multiple features. However, in many situations, the subjective preferences of the decision makers may need to be checked against exogenous priorities such as those imposed by ethical principles, feasibility constraints, or safety regulations. It is essential to have principled ways to evaluate whether preferences are compatible with these exogenous priorities. This chapter proposes to model both subjective preferences and exogenous priorities via CP-nets. This allows us to define a value alignment procedure that uses a notion of distance between the orderings induced by two CP-nets to check whether the preferences are close enough to the ethical principles, thereby aiding decision makers. An experimental evaluation shows that the quality of the decision with respect to the subjective preferences does not significantly degrade when conforming to the ethical principles.
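The specific CP-net distance used in the chapter is not defined in this abstract. As a purely illustrative sketch of the general idea, the following assumes each CP-net induces a total order over outcomes and uses the Kendall tau distance (number of pairwise disagreements) between the subjective ordering and the ethical ordering, with a hypothetical threshold deciding alignment; the function names and the threshold parameter are assumptions, not the chapter's actual procedure.

```python
from itertools import combinations

def kendall_tau_distance(order_a, order_b):
    """Count pairwise disagreements between two total orders
    over the same set of outcomes (listed most- to least-preferred)."""
    pos_a = {o: i for i, o in enumerate(order_a)}
    pos_b = {o: i for i, o in enumerate(order_b)}
    disagreements = 0
    for x, y in combinations(order_a, 2):
        # A pair counts as a disagreement if the two orders rank it oppositely.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            disagreements += 1
    return disagreements

def value_aligned(preference_order, ethical_order, threshold):
    """Hypothetical alignment check: the preferences pass if their
    distance to the ethical ordering is at most the threshold."""
    return kendall_tau_distance(preference_order, ethical_order) <= threshold
```

For example, the orderings `["a", "b", "c", "d"]` and `["a", "c", "b", "d"]` disagree only on the pair (b, c), giving distance 1, so they would be aligned under a threshold of 1. In practice, CP-nets induce partial orders and comparing them is computationally harder, which is part of what motivates a principled distance notion.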