ABSTRACT

Input variations are often irrelevant to the desired target output of a recognition system; for example, face images of the same person that differ in pose or facial expression should be ascribed the same identity. Deep neural networks are often hailed for their ability to learn representations that are invariant across a wide range of input variations. This invariance is often assumed from overall performance, but research has demonstrated the contrary: deep neural networks are sensitive to out-of-plane rotations. At the same time, some approaches implicitly rely on noninvariant properties, for instance, using a truncated network trained on facial identities to obtain a representation for classifying facial attributes. In this chapter, we study the noninvariant properties of large face-recognition networks in detail, demonstrating that networks trained on face identities classify not only identity-related attributes quite well, but also attributes that are only weakly correlated or uncorrelated with face identity. Attributes related to facial expression or accessories, for example, are those that we would expect a truly invariant identity-trained network to attenuate or ignore. Nevertheless, noticeable information about facial expression and accessories remains in the penultimate layer of the network, as does noticeable information about pitch, roll, and yaw. Noninvariant properties of a feature space need not preclude good recognition performance in an end-to-end network, provided that classes effectively separate/saturate at the final decision output. Using a noninvariant feature space derived from a truncated network can also enable generalization to novel tasks (e.g., attribute prediction or face verification from a face-identity-trained network). However, there are situations in which an invariant representation is desirable, for example, when enrolling new identities in a recognition system. If the feature space is noninvariant, the fact that identities in the training set separate/saturate well provides no guarantee that newly enrolled samples or identities will do the same. To investigate this, and on a more theoretical note, we induce variations on the MNIST dataset and augment the LeNet architecture by adding several invariance constraints on the feature space in the objective function. Our analysis suggests that although these steps do lead to reduced variance in some characteristics, a truly invariant feature space may be difficult to achieve.
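
As a rough illustration only (not the chapter's exact formulation), the sketch below shows one way an invariance constraint on the feature space could be added to a LeNet-style objective on MNIST: the penultimate-layer features of an image and of a varied copy of it are pulled together by an auxiliary penalty. The rotation-based variation, the mean-squared-error penalty, and the weight lam are illustrative assumptions.

# Hypothetical sketch: LeNet-style classifier on MNIST with an added
# feature-space invariance penalty. The specific variation (random rotation),
# the MSE penalty, and the weight `lam` are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms

class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5, padding=2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)   # penultimate ("feature") layer
        self.fc3 = nn.Linear(84, 10)

    def features(self, x):
        # Penultimate-layer representation on which invariance is encouraged.
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        return F.relu(self.fc2(x))

    def forward(self, x):
        return self.fc3(self.features(x))

def invariance_penalty(model, x, x_varied):
    # Distance in feature space between an image and its varied copy.
    return F.mse_loss(model.features(x), model.features(x_varied))

def train_step(model, optimizer, x, y, lam=0.1):
    # Induce a nuisance variation (here: a small random rotation).
    x_varied = transforms.RandomRotation(30)(x)
    loss = F.cross_entropy(model(x), y) + lam * invariance_penalty(model, x, x_varied)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch (MNIST data loading omitted):
# model = LeNet()
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = train_step(model, opt, images, labels)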