Asymmetry | 18 | Clustering in Bioinformatics and Drug Discovery

ABSTRACT

Comparing objects that differ in number of nodes, string length, size, or shape can undermine the meaning or utility of a symmetric distance or similarity measure, such as the Euclidean distance or the Tanimoto similarity. A very simple example from the social sciences concerns clustering a larger group of individuals into a number of social cliques. Individuals rank their feelings or perceptions of one another, say, on a scale from 1 to 10. It is easy to see that a proximity matrix of such values is asymmetric, since the pairwise values between individuals may differ: Ida likes Joey (the (Ida, Joey) entry), but Joey just tolerates Ida (the (Joey, Ida) entry). Such data can then be clustered to find groups of like-minded individuals (social cliques). Another simple example involves shapes. Two objects of the same shape, say two triangles, might be quite “similar” to an observer, but a symmetric measure would rate them as quite dissimilar if the two triangles are quite different in size. Thus, an asymmetric measure is one in which the order of the comparison of two objects may result in the measure having different values. In Figure 8.1, there are two objects (in (a) and (b)) being compared. A symmetric similarity measure would find these objects to be quite “different” if their size difference is included in the similarity comparison. However, the fact that the triangle in Figure 8.1b can “fit into” the triangle in Figure 8.1a may be an interesting way to group these objects together. An asymmetric measure would have a high value when Figure 8.1b is compared with Figure 8.1a and a low value for the case of the object in Figure 8.1a “fitting” into the triangle depicted in Figure 8.1b.
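As a rough illustration of both examples, the following Python sketch builds a small asymmetric proximity matrix and compares two triangles with a symmetric and an asymmetric measure. Everything in it is an assumption made for illustration: the rating values, the representation of each triangle as a set of grid cells, and the choice of a simple containment fraction as the asymmetric measure (the function names `containment`, `tanimoto`, and `triangle_cells` are hypothetical and not taken from the text).

```python
# Hypothetical ratings on a 1-10 scale; the values are invented for illustration.
ratings = {
    ("Ida", "Joey"): 9,   # Ida likes Joey
    ("Joey", "Ida"): 4,   # Joey just tolerates Ida
}
# The matrix is asymmetric when an entry differs from its transposed entry.
is_asymmetric = ratings[("Ida", "Joey")] != ratings[("Joey", "Ida")]


def triangle_cells(side: int) -> set:
    """Grid cells (x, y) covered by a right triangle with legs of length `side`.

    Assumption: shapes are approximated by sets of unit grid cells.
    """
    return {(x, y) for x in range(side) for y in range(side - x)}


def tanimoto(a: set, b: set) -> float:
    """Symmetric Tanimoto similarity: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def containment(a: set, b: set) -> float:
    """Asymmetric measure: fraction of `a` that lies inside `b`.

    Assumption: this stands in for "how well a fits into b".
    """
    if not a:
        raise ValueError("a must be non-empty")
    return len(a & b) / len(a)


large = triangle_cells(10)  # plays the role of the larger triangle, Figure 8.1a
small = triangle_cells(4)   # plays the role of the smaller triangle, Figure 8.1b

symmetric_score = tanimoto(small, large)     # same value in either order
small_in_large = containment(small, large)   # high: small fits into large
large_in_small = containment(large, small)   # low: large does not fit into small

print(is_asymmetric, symmetric_score, small_in_large, large_in_small)
```

In this sketch the symmetric score is low in either order because the size difference dominates, whereas the containment fraction is high only when the smaller triangle is compared against the larger one.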