ABSTRACT

It is now generally accepted that French and Spanish are both descended from Latin. But is it possible to specify in any objective manner which of the two is closer to Latin, and by how much? In this chapter,¹ we look at the general problem of deriving (dis)similarity measures between natural languages. Our results are not intended as conclusive assertions about such issues; they are meant only to serve as an impetus for further study of subgrouping techniques. The method is first to form an inductive hypothesis that explains how words in a child language are derived from those of its parent. This hypothesis is then encoded, together with its exceptions in the observed data, in our case using Probabilistic Finite State Automata (PFSA). We assume that more complex automata are needed to explain the phonological changes between less similar languages. A measure of the complexity of the PFSA would thus indicate the amount of phonological dissimilarity between the languages under consideration.