Chapter 1
22 Pages

Introduction: research methods for large databases


The main challenges in utilising the advantages offered by these new developments are technical ones. For numerical analysis the historical data, especially the qualitative data, must be codified in an appropriate form. Historical data are prone to missing observations and data input errors, requiring considerable care to avoid mistakes in estimation. It is important to specify the right model and to interpret statistical results correctly, keeping in mind that the results may only be valid under certain explicit or implicit assumptions. The analysis should aim at finding all the patterns concealed within the data, ensuring that what is unexplained is, in some sense, truly random.

Instead of relying on generalisations from a range of specific studies, often carried out using different methodologies, it is possible to construct a single coherent account based on evidence of a consistent standard. The book reports the results of new research of this type.

The book is intended for use by doctoral and post-doctoral researchers in business history, economic history and social history. The case studies will also appeal to historical geographers and applied econometricians, and the techniques explained in the book are potentially useful to government policy-makers too.

This book demonstrates how to create ‘big data’, and, above all, how to exploit it to the full. Many historians only ‘scratch the surface’ of the data they collect. There are often significant patterns hidden in their data that they fail to discover. This book shows how to unlock hidden patterns, and hence get more information out of the data. Unlike conventional statistical texts, this book demonstrates how to put principles into practice with the aid of practical historical studies.
This agenda leads, in some cases, to a reappraisal of conventional wisdom on such important issues as the development of the land market, the pricing of commodities, monetary instability, the economic impact of railways, the diffusion of steam technology and the role of women in the economy.

Many readers will be familiar with general statistical texts such as Wooldridge (2006). They will also be aware of ‘cliometrics’ literature, as summarised recently in Greasley and Oxley (2011). The case studies in this book build upon previous research in cliometrics. However, despite some similarities, the new research differs from earlier research in important respects. Most cliometrics relies on single equation models and makes limited use of simultaneous equation models, stochastic trends and other concepts featured in the book.

There is now more emphasis on qualitative evidence. Early cliometric research emphasised quantification (e.g. using heights as indicators of health and welfare) whereas much of the evidence in modern databases is qualitative. Recent research has tended to combine quantitative and qualitative evidence by using binary variables; this is particularly useful for testing institutional theories of economic change. The book emphasises the importance of testing alternative theories rather than fitting models based on one specific theory. Cliometric research in the 1970s and 1980s tended to react against the Marxist turn in economic history during the 1960s by emphasising the ubiquitous and providential role of market forces.
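
As a rough illustration of the codification step described above, the sketch below shows one way a qualitative attribute might be turned into a binary variable while missing observations are kept explicit and then excluded before estimation. The records, the field names (`price`, `tenure`) and the helper functions are hypothetical and invented for this example; they are not taken from the book's case studies. With a single binary regressor, the difference in group means computed here equals the slope that an ordinary least squares regression with an intercept would give.

```python
from statistics import mean

# Hypothetical land-sale records, invented for illustration only.
# None marks a missing observation.
records = [
    {"price": 12.0, "tenure": "freehold"},
    {"price": 9.0, "tenure": "copyhold"},
    {"price": 14.0, "tenure": "freehold"},
    {"price": None, "tenure": "copyhold"},  # missing price
    {"price": 10.0, "tenure": None},        # missing tenure
    {"price": 7.0, "tenure": "copyhold"},
]


def encode_binary(value, positive):
    """Return 1 if value matches `positive`, 0 for any other category,
    and None when the observation is missing."""
    if value is None:
        return None
    return 1 if value == positive else 0


def complete_cases(rows):
    """Keep only rows in which no field is missing."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Codify the qualitative 'tenure' field as a binary 'freehold' variable.
coded = [
    {"price": r["price"], "freehold": encode_binary(r["tenure"], "freehold")}
    for r in records
]

# Drop incomplete observations rather than silently treating them as 0.
usable = complete_cases(coded)

freehold_mean = mean(r["price"] for r in usable if r["freehold"] == 1)
other_mean = mean(r["price"] for r in usable if r["freehold"] == 0)

# Difference in means: the estimated effect of the binary variable.
dummy_effect = freehold_mean - other_mean

print(len(usable), freehold_mean, other_mean, dummy_effect)
```

Dropping incomplete rows is only one possible treatment of missing data and is used here for simplicity; whether it is appropriate depends on why the observations are missing.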