ABSTRACT

Machine learning methods have been successful largely because they are applied to practical tasks, such as self-driving cars, where algorithms receive true feedback. These methods are best suited to tasks where it is unambiguous which input data are relevant for predicting the output. In such settings, the main concern is overfitting, and one approach to combating it is sample splitting. The chapter argues that economists could adopt sample-splitting approaches as well: an in-depth exploratory search of a training sample can be followed by testing on the held-out sample, which still yields valid p-values. The loss of power due to the smaller sample size is much smaller than what adjustments for multiple comparisons would require.
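The split-sample logic sketched in the abstract can be illustrated with a minimal, hypothetical example (not from the chapter): many candidate predictors are searched freely on an exploration half, and only the selected one is tested on the held-out half, so the resulting p-value requires no multiple-comparison adjustment.

```python
# Illustrative sketch of split-sample inference (hypothetical data,
# not the chapter's application).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 200, 50
X = rng.normal(size=(n, k))   # candidate predictors (pure noise here)
y = rng.normal(size=n)        # outcome, unrelated to X by construction

# 1) Exploration half: search freely, pick the predictor most
#    correlated with the outcome.
half = n // 2
corrs = [abs(np.corrcoef(X[:half, j], y[:half])[0, 1]) for j in range(k)]
best = int(np.argmax(corrs))

# 2) Test half: a single test of the selected predictor on fresh data.
#    Because only one hypothesis is tested on the held-out sample,
#    the p-value is valid without any correction for the k searches.
r, p = stats.pearsonr(X[half:, best], y[half:])
print(f"selected predictor {best}, holdout p-value = {p:.3f}")
```

A Bonferroni correction over all k candidate tests would divide the significance level by 50; the split-sample route instead pays only the power cost of halving the sample.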