ABSTRACT

Many students and inexperienced analysts assume that success as a data analyst is chiefly a matter of mastering sophisticated statistical and machine learning techniques. In practice it requires much more: a good feel for what the data represent, together with a clear sense of what the analysis is intended to achieve and how it should be carried out. Analysts must also be able to present data and results in such a way that others, who may not be numerically literate, can understand the key facts and findings. In this chapter, these and other important operational issues are discussed in detail, with example code showing how R can be applied to common problems encountered in soccer analytics. In particular, the chapter shows how random forests can be used to identify the relative importance of predictor variables. The concepts of p-values, effect size, and causality are also examined.