ABSTRACT

With the development of new technology, a range of hardware and software has emerged that allows the capture, processing, and analysis, in some cases in real time, of large amounts of information in different settings. The collected data can span 100–1000 variables, usually captured at hundreds or even thousands of data points per second. Data sets, or combinations of data sets, whose variability, complexity, volume, and speed of growth hinder their capture, processing, management, or analysis using conventional technologies and tools are called big data. Data mining is a way to identify relevant variables and extract the most representative data from a large amount of information. In sports, this volume of information poses a significant challenge, because it must be analyzed in increasingly short periods so that the results are available for the team staff’s day-to-day decision-making process. As a solution, statistical data mining techniques used in other areas, such as principal component analysis (PCA), have recently been proposed in sport science. PCA is probably the most common multivariate statistical technique across scientific disciplines. However, it is relatively common for PCA results to show some type of bias due to misinterpretation, difficulties in the data management process, or subjective choices in the treatment of the data and in the PCA itself. This work aims to set out some methodological considerations for optimally reporting the statistical procedures and results sections in sport science and medicine when PCA is used as a data mining technique.