ABSTRACT

In recent years, big data analytics, which utilizes all relevant available data to uncover possible hidden clinical benefits of treatments under investigation, has received much attention in biomedical research. Big data analytics is usually performed through meta-analysis, combining a number of (independent) studies. The validity of a meta-analysis in big data analytics, however, has been challenged because (1) there may be heterogeneity across data sets (studies) from various sources, whether structured or unstructured, and (2) there may be selection bias driven by accepting more positive studies into the big data center (Chow and Kong, 2015). A particular concern is that only studies with positive results are included in the meta-analysis. In practice, however, the percentage of positive studies (from the big data center) in the meta-analysis is usually unknown. As a result, the findings from the meta-analysis may be biased and hence misleading. In this article, following the concept of reproducibility probability of Shao and Chow (2002), a statistical method for assessing the treatment effect is proposed that takes the unknown percentage of positive studies into consideration, in order to support the validity of meta-analysis in big data analytics.