ABSTRACT

Biomedical research is increasingly data-intensive and computational, and “big data science” is migrating into the clinical arena. Teaching hospitals are making substantial investments in DNA sequencing capacity, and some now advertise their use of sequencing to help guide medical treatment. Institutional review boards (IRBs) and research ethicists need to understand and evaluate the analytical approaches used in big data research. Consider, for instance, imputation, which involves computationally inferring missing data points. Novel programming approaches may challenge the Food and Drug Administration, IRBs, and ethicists. Machine learning adds layers of complexity to analytical processes and to the ethical and regulatory issues they raise. In machine learning, the program, rather than the programmer, builds the predictive model. Computational scientists, ethicists, and regulators will need some agreement on how to validate these models and determine when they are robust enough to enter human trials.