ABSTRACT

The era of big data is marked by the enormous amount, speed, and breadth of data made available by rapidly advancing technologies in computation, detection, storage, and analysis. In healthcare, big data include medical records, prescription information, medical images, laboratory results, genomic profiles, demographics, and more. The opportunities to improve individual and population health are immense. In oncology, the National Cancer Institute (NCI) launched pioneering programs in the 2000s to harness big data in support of laboratory and clinical research. Despite their laudable goals, these efforts made little headway in reaping the benefits of big data. The fundamental challenge lies in the large amount of unstructured clinical information that is not readily amenable to electronic analysis. For medical oncology, efforts are now focused on extracting and reformatting such clinical data so that they can be queried for analysis from central locations. Progress is being made with data from hundreds of thousands, or even millions, of patients. Somewhat different approaches are being developed for radiation oncology. The Oncospace program from Johns Hopkins University specifically captures structured data during patient encounters; like the EuroCat program from Maastro, it sends analyses to external databases and retrieves the results, avoiding the burden of data export. Both radiation oncology programs have produced outcome prediction models that are more effective than those derived by human practitioners, although the number of patients was fewer than 500 and thus relatively small. Efforts in oncologic clinical research using big data are clearly in their infancy. Nevertheless, they have provided important insights into the scope of the challenge. It seems only a matter of time before clinical research with big data has a major impact on the community.