ABSTRACT

In many modern biological research environments, science is driven by data-intensive applications, and the rate of data generation is no longer limiting. Rather, the processes of transforming those high volumes of data into applied information, knowledge, and wisdom have become analytical bottlenecks.1 This transformative process relies on computational systems and the domain knowledge of trained scientists. However, the size of these datasets often scales beyond the computational resources easily available to many research groups, and the requisite knowledge to utilize the most effective hardware and software is not always readily available.2 Thus, while many research groups can easily generate large datasets, analyzing them is often difficult and time-consuming. A parallel problem concerns researchers' ability to quickly and easily integrate publicly available data with their restricted prepublication data, and to make their own data publicly available after publication.3 Fortunately, substantial advances have been made in providing cyberinfrastructure4 (CI; computing resources and the knowledge to use them) for life science researchers.