ABSTRACT

This chapter discusses computing techniques relating to the reverse way of thinking: Speculation → Data. The purpose of data science is usually to turn data into usable information; in the reverse direction, one starts from speculation about how a system works and simulates the data it would produce. Many of the key principles needed to develop the capacity to simulate come straight from computer science, including design, modularity, and reproducibility. There are many different kinds of cancer, often named for the tissue in which they originate: lung cancer, ovarian cancer, prostate cancer, and so on. A data scientist on a team of biomedical researchers might take on the job of compiling data from many microarray assays to determine whether different types of cancer are related on the basis of their gene expression. To illustrate, consider a rather simple data-reduction technique for the NCI60 microarray data. NCI60 has been rearranged into narrow format and summarized as Spreads, a table with columns Probe and spread that has one row for each of the 32,344 probes.
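The data-reduction step can be sketched in Python. This is a minimal sketch, not the chapter's own code. It assumes that "spread" means the sample standard deviation of a probe's expression values across cell lines. The table `wide`, its probe and cell-line names, and the helper functions `to_narrow`, `spread_by_probe`, and `top_probes` are all hypothetical, and the values are invented rather than taken from NCI60.

```python
from collections import defaultdict
from statistics import stdev

# Hypothetical miniature of a wide microarray table: one row per probe,
# one column per cell line. Names and values are made up for illustration;
# they are NOT taken from NCI60.
wide = {
    "probe_A": {"line_1": 0.10, "line_2": 0.30, "line_3": 0.20},
    "probe_B": {"line_1": -1.50, "line_2": 2.00, "line_3": 0.50},
    "probe_C": {"line_1": 0.00, "line_2": 0.00, "line_3": 0.10},
}


def to_narrow(wide_table):
    """Rearrange a wide table into narrow rows of (probe, cell_line, expression)."""
    return [
        (probe, line, value)
        for probe, row in wide_table.items()
        for line, value in row.items()
    ]


def spread_by_probe(narrow_rows):
    """Summarize each probe by the spread of its expression values.

    Assumption: spread is the sample standard deviation.
    """
    groups = defaultdict(list)
    for probe, _line, value in narrow_rows:
        groups[probe].append(value)
    return {probe: stdev(values) for probe, values in groups.items()}


def top_probes(spreads, n):
    """Keep the n probes with the largest spread (the data-reduction step)."""
    return sorted(spreads, key=spreads.get, reverse=True)[:n]


narrow = to_narrow(wide)          # 3 probes x 3 cell lines -> 9 narrow rows
spreads = spread_by_probe(narrow)  # one spread value per probe
print(top_probes(spreads, 2))      # prints ['probe_B', 'probe_A']
```

With the real data, the same steps would turn the wide NCI60 table into one spread value per probe, after which keeping only the probes with the largest spreads would shrink the 32,344 probes to a smaller subset for comparing cancer types.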