ABSTRACT

People rarely talk about how difficult it is to do data science the right way. This chapter looks at three fundamental difficulties. The first is the “iceberg of details”: even questions that appear simple require the data scientist to make many decisions before producing an answer. The second is the “domino of mistakes”: data analysis is a complex, sequential process, and getting things wrong at any one step can render the whole analysis useless. The third is “no second chance”: even a critical mistake in data analysis can easily go unnoticed. In conclusion, you should be skeptical about the results of a data analysis and ask what has been done to verify them.