ABSTRACT

This chapter walks through a series of data steps: determine the data readers need, locate the data, request the data and clean up the data. Data can show things that weren't immediately apparent: connections between gender and income, for instance, or between suicides and war veterans. Agencies tend to give readers information in PDFs, a format from which it's hard to extract data. But documents aren't created as PDFs, so the underlying data likely exists in some other format. Raw datasets often seem incomprehensible because they are designed to be read by computers, and by people who work with numbers for a living, rather than by general readers. Excel assumes by default that data is separated by commas, so when a reporter's data is separated by quotation marks instead, the reporter needs to tell Excel so. Datasets also often come without header rows, because they are designed to have information added to them periodically and header rows could interfere with that.
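The points about separators and missing header rows can be sketched in Python. This is a minimal illustration under stated assumptions, not the chapter's method. The sample records, the pipe (`|`) separator and the column names are all hypothetical, and the sketch uses Python's standard `csv` module rather than Excel. It tells the parser which character separates fields, treats quotation marks as wrapping text values (so the comma inside a name does not split it), and supplies column names because the file has none.

```python
import csv
import io

# Hypothetical raw extract: no header row, fields separated by "|",
# text values wrapped in quotation marks.
RAW = '"SMITH, JOHN"|"M"|52000\n"DOE, JANE"|"F"|54000\n'

# Hypothetical column names; in practice they would come from the
# agency's documentation for the dataset, since the file has no header row.
COLUMNS = ["name", "gender", "income"]


def load_rows(text, delimiter="|", columns=COLUMNS):
    """Parse header-less delimited text into a list of dicts.

    Passing fieldnames makes DictReader treat the first line as data,
    not as a header row.
    """
    reader = csv.DictReader(
        io.StringIO(text),
        fieldnames=columns,
        delimiter=delimiter,  # tell the parser the separator isn't a comma
        quotechar='"',        # quoted values may contain commas
    )
    return list(reader)


for row in load_rows(RAW):
    print(row)
```

Because the column names are passed in explicitly, the first record is kept as data instead of being mistaken for a header, which is the same adjustment a reporter makes when importing a header-less file into a spreadsheet.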