Data cycle step 4: Data processing and curation | 5

ABSTRACT

This chapter addresses the processing of data once they have been captured or created. In many cases, end-to-end pipelines already exist for the type of data processing readers need, and many of these workflows were originally developed by computer or data scientists and scientific programmers. The chapter assumes that suitable workflows exist. Questions to answer when choosing an existing workflow, engine, or service include how easily it accommodates custom developments. Developing a new workflow for data processing and curation should be avoided where possible. Moving data between locations raises both technical and security issues. A full directory of all tools, workflows, and data collections used in data stewards' experiments should be available to all members of the research team at all times. The chapter also addresses issues specific to the workflows that data stewards may run to (pre-)process and curate data.