ABSTRACT

Social work case files – assessments, reports, and chronologies – provide rich detail to researchers examining the lives of vulnerable populations and responses to their needs. Digitalisation has expanded research opportunities beyond the traditional case reading exercise to include the automated analysis of large volumes of unstructured text using ‘big data’ technology, including natural language processing (NLP). These novel methods can yield real-time insights regarding whole populations, but they also sharpen the focus on privacy concerns and the safe sharing of sensitive information. In the context of ever-expanding digital stores, data subjects cannot realistically agree to every use of their personal information, yet fully deserve independent scrutiny of decisions which radically affect their lives. This chapter draws on a study that applied NLP to five thousand social work statements lodged in care proceedings in England. Applying lessons from other fields, it outlines the legal, ethical, and technical challenges surrounding data privacy and management. Key concepts are summarised for social work researchers and data controllers, including anonymisation, de-identification, and re-identification risk. Common misconceptions are dispelled to explain how a risk-based, proportionate approach to the management and sharing of data can be adopted within a social work context.