ABSTRACT

Researchers in criminology and criminal justice increasingly use machine learning to investigate questions involving large amounts of digital data. Here we use survey data on over 220,000 respondents drawn from three waves of the National Crime Victimization Survey Identity Theft Supplement (NCVS-ITS), conducted by the Bureau of Justice Statistics (BJS) in 2012, 2014, and 2016. We analyze these data with three distinct machine learning algorithms: (1) logistic regression, (2) decision tree, and (3) random forest. We assess the efficacy of these approaches against the following evaluative criteria: the overall percentage of correct classifications, the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), and feature criticality. Our findings indicate that the logistic regression algorithm performs best in predicting overall identity theft victimization, misuse of credit cards, misuse of other types of financial accounts, and the opening of new accounts, whereas the random forest algorithm performs best in predicting misuse of checking/savings accounts. Our findings suggest that the respondent’s age, educational level, and online shopping frequency are significantly related to identity theft victimization. Additionally, frequently checking credit reports and changing passwords of financial accounts are strong predictors of identity theft victimization. We draw out the implications of our work for our collective understanding of identity theft and for judging the potential utility of machine learning approaches in criminology and criminal justice.