ABSTRACT

Deep learning models can be effective for text prediction problems because their multiple layers capture complex relationships in language. The layers in a deep learning model are connected in a network, so these models are called neural networks, although they do not work much like a human brain. Text data requires extensive processing before it is appropriate for modeling, whether via an algorithm like regularized regression or a neural network. Feature engineering can be a step in the machine learning process where subtle data leakage occurs, when practitioners use information that will not be available at prediction time. Computing inverse document frequency, for example, involves the whole corpus or collection of documents. The tidymodels framework is designed to encourage good statistical practice, such as learning feature engineering transformations from the training data and then applying those transformations to other data sets.
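The leakage point above can be illustrated with inverse document frequency: because idf depends on the whole corpus, it must be learned from the training data only and then applied unchanged to new documents. Below is a minimal sketch in plain Python (the book itself works in R with tidymodels); the function names `fit_idf` and `transform_tfidf` are illustrative, not from any library, and the idf formula used is the basic log(N / df) variant.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Learn inverse document frequency from a training corpus.

    idf(t) = log(N / df(t)), where N is the number of documents
    and df(t) is the number of documents containing term t.
    This is the step that sees the *whole* training corpus.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        # count each term at most once per document
        df.update(set(doc.lower().split()))
    return {term: math.log(n / count) for term, count in df.items()}

def transform_tfidf(doc, idf):
    """Weight one document's term counts by the *training* idf.

    Terms never seen in training are dropped rather than triggering
    a recomputation of idf, which would leak test-set information.
    """
    tf = Counter(doc.lower().split())
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

train = ["the cat sat", "the dog ran", "a cat ran"]
idf = fit_idf(train)                        # learned from training data only
test_vec = transform_tfidf("the cat barked", idf)
```

Here "barked" never appears in the training corpus, so it receives no weight in `test_vec`; recomputing idf over training and test documents together would instead let test data influence the features, the subtle leakage the abstract warns about.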