ABSTRACT

A long short-term memory (LSTM) network is a specific kind of recurrent neural network architecture that is capable of learning long-range dependencies and broader context. LSTMs are often an excellent choice for building supervised models for text because of this ability to model sequences and structures within text, such as word dependencies. Text must be heavily preprocessed for LSTMs in much the same way it is for dense neural networks, with tokenization and one-hot encoding of sequences. A major characteristic of LSTMs, as with other deep learning architectures, is their tendency to memorize the features of the training data; we can use strategies such as dropout, along with ensuring that the batch size is large enough, to reduce overfitting.
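
The following minimal sketch illustrates the kind of workflow summarized above, using Keras in Python; the toy corpus, labels, layer sizes, and training settings are hypothetical placeholders rather than the specific setup described in this chapter.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical toy corpus and binary labels, stand-ins for a real dataset.
texts = np.array(["an example document", "another short text"])
labels = np.array([0, 1])

max_tokens, max_length = 20_000, 100

# Tokenize and encode each document as a fixed-length integer sequence.
vectorizer = layers.TextVectorization(
    max_tokens=max_tokens,
    output_mode="int",
    output_sequence_length=max_length,
)
vectorizer.adapt(texts)
x = vectorizer(texts)

# LSTM classifier with dropout on both the inputs and the recurrent
# connections to counter the network's tendency to memorize training data.
model = keras.Sequential([
    layers.Embedding(input_dim=max_tokens, output_dim=32),
    layers.LSTM(32, dropout=0.4, recurrent_dropout=0.4),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# A relatively large batch size, in line with the strategy mentioned above;
# the value here is illustrative only.
model.fit(x, labels, batch_size=512, epochs=5)
```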