ABSTRACT

This chapter considers recent experimental work on the application of deep neural networks to several natural language processing tasks that involve learning to identify hierarchical syntactic structure. It deals with subject-verb agreement, looking at the performance of both supervised long short-term memory (LSTM) models and unsupervised LSTM language models. The chapter then briefly reviews work on more refined test sets for agreement across a variety of syntactic constructions. Bidirectional Encoder Representations from Transformers (BERT) shows a significant improvement in accuracy over LSTM language models on these test sets, approaching or surpassing human performance on almost all syntactic categories. Because BERT predicts words on the basis of both their left and right contexts, it is not possible to compute the conditional probability that BERT assigns to a word in a sequence solely on the basis of the probabilities of the words that precede it.
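
To make the final point concrete, the sketch below (an illustration, not code from the chapter, assuming the Hugging Face transformers library, PyTorch, and the pretrained checkpoints gpt2 and bert-base-uncased) scores a verb form in two ways: a left-to-right language model assigns it a conditional probability given only the preceding words, whereas BERT scores it by masking it and predicting it from both its left and right context, so its score is not a left-to-right conditional probability.

```python
# Minimal sketch contrasting left-to-right and bidirectional word scoring.
# Assumes: pip install torch transformers, and the checkpoints named below.
import torch
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          AutoModelForMaskedLM)

sentence = "The keys to the cabinet are on the table."
target = "are"  # the verb form whose agreement we want to score

# 1) Left-to-right LM: P(are | "The keys to the cabinet") is a true
#    conditional probability over the preceding words only.
lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
prefix = "The keys to the cabinet"
ids = lm_tok(prefix, return_tensors="pt").input_ids
with torch.no_grad():
    next_logits = lm(ids).logits[0, -1]     # distribution over the next token
next_probs = torch.softmax(next_logits, dim=-1)
lm_target_id = lm_tok(" " + target).input_ids[0]
print("causal LM  P(are | left context)          =",
      next_probs[lm_target_id].item())

# 2) BERT: the score for "are" comes from masking it and predicting it from
#    both sides, so it cannot be read off the preceding words alone.
mlm_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
masked = sentence.replace(target, mlm_tok.mask_token, 1)
enc = mlm_tok(masked, return_tensors="pt")
mask_pos = (enc.input_ids[0] == mlm_tok.mask_token_id).nonzero().item()
with torch.no_grad():
    mask_logits = mlm(**enc).logits[0, mask_pos]
mask_probs = torch.softmax(mask_logits, dim=-1)
mlm_target_id = mlm_tok.convert_tokens_to_ids(target)
print("BERT       P(are | left and right context) =",
      mask_probs[mlm_target_id].item())
```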