ABSTRACT

This chapter examines how Python, the Natural Language Toolkit (NLTK), scikit-learn, and NumPy can be used to classify text. Using a corpus of movie reviews already labeled as positive or negative, it develops several classification models and tests whether they can correctly identify unseen movie reviews as positive or negative.