ABSTRACT

Content-based image retrieval (CBIR) and classification algorithms require features to be extracted from images. Global, low-level image features such as color, texture, and shape fail to describe pattern variations within regions of an image. Bag of Visual Words approaches have emerged in recent years that extract features based on local pattern variations, and they typically outperform global feature methods in classification tasks. Recent studies have shown that word N-gram models, common in text classification, can be applied to images to achieve better classification performance than Bag of Visual Words methods, as they yield a more complete image representation; however, this increases dimensionality and computational cost. State-of-the-art deep learning models have been successful for image classification, but the large amount of training data these models require remains a major challenge. This book chapter reviews the literature on Bag of Visual Words and N-gram models for image classification and retrieval. It also discusses a few cases where N-gram models have outperformed, or performed comparably to, state-of-the-art deep learning models. The literature demonstrates that the N-gram is a powerful and promising descriptor for image representation and is useful for a variety of classification and retrieval applications.
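To make the contrast concrete, the following is a minimal toy sketch (not part of the chapter) of the two representations: an orderless Bag of Visual Words histogram versus visual word bigrams over spatially ordered descriptors. The fixed 1-D codebook and the descriptor values are invented for illustration; real pipelines extract local descriptors such as SIFT and learn the codebook with k-means.

```python
# Toy sketch: Bag of Visual Words vs. visual bigrams.
# The codebook and descriptors below are hypothetical; real systems
# quantize high-dimensional local descriptors (e.g. SIFT) against a
# codebook learned by k-means clustering.

from collections import Counter

CODEBOOK = [0.0, 1.0, 2.0, 3.0]  # 4 hypothetical 1-D visual words

def quantize(descriptor):
    """Map a descriptor to the index of its nearest codeword."""
    return min(range(len(CODEBOOK)), key=lambda i: abs(descriptor - CODEBOOK[i]))

def bovw_histogram(descriptors):
    """Orderless Bag of Visual Words histogram (length = codebook size)."""
    counts = Counter(quantize(d) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(CODEBOOK))]

def bigram_counts(descriptors):
    """Counts of visual word bigrams over spatially ordered descriptors.
    Captures local word co-occurrence that the BoVW histogram discards,
    at the cost of a feature space that grows to codebook_size ** 2."""
    words = [quantize(d) for d in descriptors]
    return dict(Counter(zip(words, words[1:])))

descs = [0.1, 0.9, 1.1, 2.8, 3.2]   # invented descriptor stream
print(bovw_histogram(descs))        # [1, 2, 0, 2]
print(bigram_counts(descs))         # {(0, 1): 1, (1, 1): 1, (1, 3): 1, (3, 3): 1}
```

The sketch shows the trade-off the abstract notes: the bigram representation retains the ordering of adjacent visual words but its dimensionality is quadratic in the codebook size, which is the added computational cost of N-gram image models.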