ABSTRACT

Most current image annotation and retrieval algorithms follow a similar methodology (as shown in Figure 14.1). First, images are divided into segments (visual tokens or objects) using an image segmentation algorithm or simply a fixed image grid. Second, visual features are extracted and a mathematical representation, such as a feature vector, is generated for each segment. Third, the segments are grouped by a clustering algorithm to construct blob tokens. Finally, the correlation between words and blob tokens is analyzed to discover hidden semantics. The system is trained with a large number of annotated images, and the learned correlation is then used to predict words for unannotated images. This chapter elaborates on the image annotation tool we have developed.
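
The four steps above can be illustrated with a minimal sketch. It is not the tool described in this chapter: the function names (grid_segments, segment_features, kmeans, assign_blobs, train, annotate) are hypothetical, the feature vector (per-channel mean and standard deviation) is an assumed stand-in for real visual features, clustering is a plain k-means, and the word-blob correlation is approximated by simple co-occurrence counts.

```python
import numpy as np

def grid_segments(image, rows, cols):
    """Step 1: split an H x W x C image into rows*cols grid cells."""
    h, w = image.shape[:2]
    hs, ws = h // rows, w // cols
    return [image[r * hs:(r + 1) * hs, c * ws:(c + 1) * ws]
            for r in range(rows) for c in range(cols)]

def segment_features(segment):
    """Step 2: assumed toy feature vector (per-channel mean and std)."""
    pixels = segment.reshape(-1, segment.shape[-1]).astype(float)
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

def assign_blobs(X, centers):
    """Map each feature vector to the index of its nearest blob center."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, iters=20, seed=0):
    """Step 3: plain k-means; the centers play the role of blob tokens."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = assign_blobs(X, centers)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def image_features(image, rows, cols):
    return np.array([segment_features(s)
                     for s in grid_segments(image, rows, cols)])

def train(images, captions, rows=2, cols=2, k=2):
    """Step 4: estimate P(word | blob) from word-blob co-occurrence counts."""
    feats = [image_features(img, rows, cols) for img in images]
    centers = kmeans(np.vstack(feats), k)
    vocab = sorted({w for words in captions for w in words})
    counts = np.zeros((len(vocab), k))
    for f, words in zip(feats, captions):
        for b in assign_blobs(f, centers):
            for w in words:
                counts[vocab.index(w), b] += 1
    # Small constant avoids division by zero for blobs with no segments.
    probs = counts / (counts.sum(axis=0, keepdims=True) + 1e-12)
    return centers, vocab, probs

def annotate(image, model, rows=2, cols=2, top_n=1):
    """Predict words for an unannotated image from its blob tokens."""
    centers, vocab, probs = model
    blobs = assign_blobs(image_features(image, rows, cols), centers)
    scores = probs[:, blobs].sum(axis=1)  # sum P(word | blob) over segments
    return [vocab[i] for i in np.argsort(-scores)[:top_n]]
```

Calling train on a list of image arrays and a parallel list of word lists returns a model tuple, and annotate(image, model) returns the highest-scoring words for a new image. A real system would replace the toy features with color, texture, and shape descriptors and the co-occurrence counts with a proper statistical model.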