A Study on the Document Similarity Judgment using Similar Block Expansion

doi:10.1201/9780429070655-56

ABSTRACT

Judging plagiarism (piracy) in students’ homework or other documents produced with computers and the Internet is difficult and laborious. In particular, when students write on the same theme, it is not easy to determine whether a text has been copied. This problem differs from the one addressed by existing information retrieval methods, which extract keywords (that is, the frequencies of index words) from the target document and then search for the most appropriate clustering. We therefore use space-delimited strings rather than words as the index, apply a location vector together with appearance frequency, and then expand matching strings into blocks in order to judge the similarity of those blocks. We call this the ‘similar string block expansion’ method. In this article, we study how this method can evaluate plagiarism in a document by calculating, in a short time, reference data according to the degree of similarity.
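The abstract does not give the algorithm's details, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes that the index is built from whitespace-separated strings, that the "location vector" is the list of positions at which each string occurs (its length giving the appearance frequency), and that a block is grown to the right from every pair of matching positions. The function names, the `min_len` threshold, and the coverage-based score are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def index_positions(tokens):
    """Map each space-delimited string to the positions where it occurs.

    The list of positions plays the role of a location vector; its
    length is the string's appearance frequency. (Assumed interpretation.)
    """
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    return positions


def similar_blocks(doc_a, doc_b, min_len=3):
    """Return (start_a, start_b, length) for each shared block of strings.

    min_len is a hypothetical threshold: shorter matches are ignored.
    """
    a, b = doc_a.split(), doc_b.split()
    pos_b = index_positions(b)
    covered = set()  # position pairs already absorbed into an expansion
    blocks = []
    for i, tok in enumerate(a):
        for j in pos_b.get(tok, []):
            if (i, j) in covered:
                continue
            # Expand the match to the right while the strings agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                covered.add((i + length, j + length))
                length += 1
            if length >= min_len:
                blocks.append((i, j, length))
    return blocks


def similarity(doc_a, doc_b, min_len=3):
    """Fraction of doc_a's strings that lie inside some shared block."""
    a = doc_a.split()
    if not a:
        return 0.0
    matched = set()
    for start_a, _, length in similar_blocks(doc_a, doc_b, min_len):
        matched.update(range(start_a, start_a + length))
    return len(matched) / len(a)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "a quick brown fox sat near the lazy dog today"
print(similar_blocks(doc_a, doc_b))
print(similarity(doc_a, doc_b))
```

In this example the two shared blocks are "quick brown fox" and "the lazy dog", so six of the nine strings in the first document are covered. Because only positions of strings that actually occur in both documents are visited, the comparison avoids a full pairwise scan, which is in the spirit of the short running time the abstract aims for.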