ABSTRACT

In the Internet era, new information appears every day, and researchers keep adding new work to their domains. It is therefore critical to estimate the original effort of authors by analyzing knowledge sources and rating unique content. This motivates the plagiarism detection system discussed here, named Plagiasil. The research treats the knowledge base as a combination of local datasets, Internet resources, online and offline books, and research published by various publishers. Because the system handles a huge amount of data, it needs an efficient architecture to carry out these tasks successfully. The challenge lies in accessing data, mapping content, and managing storage as datasets grow, while keeping the number of I/O operations low. While managing system resources and handling huge datasets, the most important goal is to ensure accuracy, with high precision and recall. The architecture focuses on designing an algorithm that adapts to a dynamic dataset environment, in which the dataset grows as new research resources are added. One possible form of plagiarism is that individual sentences within a document are copied from various literature sources. Our methodology therefore focuses on decomposing the entire document into sentences.
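
To make the sentence-level idea concrete, the following is a minimal sketch and not the Plagiasil implementation. It assumes a naive punctuation-based sentence splitter, exact matching after simple normalization, and a small in-memory knowledge base; the function names (split_sentences, normalize, find_copied_sentences) and the sample sources are hypothetical and chosen only for illustration.

```python
import re


def split_sentences(text):
    """Naively split text into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(sentence):
    """Lowercase and drop punctuation so trivial edits do not hide a match."""
    return " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))


def find_copied_sentences(document, sources):
    """Return (sentence, source_name) pairs for sentences found in a source.

    `sources` maps a hypothetical source name to its full text. Matching is
    exact on the normalized sentence, which is an assumption of this sketch.
    """
    index = {}
    for name, text in sources.items():
        for s in split_sentences(text):
            # Keep the first source in which a normalized sentence appears.
            index.setdefault(normalize(s), name)

    matches = []
    for s in split_sentences(document):
        key = normalize(s)
        if key and key in index:
            matches.append((s, index[key]))
    return matches


# Hypothetical knowledge base and suspect document.
sources = {
    "paper_a": "Plagiarism is a growing concern. Detection tools help.",
    "book_b": "Large datasets need efficient storage!",
}
document = (
    "We study a new system. Plagiarism is a growing concern. "
    "large datasets need efficient storage."
)
result = find_copied_sentences(document, sources)
```

Here the second and third sentences of the suspect document are attributed to paper_a and book_b respectively, while the first has no match. Indexing normalized sentences in a dictionary keeps each lookup cheap as the knowledge base grows; a production system would likely need approximate matching and persistent storage, which this sketch leaves out.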