ABSTRACT

The digital advancement of the world has brought rapid growth of network systems and high-speed Internet access. This growth benefits users but also exposes them to security threats. Data sent over a network can be intercepted by malicious users, and large volumes of data are leaked, compromising the authenticity, confidentiality, and integrity of messages. As a result, organizations have adopted cryptographic authentication for data in transit as well as data leakage prevention (DLP) techniques. Security mechanisms such as intrusion detection systems (IDSs), data leakage prevention systems (DLPSs), and firewalls are used to safeguard data in use. DLP systems use both the surrounding context and the content of data to monitor and protect its confidentiality. Confidentiality is endangered when data are misclassified and therefore exposed to disclosure to unauthorized parties, which is why many organizations deploy DLP systems. Content analysis in these systems relies on statistical analysis, fingerprinting, and regular expressions. This chapter surveys the main security issues, categorizes the cryptographic techniques, tools, and keys used on networks to ensure the confidentiality and consistency of data, and presents an improved method for preventing data leakage. Statistical analysis is employed to identify related confidential data, providing a safe mechanism in the domain of data leakage. Semantic-based and centroid-based techniques built on this statistical analysis are then used to categorize documents and thereby prevent leakage. Furthermore, the widely used information retrieval weighting scheme term frequency–inverse document frequency (TF–IDF) is applied to organize documents related to a specific topic.
The results show that the proposed centroid-based statistical DLP method categorizes documents more accurately when documents have been altered in extent or exchanged.
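The combination of TF–IDF weighting and centroid-based categorization summarized above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the chapter's implementation: the regular-expression tokenizer, the smoothed IDF formula, cosine similarity as the assignment rule, and the toy "confidential"/"public" corpus are all hypothetical choices made for illustration. Each category's centroid is the mean of its training documents' TF–IDF vectors, and a new document is assigned to the category whose centroid it is most similar to.

```python
import math
import re
from collections import Counter, defaultdict

# Minimal sketch (assumption): TF-IDF weighting + nearest-centroid
# categorization. Tokenizer, smoothed IDF, and toy corpus are
# illustrative choices, not the chapter's exact method.


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def fit_idf(docs):
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1, so no weight is zero."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """Unit-length TF-IDF vector; terms unseen during training are ignored."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return normalize({t: (c / total) * idf[t] for t, c in counts.items()})


def dot(a, b):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train_centroids(labelled_docs):
    """Average the TF-IDF vectors of each category, then re-normalize."""
    idf = fit_idf([doc for doc, _ in labelled_docs])
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for doc, label in labelled_docs:
        for t, w in tfidf(doc, idf).items():
            sums[label][t] += w
        sizes[label] += 1
    centroids = {
        label: normalize({t: w / sizes[label] for t, w in vec.items()})
        for label, vec in sums.items()
    }
    return idf, centroids


def classify(doc, idf, centroids):
    """Assign the category whose centroid is most cosine-similar to the doc."""
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda label: dot(vec, centroids[label]))


if __name__ == "__main__":
    # Hypothetical toy corpus; labels stand in for DLP sensitivity classes.
    training = [
        ("salary payroll employee bank account numbers", "confidential"),
        ("customer credit card numbers and passwords", "confidential"),
        ("company picnic schedule and menu", "public"),
        ("press release about new office opening", "public"),
    ]
    idf, centroids = train_centroids(training)
    print(classify("employee payroll and bank account details", idf, centroids))
```

Running the script prints `confidential`, because the query shares its highly weighted terms (payroll, employee, bank, account) with the confidential centroid. Normalizing both document vectors and centroids lets a plain dot product serve as cosine similarity. A document containing no known terms scores zero against every centroid and falls back to the first category, so a real DLP deployment would need an explicit policy for that case.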