Use of Machine Learning and a Natural Language Processing Approach for Detecting Phishing Attacks

doi:10.1201/9780367808495-9

Chapter

ABSTRACT

Phishing attacks are a major threat to system security today, and no foolproof protection against them exists. Phishing is a form of cybercrime in which the attacker impersonates an existing individual or organization, typically through e-mail or similar channels. Put differently, the attacker poses as a trusted person or organization in order to acquire private and sensitive data, such as sign-in credentials and bank card details, without the victim's knowledge and for fraudulent purposes. A huge community now relies on online services and transactions for everything from chatting to banking. Phishing attacks are triggered by user actions such as clicking on or hovering over malicious URLs. Attackers may also send malicious links by e-mail to capture login credentials, victim account information, and similar data. Security mechanisms therefore need to be strengthened.

This chapter explains the types of phishing attacks and discusses an anti-phishing URL tool used to prevent them. Its main objective is first to explain the characteristics of phishing attacks: websites used for phishing exhibit distinctive traits and patterns, and these properties can be used to detect phishing. The attacks are then detected with a hybrid machine learning model. The system examines the URLs used in phishing attacks, using a set of extracted features, before the URLs are opened. The proposed machine learning system also applies natural language processing (NLP) techniques to analyze text semantically and detect malicious intent that indicates a phishing attack. The chapter further discusses several machine learning algorithms (LAs) for assessing the legitimacy of websites, focusing on the Naive Bayes (NB) classifier and K-Means clustering to estimate the probability that a website is a valid phish or an invalid phish (that is, phishing or legitimate).
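
As a rough illustration of the idea of inspecting a URL before it is opened, the sketch below tokenizes a URL, adds a few lexical flags, and scores it with a small multinomial Naive Bayes classifier using Laplace smoothing. It is not the chapter's hybrid model: the flag names, the thresholds, the `NaiveBayesURL` class, and the toy training URLs (all on reserved example domains and private addresses) are assumptions made only for this example, and K-Means clustering is not shown.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def url_tokens(url):
    """Split a URL into lowercase tokens and add a few lexical flags.

    The flags and thresholds are illustrative assumptions, not the
    feature set used in the chapter.
    """
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc.lower()
    text = (host + parsed.path).lower()
    tokens = [t for t in re.split(r"[^a-z0-9]+", text) if t]
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}(:\d+)?", host):
        tokens.append("__ip_host__")          # raw IP address instead of a name
    elif host.count(".") >= 3:
        tokens.append("__many_subdomains__")  # deeply nested host name
    if "@" in url:
        tokens.append("__at_sign__")          # userinfo trick to hide the real host
    if len(url) > 75:
        tokens.append("__long_url__")
    return tokens


class NaiveBayesURL:
    """Multinomial Naive Bayes over URL tokens with Laplace smoothing."""

    def fit(self, urls, labels):
        self.classes = sorted(set(labels))
        self.docs = Counter(labels)  # number of training URLs per class
        self.counts = {c: Counter() for c in self.classes}
        for url, label in zip(urls, labels):
            self.counts[label].update(url_tokens(url))
        self.vocab = set().union(*self.counts.values())
        return self

    def log_scores(self, url):
        total_docs = sum(self.docs.values())
        scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            score = math.log(self.docs[c] / total_docs)  # class prior
            for tok in url_tokens(url):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                score += math.log((self.counts[c][tok] + 1) / (total + len(self.vocab)))
            scores[c] = score
        return scores

    def predict(self, url):
        scores = self.log_scores(url)
        return max(scores, key=scores.get)


# Toy, hand-written training data (hypothetical URLs, not a real corpus).
train_urls = [
    "http://192.168.10.5/secure/login/verify-account",
    "http://paypal.com.account-update.example.net/signin/confirm",
    "http://bank-login.example.org/verify/password/update",
    "https://www.example.com/about",
    "https://docs.example.org/guide/install",
    "https://www.example.net/news/2020/article",
]
train_labels = ["phishing"] * 3 + ["legitimate"] * 3

model = NaiveBayesURL().fit(train_urls, train_labels)
```

With this toy data, `model.predict("http://10.0.0.7/login/verify")` returns `"phishing"` and `model.predict("https://www.example.com/guide")` returns `"legitimate"`. A realistic system would need a large labelled URL corpus and a held-out evaluation; six examples only demonstrate the mechanics.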