ABSTRACT

Website phishing continues to dominate discussions in academia and the cybersecurity industry, despite several state-of-the-art approaches proposed to mitigate it.

The problem has grown with high Internet penetration: users go online for legitimate reasons but are often unaware of criminals who mimic URLs and website domains to make unsuspecting audiences vulnerable to cybercrime.

While predictive analytics-based solutions continue to dominate cybersecurity research on phishing detection, studies seldom perform descriptive statistical analyses of feature attributes before building their models.

Motivated by this gap, this study identifies the most prominent attributes in a recently released phishing website dataset published on Mendeley. An information gain analysis of the dataset yields the five most prominent independent variables, which are then used to train a naive Bayes classifier and a neural network.
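As an illustration of the feature-ranking step, the following is a minimal sketch of information gain for discrete features, IG(Y; X) = H(Y) - sum_v P(X = v) H(Y | X = v), followed by a top-k selection. It assumes features are already discrete (for example, character counts); the function names, feature names, and toy rows are hypothetical and are not taken from the study or its dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(rows, labels, names, k=5):
    """Rank feature columns by information gain and return the k best names."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = information_gain(column, labels)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: label 1 = phishing, 0 = legitimate.
names = ["qty_slash", "qty_dot", "constant"]
rows = [(3, 2, 0), (4, 2, 0), (1, 2, 0), (0, 1, 0)]
labels = [1, 1, 0, 0]
print(top_k_features(rows, labels, names, k=2))  # ['qty_slash', 'qty_dot']
```

Note that information gain tends to favor features with many distinct values (here every `qty_slash` value is unique, so it scores the maximum of 1 bit), so continuous or high-cardinality attributes are often binned before ranking.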

The statistical analysis shows that the slash (/) character count is the most discriminative attribute, with a strong positive correlation with the ground truth. Malicious phishing websites are observed to contain more dot (.) and slash (/) characters and to have longer directory paths.
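To make these attributes concrete, here is a minimal sketch that extracts dot count, slash count, and directory length from a URL, and measures the Pearson correlation between a feature and a binary phishing label (with a 0/1 label this equals the point-biserial correlation). The feature names and the definition of directory length (the URL path up to and including its last "/") are assumptions for illustration, not the dataset's documented definitions.

```python
import math
from urllib.parse import urlparse


def url_features(url):
    """Hypothetical character-count features for a single URL."""
    path = urlparse(url).path
    # Assumed definition: directory = path up to and including the last "/".
    directory = path[: path.rfind("/") + 1]  # "" when the path has no slash
    return {
        "qty_dot_url": url.count("."),
        "qty_slash_url": url.count("/"),
        "directory_length": len(directory),
    }


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 label this is the point-biserial r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(url_features("http://example.com/a/b/login.php"))
# {'qty_dot_url': 2, 'qty_slash_url': 5, 'directory_length': 5}
```

A positive `pearson(feature_column, labels)` value indicates that higher counts co-occur with the phishing class, which is the sense of "positive correlation with the ground truth" used above.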