Phishing Website Detection Project | Machine learning | Classification Models | Python
Project abstract:
Phishing poses a serious risk to internet users: fake emails, texts, phone calls, and websites trick users into sharing personal information. Identifying phishing sites is therefore critical to protecting users from identity theft and cyberattacks. This project develops a machine learning-based approach for detecting phishing websites from the URL, with the goal of improving online security and protecting user privacy. We use the Phishing Websites dataset from the UCI Machine Learning Repository, which comprises 11,055 instances, each described by 30 features including having_ip_address, url_length, having_at_symbol, and more. The dataset is distributed in ARFF (Attribute-Relation File Format), a widely used format for representing data with attributes and their corresponding values. From the 30 features, we keep the top 10 selected with the SelectKBest method. Three supervised machine learning models are implemented to classify websites as phishing or legitimate: Decision Tree, Random Forest, and Support Vector Classifier (SVC). These three algorithms were chosen because they are well understood and fit the dataset well. In our experiments, Random Forest achieved the highest accuracy (94.4%), followed by the Decision Tree (94.4%) and SVC with the lowest accuracy (93.79%). Looking more closely at the results, Random Forest and SVC perform best on the Area Under the Curve (AUC), with an AUC of 99%.
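The workflow described in the abstract (select the top 10 features, train three classifiers, compare accuracy and AUC) can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the project's actual code. Several things here are assumptions: the synthetic data from make_classification stands in for the UCI dataset (the real ARFF file could be read with scipy.io.arff.loadarff), the f_classif scoring function, the 80/20 split, the default hyperparameters, and the helper name evaluate_models are all hypothetical choices, so the numbers it prints will not match the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(X, y, k=10, seed=0):
    """Train three classifiers on the top-k features; return accuracy and AUC.

    Hypothetical helper: the split ratio, scoring function, and
    hyperparameters are assumptions, not taken from the project.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVC": SVC(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        # Feature selection lives inside the pipeline, so it is fitted
        # on the training split only and cannot see the test labels.
        pipe = make_pipeline(SelectKBest(score_func=f_classif, k=k), model)
        pipe.fit(X_train, y_train)

        # AUC needs a continuous score: class probabilities where the
        # model offers them, otherwise the SVC decision function.
        if hasattr(pipe, "predict_proba"):
            scores = pipe.predict_proba(X_test)[:, 1]
        else:
            scores = pipe.decision_function(X_test)

        results[name] = {
            "accuracy": accuracy_score(y_test, pipe.predict(X_test)),
            "auc": roc_auc_score(y_test, scores),
        }
    return results


# Synthetic stand-in with 30 features, mirroring the dataset's width only.
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=10, random_state=0
)
results = evaluate_models(X, y)
for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f}, auc={metrics['auc']:.3f}")
```

Wrapping SelectKBest and the classifier in one pipeline is a design choice made here to avoid leaking test data into feature selection; whether the original project selected features before or after splitting is not stated.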