Singapore Institute of Technology

Mitigating Bias in Machine Learning Models for Phishing Webpage Detection

conference contribution
posted on 2024-09-17, 02:44 authored by Aditya Kulkarni, Vivek Balachandran, Dinil Mon Divakaran, Tamal Das

The widespread accessibility of the Internet has led to a surge in online fraud, underscoring the need to shield users' sensitive information from cybercriminals. Phishing, a well-known cyberattack, involves creating phishing webpages and disseminating the corresponding URLs to deceive users into sharing sensitive information, often for identity theft or financial gain. Various techniques can preemptively classify zero-day phishing URLs by extracting distinctive features and building predictive models from them. However, these techniques still face unresolved issues. This proposal examines persistent challenges in phishing detection solutions, focusing on the initial phase of assembling comprehensive datasets, and proposes a tool designed to mitigate bias in ML models. The tool can generate phishing webpages for any given set of legitimate URLs, infusing randomly selected content and visual-based phishing features. We further argue that the tool could be used to evaluate the effectiveness of existing phishing detection solutions, especially those trained on limited datasets.

History

Journal/Conference/Book title

16th IEEE International Conference on COMmunication Systems & NETworkS (COMSNETS)

Publication date

2024-02-16

Version

  • Post-print

Rights statement

© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
