<p dir="ltr">Phishing attacks attempt to deceive users into revealing sensitive information, posing a significant cybersecurity threat. Advances in Machine Learning (ML) and Deep Learning (DL) have led to the development of numerous phishing web page detection solutions, but these models remain vulnerable to adversarial attacks, making it essential to evaluate their robustness against adversarial phishing web pages. Existing tools rely on datasets of pre-designed phishing web pages covering a limited number of brands and lacking diversity in phishing features. <br>To address these challenges, we develop PhishOracle, a tool that generates adversarial phishing web pages by embedding diverse phishing features into legitimate web pages. We evaluate the robustness of three existing task-specific models&mdash;Stack model, VisualPhishNet, and Phishpedia&mdash;against PhishOracle-generated adversarial phishing web pages and observe a significant drop in their detection rates. In contrast, a Multimodal Large Language Model (MLLM)-based phishing detector demonstrates stronger robustness against these adversarial attacks but is still prone to evasion. Our findings highlight the vulnerability of phishing detection models to adversarial attacks, emphasizing the need for more robust detection approaches. Furthermore, we conduct a user study to evaluate whether PhishOracle-generated adversarial phishing web pages can deceive users. The results show that many of these phishing web pages evade existing detection models and deceive users as well.</p>