From Guidelines to Practice: Validating Anti‑Phishing Strategies in Real‑World Organizations

We conducted a systematic study that validated a set of 41 anti‑phishing guidelines derived from a multi‑vocal literature review. The guidelines were designed to be actionable across a range of practitioner roles—from system designers and security staff to C‑suite executives—and to cover the full spectrum of intervention stages, including training, tooling, and incident response.

Ali Babar

9/25/2025

1. Introduction

Phishing remains a pervasive challenge for organizations, yet many anti‑phishing interventions fail to deliver measurable improvements. We found that a core reason for these failures is that practitioners often overlook the needs of end users during the design, implementation, and evaluation phases of security programs.

To address this gap, we validated in practice a set of 41 anti‑phishing guidelines derived from a multi‑vocal literature review. The guidelines target practitioner roles ranging from system designers and security staff to C‑suite executives, and span intervention stages from training and tooling to incident response.

We recruited 18 security practitioners from 18 organizations spanning six countries (Australia, New Zealand, Sri Lanka, Bangladesh, Saudi Arabia, and Indonesia). These participants, representing a diversity of organizational sizes (from ten to over a thousand employees) and roles, provided semi‑structured feedback through Zoom interviews and a short Google Forms survey. The interviews revealed eight major challenges: training content design, lack of phishing datasets, limitations of anti‑phishing systems, the increasing complexity of attacks, balancing training frequency with fatigue, employee motivation, resource constraints, and post‑training assessment.

The practitioners largely confirmed the usefulness of the guidelines, praising especially the role‑based categorization and the rationale behind each recommendation. However, they also highlighted gaps related to personalization, real‑time incident response, and integration with existing security tools.

In response to these findings, we developed PhishGuide, a web‑based prototype that filters guidelines by practitioner group, intervention stage, type, and socio‑technical factor. Early user testing showed that the tool’s intuitive interface and context‑specific guidance were well received, suggesting that a user‑centric approach can bridge the theory‑practice divide in anti‑phishing work.

2. Research Context & Methodology

We investigated the practical utility of a set of 41 anti‑phishing guidelines that were derived from a multi‑vocal literature review. To test these guidelines in real‑world settings and identify additional challenges that security practitioners face, we conducted a qualitative study with security professionals from a variety of organizational sizes and geographic locations. Our participants included system designers, security staff, and C‑suite executives drawn from organizations ranging from 10 to 1,000+ employees across Australia, New Zealand, Sri Lanka, Bangladesh, Saudi Arabia, and Indonesia. A total of 18 practitioners completed in‑depth semi‑structured interviews after meeting inclusion criteria that ensured they had relevant experience in designing, implementing, or evaluating anti‑phishing interventions.

The recruitment process began with outreach to professional networks and industry groups, yielding 25 initial contacts. We screened these contacts for eligibility based on role, organizational size, and prior exposure to anti‑phishing tools or training programs. Twenty‑one respondents met the inclusion criteria, and we successfully scheduled and completed interviews with 18 of them (roughly 86% of eligible respondents).

Data collection followed a structured protocol that combined quantitative and qualitative components. Participants first completed a 15‑minute Google Form that gathered demographic information and Likert‑scale ratings on current anti‑phishing practices, perceived challenges, and the perceived usefulness of the 41 guidelines. The remainder of the session was a semi‑structured Zoom interview lasting approximately 45 minutes. The interview guide probed existing defensive measures (e.g., commercial solutions such as Mimecast or Azure versus ad‑hoc rule‑based approaches), training delivery methods, and the participants’ experiences with the prototype tool, PhishGuide. We also elicited feedback on the guidelines’ relevance, clarity, and integration potential with existing security platforms.

All interviews were audio‑recorded, transcribed verbatim, and imported into NVivo for coding. We applied an inductive coding scheme that evolved over three iterative rounds. The initial codebook captured themes such as “training content design,” “dataset availability,” “system limitations,” and “resource constraints.” Following the first round of coding, we refined the codebook by adding sub‑codes that reflected nuanced differences in organizational context. Two authors independently coded a subset of transcripts and then met to reconcile discrepancies, achieving an inter‑rater reliability of κ = 0.85. Descriptive statistics were computed to assess code saturation and the distribution of themes across participant demographics.
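The inter‑rater reliability figure above can be reproduced with a short script. As a minimal sketch (the code labels below are hypothetical examples, not our actual codebook), Cohen's kappa for two coders assigning one code per transcript segment is:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning one categorical code per segment."""
    n = len(coder_a)
    # Observed agreement: fraction of segments both coders labelled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for four transcript segments; three of the four agree.
a = ["training content design", "training content design",
     "system limitations", "system limitations"]
b = ["training content design", "training content design",
     "system limitations", "training content design"]
print(round(cohens_kappa(a, b), 2))  # observed 0.75, chance 0.50 -> kappa 0.5
```

Kappa discounts agreement expected by chance, which is why it is preferred over raw percent agreement when reporting coding reliability.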

The methodological framework also included a rapid evaluation of the PhishGuide prototype. Participants interacted with a web‑based interface that allowed filtering of guidelines by practitioner role, intervention stage, type, and socio‑technical factor. We captured usability data through session recordings and post‑session open‑ended questions. The prototype was iteratively refined based on user feedback, with particular emphasis on integrating the guidelines into existing security workflows and ensuring the interface remained uncluttered.

Overall, our mixed‑method approach combined demographic and Likert‑scale data, rich qualitative insights, and usability testing to provide a comprehensive assessment of the guidelines’ applicability in diverse organizational contexts. The findings confirm that practitioners largely validate the guidelines but also highlight gaps in personalization, resource allocation, and real‑time incident response, thereby informing future refinements of both the guidelines and the PhishGuide tool.

3. Key Findings

Our study examined how practitioners across a range of organizations design, implement, and evaluate anti‑phishing interventions. Interviews with 18 security professionals revealed that contemporary programs typically combine commercial, automated tools with ad‑hoc manual rule sets. In larger enterprises, solutions such as Mimecast or Azure are deployed, whereas smaller firms rely on custom email‑header inspections or rule‑based filtering. Training initiatives are often bundled with other security programs or delivered in brief, infrequent sessions, limiting their reach and impact.

Through the semi‑structured interviews we identified eight recurring challenges that surfaced across organizational contexts:

  1. Training content design – Practitioners reported a lack of up‑to‑date, actionable material and struggled to keep content current as attack tactics evolve.
  2. Phishing dataset scarcity – Participants highlighted limited access to high‑quality, diverse datasets, especially for emerging vectors such as smishing, which hampers both training efficacy and system evaluation.
  3. Limitations of anti‑phishing systems – Commercial platforms frequently generate false positives and exhibit bias toward well‑known phishing trends, leaving sophisticated or novel attacks unchecked.
  4. Complexity of phishing attacks – Legitimate domain spoofing and automated response mechanisms increase detection difficulty.
  5. Training frequency versus fatigue – Organizations struggle to balance the need for regular reminders against employee disengagement.
  6. Employee motivation – Weak password practices and personal device usage undermine training outcomes.
  7. Resource constraints – Many organizations face staffing, budget, and expertise shortages, limiting their ability to respond to phishing incidents or conduct thorough post‑training assessments.
  8. Post‑training assessment – Current evaluation methods suffer from ceiling effects, bot interference, and a lack of longitudinal data.

These challenges align closely with the 41 guidelines we derived from a multi‑vocal literature review, confirming the guidelines’ ecological validity. However, interviewees consistently noted that the guidelines, while useful, require contextual adaptation, richer illustrative examples, and integration with existing tools to be actionable within specific organizational constraints.

Participants also articulated a set of desired features for a supportive tool: the ability to filter guidance by practitioner group, intervention stage, type, and socio‑technical factor; a view of all guidelines or a personalized subset; links to relevant resources and documentation; automatic updates as new guidelines emerge; and a user‑friendly interface that integrates with existing security platforms. These requests informed the design of our prototype, PhishGuide, which implements the aforementioned features. An early version of the prototype allowed up to two personalized guidelines, but the final version expanded to full browsing and filtering. Participants praised the intuitive layout and the ability to tailor guidance to their organization’s specific environment.

In sum, our findings underscore that effective anti‑phishing interventions must be user‑centric, contextualized, and resource‑aware. Addressing the identified challenges—particularly those related to training content, dataset availability, system limitations, and post‑training evaluation—will bridge the gap between theoretical guidelines and practical deployment in real‑world organizations.

4. PhishGuide Prototype

We set out to create a practical, web‑based tool that brings the 41 anti‑phishing guidelines to the practitioners who need them. The prototype was built to satisfy four core functional goals identified in the interviews: discoverability, filterability, contextual relevance, and maintainability.

Discoverability and Filterability

To meet the discoverability goal, the prototype exposes four drop‑down selectors that map directly to the guideline metadata: practitioner group, intervention stage, type, and socio‑technical factor. When a selector is changed, the system immediately queries the guideline repository and returns only those guidelines that satisfy all selected criteria. This dynamic filtering reduces a large catalogue to a manageable set in real time, allowing users to focus on the most relevant guidance for their role and situation.

In addition to filtered views, the prototype offers a full‑catalogue mode. Users can search the entire list with a keyword box that updates the display instantly as they type. The dual approach of a filtered subset and a complete searchable list caters to both casual users who need quick recommendations and power users who wish to explore all available guidelines.
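The conjunctive filtering described above reduces to a simple predicate over the guideline metadata. As a sketch (the field names and example guidelines are illustrative, not the actual PhishGuide schema), assuming each guideline carries the four metadata fields:

```python
from dataclasses import dataclass

@dataclass
class Guideline:
    text: str     # the guideline itself
    group: str    # practitioner group, e.g. "security staff"
    stage: str    # intervention stage, e.g. "training"
    gtype: str    # guideline type
    factor: str   # socio-technical factor

def filter_guidelines(catalogue, *, group=None, stage=None,
                      gtype=None, factor=None, keyword=None):
    """Return guidelines matching every selector that is set, plus an optional keyword."""
    results = []
    for g in catalogue:
        if group and g.group != group:
            continue
        if stage and g.stage != stage:
            continue
        if gtype and g.gtype != gtype:
            continue
        if factor and g.factor != factor:
            continue
        if keyword and keyword.lower() not in g.text.lower():
            continue
        results.append(g)
    return results
```

Leaving a selector unset (`None`) means "match anything", so the same function serves both the filtered subset view and the full searchable catalogue.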

Personalization and Contextualization

Recognizing that each organization faces distinct challenges, the prototype allows users to create a simple profile that records their organizational role and the specific difficulties they face. The system stores the user’s filter preferences and automatically applies them on subsequent visits. Users can also bookmark guidelines that they find especially useful; these bookmarks appear in a dedicated panel for rapid access. This personalization layer aligns the guideline set with the user’s operational context.

The prototype integrates links to relevant documentation and external resources. For example, when a guideline references a configuration in Mimecast or Azure, the system provides a direct hyperlink to the vendor’s knowledge base. These links reduce the friction that practitioners experience when trying to implement the guidelines within their existing security stack.

Maintainability and Updates

The guideline set must remain current. To this end, the prototype is connected to an update service that pulls new guidelines as they are published. When an update is detected, the backend refreshes its database and the front end notifies users of available additions. This automatic update mechanism ensures that the tool remains a living resource rather than a static reference.
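One way to sketch the update check, assuming the update service exposes a feed of (id, payload) pairs (a hypothetical interface, not the actual backend):

```python
def pending_updates(local_ids, remote_entries):
    """Return feed entries whose IDs are not yet in the local guideline database.

    local_ids: iterable of guideline IDs already stored locally.
    remote_entries: iterable of (guideline_id, payload) pairs from the feed.
    """
    known = set(local_ids)
    # Anything the feed carries that we have not stored yet is a pending addition.
    return [(gid, payload) for gid, payload in remote_entries if gid not in known]
```

The backend would apply the returned entries to its database and surface their count to the front end as the "available additions" notification.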

User Interface Design

The interface follows a clear layout: the filter panel sits on the left, the guideline list on the right, and a details pane expands when a guideline is selected. The responsive design supports both desktop and mobile devices, and visual cues such as icons and colour coding indicate guideline categories and urgency levels.

Evaluation and Feedback

During the evaluation phase, we first deployed a minimal version built on Google Forms that limited users to two personalized guidelines. Feedback from early adopters highlighted the need for broader browsing capabilities and more granular filtering. After iterating on the interface and expanding the feature set to full browsing and filtering, participants praised the intuitive layout and the ability to tailor guidance to their organization’s context. The prototype’s design decisions directly addressed the gaps identified in the earlier stages of the study, such as the lack of contextual adaptation and integration with existing tools.

In sum, the PhishGuide prototype translates the research insights into a usable, technically robust tool. By combining flexible filtering, personalized subsets, resource integration, and automatic updates, it empowers practitioners to apply evidence‑based guidelines in the real‑world environments where phishing threats persist.

5. Take‑aways & Future Directions

We collected detailed insights from 18 security practitioners in six countries, covering design, implementation, and evaluation of anti‑phishing interventions. The 41 guidelines that emerged from our multi‑vocal review were largely validated, yet several technical gaps persisted. These gaps highlight the need for a more context‑aware, data‑driven approach to guideline deployment and tool support.

Take‑aways

We found that user‑centric design remains the most critical enabler for effective interventions. Participants across all roles—designers, security staff, and C‑suite executives—reported that guidelines must be embedded in the users’ workflow rather than imposed as an add‑on. The role‑based categorization of the 41 guidelines was useful, but practitioners demanded concrete, scenario‑driven examples and tighter integration with existing security platforms such as Mimecast, Azure, and custom incident‑response dashboards. When guidelines were abstract, respondents cited a steep learning curve that reduced uptake.

We also observed that training content design and delivery are constrained by organizational resources. Smaller firms (10‑200 employees) cited limited budgets for security tooling and personnel, which forced them to bundle phishing training with other awareness programs and schedule sessions infrequently. This approach created a tension between maintaining continuous awareness and avoiding fatigue, as reflected in the participants’ comments on training frequency. The data also revealed that many organizations lacked up‑to‑date phishing datasets, with 56% of respondents reporting a scarcity of smishing and spear‑phishing samples. This scarcity limited the ability to generate realistic training scenarios and to evaluate new detection models.

The prototype PhishGuide addressed several of these practitioner needs. We designed a web‑based interface that allows filtering by practitioner group, intervention stage, guideline type, and socio‑technical factor. During the pilot, participants praised the ability to retrieve a personalized subset of guidelines, which reduced cognitive overload. They also highlighted the value of automatic updates—triggered by new guideline releases—and hyperlinks to relevant documentation. These features underscored the importance of a continuously evolving knowledge base that can adapt to the dynamic threat landscape.

We noted that existing commercial anti‑phishing solutions are not a silver bullet. Participants reported high false‑positive rates and biases toward prevalent phishing trends, limiting the effectiveness of automated filtering. The guidelines’ emphasis on manual oversight and edge‑case analysis remains vital, especially for organizations that rely heavily on commercial tools. The study also revealed that post‑training assessment is complicated by bot interference and ceiling effects. Automated responses can inflate retention metrics, making it difficult to gauge true user understanding. This finding suggests that future evaluation frameworks must differentiate between human and automated interactions.

Future Directions

We recommend scaling the evaluation to larger, longitudinal cohorts. While our sample was geographically diverse, extending the study across a broader range of organizations and over longer periods will help quantify the long‑term impact of guideline adoption on phishing success rates. Additionally, we plan to enhance PhishGuide with an NLP‑driven recommendation engine that maps organizational metadata—such as role distribution, incident logs, and historical phishing incidents—to specific guidelines. This automation will reduce the manual effort required by practitioners and enable real‑time, context‑aware guidance.

Integrating real‑time incident‑response support into the tool will address the identified gap in dynamic threat handling. By providing actionable steps during an active phishing event, PhishGuide can move beyond static guidance to operational support. We also aim to develop richer evaluation metrics that filter out bot interference, such as anomaly detection in response patterns or human‑verification checkpoints, to ensure post‑training assessments truly reflect user learning.
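As one illustration of filtering bot interference from post‑training metrics, a simple response‑time heuristic (the thresholds below are illustrative assumptions, not calibrated values) flags simulation clicks that arrive faster than a human plausibly could:

```python
from statistics import median

def flag_suspected_bots(responses, min_seconds=2.0):
    """Flag phishing-simulation responses that arrive implausibly fast.

    responses: list of (user_id, seconds_to_click) pairs. Clicks faster than
    min_seconds, or far below the cohort median, are flagged as likely
    automated (e.g. a link-scanning mail security appliance) rather than
    a genuine human click.
    """
    times = [t for _, t in responses]
    cohort_median = median(times)
    flagged = set()
    for user, t in responses:
        if t < min_seconds or t < cohort_median * 0.1:
            flagged.add(user)
    return flagged
```

Excluding flagged responses before computing click or retention rates keeps automated scanners from inflating the metrics, in line with the human‑verification checkpoints discussed above.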

Finally, we propose investigating cross‑domain transferability of the guidelines to other cybersecurity domains, such as social engineering and insider threats. Adapting the guidelines to these areas could uncover common design principles and broaden the impact of our research. Pursuing these directions will transition the 41 guidelines from validated best practices to a fully integrated, context‑aware anti‑phishing ecosystem that continuously learns from practitioner feedback and evolving threat landscapes.