Phishing Detection with Evidence
A tool-using RL environment for training and evaluating models on phishing detection with evidence-based reasoning. Models analyze emails for phishing indicators, search for corroborating evidence, and provide justified classifications.
Overview
This environment implements evidence-seeking phishing detection, combining content analysis with external validation tools to identify and explain phishing attempts.
Environment Type: ToolEnv - Multi-turn environment with tool access
Task: Classify emails as phishing/legitimate with evidence-based justification
Tools: URL reputation checker, domain WHOIS lookup, content similarity search
Reward Structure: Classification accuracy + evidence quality + explanation coherence
Installation
Install the environment using the Prime CLI:
prime env install intertwine/sv-env-phishing-detection
Or using pip directly:
pip install sv-env-phishing-detection
Setup
API Keys Configuration
Set your API keys as environment variables:
# OpenAI API Key (required for OpenAI models)
export OPENAI_API_KEY="your-openai-api-key"
# For persistent configuration
echo 'export OPENAI_API_KEY="your-key"' >> ~/.bashrc
source ~/.bashrc
Usage
With Verifiers Library
import verifiers as vf
# Load the environment with tools enabled
env = vf.load_environment("intertwine/sv-env-phishing-detection", include_tools=True)
# Evaluate a model
results = env.evaluate(
client=vf.OpenAIClient(),
model="gpt-5-mini",
num_examples=10
)
print(f"Average reward: {results.stats['mean_reward']:.2%}")
print(f"Detection accuracy: {results.stats.get('accuracy', 0):.2%}")
Quick Evaluation
Use the verifiers CLI:
# Basic evaluation with tools
vf-eval intertwine/sv-env-phishing-detection \
--model gpt-5-mini \
--num-examples 10
# Without tools (content analysis only)
vf-eval intertwine/sv-env-phishing-detection \
--model gpt-5-mini \
--num-examples 10 \
--include-tools false
Training with Prime RL
[environment]
id = "intertwine/sv-env-phishing-detection"
kwargs = {include_tools = true}
Task Details
Input Format
Email content with headers and body:
From: security@amaz0n-support.com
Subject: Urgent: Account Security Alert
Body: Your Amazon account has been compromised. Click here to secure it: http://bit.ly/secure-amz
Expected Output
JSON object with classification and evidence:
{
"label": "Phishing",
"confidence": 0.95,
"evidence": [
"Spoofed sender domain (amaz0n-support.com vs amazon.com)",
"Suspicious URL shortener (bit.ly)",
"Urgency tactics in subject line"
],
"explanation": "Email exhibits multiple phishing indicators including domain spoofing and social engineering tactics"
}
Available Tools
When include_tools=True, the model has access to:
- check_url_reputation: Analyze URL safety and reputation
- lookup_domain_whois: Get domain registration details
- search_similar_campaigns: Find similar phishing patterns
- verify_sender_authenticity: Check SPF/DKIM records
Scoring
The environment uses a weighted rubric:
- Classification Accuracy (50%): Correct phishing/legitimate determination
- Evidence Quality (30%): Relevant and verifiable indicators
- Explanation Coherence (10%): Clear reasoning from evidence
- Tool Utilization (10%): Effective use of verification tools
Weights & Biases Logging
This environment supports automatic Weave tracing:
import weave
import verifiers as vf
# Initialize Weave
weave.init(project="phishing-detection")
# Load and evaluate
env = vf.load_environment("intertwine/sv-env-phishing-detection", include_tools=True)
results = env.evaluate(
client=vf.OpenAIClient(),
model="gpt-5-mini",
num_examples=50
)
# Results automatically traced to W&B
Configure via environment variables:
WEAVE_PROJECT: Set project nameWEAVE_DISABLED: Set to 'true' to disable loggingWANDB_API_KEY: Your W&B API key
Evaluation Approach
Metrics Tracked
- Detection Accuracy: Phishing vs legitimate classification
- False Positive Rate: Legitimate emails marked as phishing
- False Negative Rate: Phishing emails missed (critical metric)
- Evidence Precision: Validity of cited indicators
- Response Time: Tool usage efficiency
Example Evaluation Script
import verifiers as vf
import weave
weave.init(project="phishing-eval")
env = vf.load_environment("intertwine/sv-env-phishing-detection", include_tools=True)
# Evaluate with focus on reducing false negatives
results = env.evaluate(
client=vf.OpenAIClient(),
model="gpt-5-mini",
num_examples=200,
seed=42
)
print(f"Mean Reward: {results.stats['mean_reward']:.2%}")
print(f"Accuracy: {results.stats.get('accuracy', 0):.2%}")
print(f"False Positives: {results.stats.get('false_positive_rate', 0):.2%}")
print(f"False Negatives: {results.stats.get('false_negative_rate', 0):.2%}")
Performance Benchmarks
| Model | Accuracy | False Positives | False Negatives | Overall |
|---|---|---|---|---|
| GPT-4o-mini | 87% | 8% | 5% | 82% |
| GPT-4o | 93% | 4% | 3% | 89% |
Phishing Tactics Covered
The environment includes diverse phishing techniques:
- Domain Spoofing: Lookalike domains, homoglyphs
- URL Obfuscation: Shorteners, redirects, embedded links
- Social Engineering: Urgency, authority, scarcity tactics
- Credential Harvesting: Fake login pages, form requests
- Attachment Threats: Malicious documents, executables
- Business Email Compromise: CEO fraud, invoice scams
Dataset
- Phishing Samples: Real-world inspired phishing emails
- Legitimate Emails: Business, personal, and marketing emails
- Evidence Database: Known phishing domains, campaigns
- Validation Data: SPF/DKIM records, WHOIS information
Future Improvements
- Attachment Analysis: Scan documents and executables for threats
- Multi-language Support: Detect phishing in non-English emails
- Real-time Threat Intelligence: Integration with threat feeds
- User Context: Personalized detection based on user patterns
- Campaign Tracking: Link related phishing attempts
- Automated Response: Generate warning messages and remediation steps
Requirements
- Python 3.12+
verifiers>=0.1.4- API key for model inference
About
This environment is part of the Open Security Verifiers suite - a collection of security and alignment RL environments using Prime Intellect's Verifiers framework. Each environment provides executable, programmatic rewards for training robust security-aware AI systems.
Support
For issues or questions:
- Report issues on the Prime Intellect Environments Hub
- Check the Security Verifiers GitHub repository
- Contact the Intertwine team