TL;DR
This guide demonstrates integrating LLMs (Claude 3.5 Sonnet, GPT-4) with Prometheus to transform raw metrics into intelligent, context-aware alerts. Instead of static threshold alerts, you’ll use AI to analyze metric patterns, correlate events across services, and generate actionable incident summaries with root cause analysis.
Core workflow: a Prometheus Alertmanager webhook posts to Python middleware, which calls the LLM API and forwards the enriched alert to PagerDuty/Slack. The LLM receives time-series data, recent logs, and infrastructure context to produce alerts like “CPU spike correlates with database connection pool exhaustion; recommend increasing max_connections from 100 to 200” instead of a generic “CPU > 80%”.
Key integration points:
- Prometheus recording rules pre-aggregate complex queries for LLM context
- A Python web service (Flask in the examples below) bridges Alertmanager webhooks to the OpenAI/Anthropic APIs
- Prompt engineering with system context (runbooks, topology maps, historical incidents)
- Structured output using JSON mode for parseable remediation steps
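The structured-output point can be illustrated with a minimal sketch. It assumes the model has been asked to reply with a single JSON object; the keys `root_cause`, `severity`, and `steps` are a hypothetical schema chosen for illustration, not one defined by this guide.

```python
import json

# Hypothetical schema for illustration; adapt the keys to your own prompt.
REQUIRED_KEYS = {"root_cause": str, "severity": str, "steps": list}


def parse_enrichment(raw: str):
    """Return the parsed dict if raw is a JSON object matching the schema, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), expected_type):
            return None
    return data
```

Returning `None` on any mismatch lets the caller fall back to the unenriched alert rather than forwarding malformed output.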
Real-world example: A disk space alert includes the LLM analyzing node_filesystem_avail_bytes, recent Docker image pulls, and log growth rates to identify the actual culprit (old kernel images vs. application logs).
curl -X POST http://llm-alert-bridge:8080/enrich \
  -H "Content-Type: application/json" \
  -d @prometheus_alert.json
Critical warnings:
- Always validate AI-generated commands in staging before production execution
- LLMs may hallucinate metric names or threshold values – verify against actual Prometheus data
- Set API rate limits to prevent alert storms from exhausting LLM quotas
- Never auto-execute remediation commands; require human approval for destructive operations
Cost consideration: Expect $0.03-0.15 per enriched alert with GPT-4 Turbo (varies by context size). Budget accordingly for high-traffic environments.
Core Steps
Install Prometheus and node_exporter on your monitoring server. Configure /etc/prometheus/prometheus.yml with scrape targets for your infrastructure:
scrape_configs:
  - job_name: 'linux_servers'
    static_configs:
      - targets: ['web01:9100', 'db01:9100', 'cache01:9100']
Configure Alert Rules with Context
Create /etc/prometheus/rules/llm_enhanced.yml, and list it under rule_files in prometheus.yml, with alerts that include rich context for LLM processing:
groups:
  - name: system_alerts
    rules:
      - alert: HighMemoryUsage
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
        annotations:
          summary: "Memory critical on {{ $labels.instance }}"
          context: "Available: {{ $value | humanizePercentage }}, Total: {{ with printf \"node_memory_MemTotal_bytes{instance='%s'}\" $labels.instance | query }}{{ . | first | value | humanize1024 }}{{ end }}"
Build LLM Alert Handler
Create /opt/prometheus-llm/alert_handler.py to receive Alertmanager webhooks and query Claude/GPT-4 for analysis:
import anthropic
from flask import Flask, jsonify, request
app = Flask(__name__)
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
@app.route('/webhook', methods=['POST'])
def handle_alert():
    # Alertmanager may batch several alerts; this example analyzes only the first
    alert = request.json['alerts'][0]
    prompt = f"""Analyze this Prometheus alert and provide:
1. Root cause analysis
2. Immediate remediation steps
3. Prevention recommendations
Alert: {alert['labels']['alertname']}
Instance: {alert['labels']['instance']}
Context: {alert['annotations'].get('context', 'N/A')}"""
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return jsonify({"analysis": response.content[0].text})
if __name__ == '__main__':
    # Bind to localhost; Alertmanager on the same host posts to port 5000
    app.run(host='127.0.0.1', port=5000)
Critical: Always review AI-suggested commands before execution. Use --dry-run flags where available. Never pipe LLM output directly to bash or sh without human validation – hallucinated commands can cause outages or data loss.
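An Alertmanager webhook payload can carry several alerts in its `alerts` list, each with a `status` of `firing` or `resolved`. The sketch below shows one way to build a prompt per firing alert instead of reading only the first entry. The function name `build_prompts` and the shortened prompt wording are illustrative assumptions.

```python
def build_prompts(payload: dict) -> list[str]:
    """Build one analysis prompt per firing alert in an Alertmanager webhook payload."""
    prompts = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved notifications need no analysis
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        prompts.append(
            "Analyze this Prometheus alert.\n"
            f"Alert: {labels.get('alertname', 'unknown')}\n"
            f"Instance: {labels.get('instance', 'unknown')}\n"
            f"Context: {annotations.get('context', 'N/A')}"
        )
    return prompts
```

Using `.get()` with defaults keeps the handler from failing when an alert rule omits an annotation such as `context`.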
Configure Alertmanager Integration
Update /etc/alertmanager/alertmanager.yml:
receivers:
  - name: 'llm-analyzer'
    webhook_configs:
      - url: 'http://localhost:5000/webhook'
Implementation
Deploy Prometheus with node_exporter and alertmanager using your existing configuration management. Install the OpenAI Python SDK or Anthropic’s Claude SDK on your monitoring server:
pip3 install openai anthropic flask prometheus-client pyyaml
LLM Alert Enrichment Service
Create /opt/prometheus-llm/alert_processor.py to intercept Alertmanager webhooks:
import anthropic
from flask import Flask, request
app = Flask(__name__)
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
@app.route('/webhook', methods=['POST'])
def process_alert():
    # Alertmanager may batch several alerts; this example analyzes only the first
    alert = request.json['alerts'][0]
    annotations = alert.get('annotations', {})
    prompt = f"""Analyze this Prometheus alert and provide:
1. Root cause analysis
2. Immediate remediation steps
3. Relevant systemctl/journalctl commands
Alert: {alert['labels']['alertname']}
Instance: {alert['labels']['instance']}
Description: {annotations.get('description', 'N/A')}
Metrics: {annotations.get('metrics', 'N/A')}"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    # Send enriched alert to Slack/PagerDuty
    return response.content[0].text
CRITICAL: Never execute AI-suggested commands automatically. The LLM may hallucinate package names, incorrect flags, or destructive operations. Always review suggestions with a human operator before execution.
Alertmanager Configuration
Configure /etc/alertmanager/alertmanager.yml to route through your LLM service:
receivers:
  - name: 'llm-enriched'
    webhook_configs:
      - url: 'http://localhost:5000/webhook'
        send_resolved: true
route:
  receiver: 'llm-enriched'
  group_wait: 30s
  group_interval: 5m
Validation Layer
Implement command validation using a whitelist approach before presenting AI suggestions to operators. Store approved command patterns in /etc/prometheus-llm/safe_commands.yaml and filter LLM output against this list.
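A minimal sketch of such a filter follows. The patterns in `SAFE_PATTERNS` are hypothetical examples standing in for whatever you store in /etc/prometheus-llm/safe_commands.yaml, and the function names are illustrative. A command passes only if it contains no shell metacharacters and fully matches one approved pattern.

```python
import re
import shlex

# Hypothetical allowlist; in the setup described above these patterns would be
# loaded from /etc/prometheus-llm/safe_commands.yaml.
SAFE_PATTERNS = [
    r"systemctl status [\w@.\-]+",
    r"journalctl -u [\w@.\-]+( -n \d+)?",
    r"df -h",
    r"free -m",
]
# Characters that allow chaining, substitution, or redirection in a shell.
SHELL_METACHARS = set(";|&$`<>(){}\n")


def is_safe(command: str) -> bool:
    """Accept a command only if it fully matches an allowlisted pattern."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        normalized = " ".join(shlex.split(command))
    except ValueError:  # unbalanced quotes
        return False
    return any(re.fullmatch(pattern, normalized) for pattern in SAFE_PATTERNS)


def filter_suggestions(commands):
    """Keep only the suggested commands that pass the allowlist."""
    return [c for c in commands if is_safe(c)]
```

Matching with `re.fullmatch` rather than a substring search means a trailing `&& reboot` cannot ride along with an otherwise approved command.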
Verification and Testing
Create test alerts to validate your LLM pipeline without waiting for real incidents:
# Generate test alert via the Alertmanager v2 API
# (the v1 API was removed in Alertmanager 0.27)
curl -XPOST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{
    "labels": {
      "alertname": "HighMemoryUsage",
      "instance": "web-prod-03:9100",
      "severity": "warning"
    },
    "annotations": {
      "summary": "Memory usage at 87% on web-prod-03"
    }
  }]'
Monitor your LLM processing pipeline to confirm alert enrichment occurs within acceptable latency (typically <5 seconds).
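One way to check the latency target is to time the enrichment call itself. The helper below is a small sketch; the name `timed_call` and the 5-second default budget are illustrative assumptions taken from the figure above.

```python
import time


def timed_call(fn, *args, budget_seconds=5.0, **kwargs):
    """Run fn and return (result, elapsed_seconds, within_budget)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_seconds
```

Logging `elapsed` for every alert gives you a latency distribution to compare against the budget, rather than a single spot check.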
Validating LLM Output Quality
Never execute AI-generated remediation commands without human review. Implement a validation layer:
import anthropic
def validate_command(cmd: str) -> bool:
    """Screen an AI-suggested command and ask the operator to confirm it"""
    # Substring denylist: a coarse first filter, not a complete safety check
    dangerous_patterns = ['rm -rf /', 'dd if=', 'mkfs', '> /dev/sd']
    if any(pattern in cmd for pattern in dangerous_patterns):
        return False
    # Use LLM to explain command impact
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"Explain risks of this command on production: {cmd}"
        }]
    )
    print(f"AI Safety Analysis:\n{response.content[0].text}")
    return input("Execute? (yes/no): ").strip().lower() == "yes"
Critical Warning: AI models can hallucinate plausible-looking but incorrect commands. Always validate against your runbooks and test in staging environments first.
End-to-End Testing
Verify the complete workflow with integration tests:
# Test alert -> LLM -> Slack notification chain
./scripts/test_alert_pipeline.sh --alert-type cpu_spike \
  --validate-llm-response \
  --check-slack-delivery \
  --timeout 30s
Monitor LLM API costs during testing. Claude API calls average $0.015 per alert analysis. Set budget alerts in your cloud provider console to prevent unexpected charges during high-volume incident periods.
Best Practices
Implement strict rate limits on LLM API calls to prevent runaway costs during alert storms. Set per-minute quotas in your alerting pipeline:
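import anthropic
from ratelimit import limits, sleep_and_retry
anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
@sleep_and_retry               # block until the window allows another call
@limits(calls=10, period=60)   # at most 10 LLM calls per 60 seconds
def query_llm_for_alert(alert_data):
    return anthropic_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": alert_data}]
    )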
Cache LLM responses for identical alert patterns (for example, keyed on alert name and labels) so that repeated firings during an alert storm reuse one analysis instead of triggering new API calls.
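A minimal sketch of such a cache is shown below. It keys entries on the alert's label set and expires them after a TTL. The class name `EnrichmentCache` and the 15-minute default are assumptions for illustration, and the in-memory store is neither thread-safe nor shared across processes.

```python
import hashlib
import json
import time


class EnrichmentCache:
    """In-memory TTL cache keyed on an alert's labels (a sketch, not thread-safe)."""

    def __init__(self, ttl_seconds=900, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    @staticmethod
    def key(alert: dict) -> str:
        # Ignore timestamps and values so repeats of the same alert share a key.
        labels = json.dumps(alert.get("labels", {}), sort_keys=True)
        return hashlib.sha256(labels.encode()).hexdigest()

    def get_or_compute(self, alert, compute):
        """Return a cached analysis if still fresh, otherwise call compute(alert)."""
        k = self.key(alert)
        now = self.clock()
        hit = self._store.get(k)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = compute(alert)
        self._store[k] = (now, value)
        return value
```

Here `compute` would be the function that calls the LLM, for example `cache.get_or_compute(alert, query_llm)` where `query_llm` is your own wrapper. Keep the TTL short enough that a changed situation behind the same labels is re-analyzed.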
Validation Gates for AI-Generated Commands
Never execute AI-suggested remediation commands without human approval. Structure your workflow with explicit validation checkpoints:
# alertmanager-llm-config.yaml
receivers:
  - name: 'llm-analyzer'
    webhook_configs:
      - url: 'http://llm-bridge:8080/analyze'
        send_resolved: true
# Alertmanager has no validation_mode setting and rejects unknown keys;
# enforce suggest-only behavior (never auto-execute) in the bridge service.
AI models hallucinate – especially with system commands. Always display suggested commands in Slack/PagerDuty for engineer review before execution. Use dry-run flags (--check, --dry-run) when testing AI-generated Ansible playbooks or kubectl commands.
Prompt Engineering for Consistency
Maintain versioned prompt templates in Git with explicit constraints:
ALERT_ANALYSIS_PROMPT = """
Analyze this Prometheus alert. Provide:
1. Root cause (max 2 sentences)
2. Impact assessment (critical/high/medium/low)
3. Suggested investigation commands (read-only only)
Alert: {alert_json}
Constraints: No destructive commands. No assumptions about custom applications.
"""
Fallback Mechanisms
Configure traditional alerting as backup when LLM APIs are unavailable:
route:
  routes:
    - matchers:
        - severity="critical"
      receiver: 'pagerduty-direct'  # Page directly, bypassing the LLM bridge
      continue: true                # Keep evaluating later sibling routes, if any
Test failover quarterly to ensure reliability during LLM service outages.
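The same fallback idea can be applied inside the bridge service, so that an LLM outage degrades to an unenriched notification instead of a dropped one. In this sketch `enrich` and `notify` are hypothetical callables standing in for your LLM call and your Slack/PagerDuty sender.

```python
def enrich_or_fallback(alert, enrich, notify):
    """Try LLM enrichment; on any failure, deliver the raw alert so nothing is dropped.

    Returns True if the alert was enriched, False if the fallback path was used.
    """
    try:
        analysis = enrich(alert)
    except Exception as exc:  # API outage, timeout, quota exhaustion, etc.
        notify({"alert": alert, "analysis": None, "enrichment_error": str(exc)})
        return False
    notify({"alert": alert, "analysis": analysis})
    return True
```

Counting how often the function returns `False` gives a simple signal for the quarterly failover test.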
FAQ
Can the LLM execute remediation commands automatically?
No. Always validate AI-generated commands before execution. Use a human-in-the-loop workflow in which the LLM suggests remediation steps but execution requires approval. Implement this with Alertmanager webhooks that send suggestions to a review queue (Slack, PagerDuty) rather than executing directly:
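# Safe pattern: suggest, don't execute
# (llm and slack are placeholder clients for your own integrations)
def handle_alert(alert_data):
    alert_id = alert_data["fingerprint"]  # per-alert ID in the webhook payload
    suggestion = llm.generate_remediation(alert_data)
    slack.post_message(
        channel="#ops-review",
        text=f"Suggested fix:\n```bash\n{suggestion}\n```\nApprove: /execute-{alert_id}"
    )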
How do I prevent hallucinated metrics or commands?
Constrain LLM outputs with structured prompts and validation. For Prometheus queries, provide the actual metric names in your prompt context:
prompt = f"""Available metrics: {prometheus.list_metrics()}
Generate a PromQL query for CPU usage above 80%.
Output ONLY valid PromQL, no explanation."""
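Validate generated PromQL before loading it into Prometheus: promtool check rules operates on rule files, so place the expression in a rule file and run promtool check rules against that file to catch syntax errors.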
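Syntax checking does not catch a well-formed query that references a metric which does not exist. The sketch below flags such names by comparing the identifiers in a generated query against a known list, which you could fetch from the Prometheus HTTP API at /api/v1/label/__name__/values. The regex-based extraction is an approximation and not a PromQL parser, and the function names are illustrative.

```python
import re

# Words that look like identifiers but are PromQL operators or modifiers.
PROMQL_KEYWORDS = {"by", "without", "on", "ignoring", "group_left", "group_right",
                   "and", "or", "unless", "offset", "bool"}


def referenced_metrics(promql: str) -> set[str]:
    """Approximate the set of metric names referenced by a PromQL expression."""
    # Drop label matchers, which contain label names rather than metrics.
    stripped = re.sub(r"\{[^}]*\}", "", promql)
    # Drop grouping clauses such as by (instance).
    stripped = re.sub(
        r"\b(by|without|on|ignoring|group_left|group_right)\s*\([^)]*\)", "", stripped
    )
    # Drop range selectors such as [5m].
    stripped = re.sub(r"\[[^\]]*\]", "", stripped)
    names = set()
    for match in re.finditer(r"(?<![\w:])[A-Za-z_:][A-Za-z0-9_:]*", stripped):
        token = match.group(0)
        is_call = stripped[match.end():].lstrip().startswith("(")
        if is_call or token in PROMQL_KEYWORDS:
            continue  # function or aggregation operator, not a metric
        names.add(token)
    return names


def unknown_metrics(promql: str, known) -> set[str]:
    """Return referenced metric names that are absent from the known list."""
    return referenced_metrics(promql) - set(known)
```

A non-empty result from `unknown_metrics` is a reason to reject the generated query or to ask the model again with the metric list restated.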
What’s the token cost for continuous monitoring?
Expect 500-2K tokens per alert analysis. At $0.25 per 1M tokens (an illustrative blended rate; actual prices differ by model and between input and output tokens, so check your provider's current price list), analyzing 1,000 alerts costs about $0.13-$0.50. Use caching for repeated context (metric definitions, runbooks) to reduce costs further. Monitor your LLM API usage with:
# Track token usage from each Messages API response
# (response is the object returned by client.messages.create)
usage = response.usage
print(f"input_tokens={usage.input_tokens} output_tokens={usage.output_tokens}")
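The cost range quoted above follows from simple arithmetic, sketched here. A single per-token rate is a simplification, since providers usually price input and output tokens separately; the function name is illustrative.

```python
def estimate_cost(alerts: int, tokens_per_alert: int, usd_per_million_tokens: float) -> float:
    """Estimated spend in USD for a batch of alert analyses."""
    return alerts * tokens_per_alert * usd_per_million_tokens / 1_000_000


low = estimate_cost(1000, 500, 0.25)    # 500 tokens per alert
high = estimate_cost(1000, 2000, 0.25)  # 2,000 tokens per alert
print(f"${low:.3f} to ${high:.2f} per 1,000 alerts")  # $0.125 to $0.50
```

Substituting your own alert volume and your provider's current rates gives a budget figure you can compare against observed usage.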
Can I run LLMs locally for sensitive infrastructure?
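Yes. Deploy Ollama with llama3.1:70b or mixtral:8x22b on a dedicated server. Size memory to the model: the default quantized llama3.1:70b download is roughly 40GB, so 48GB of RAM is a practical minimum for it, while mixtral:8x22b is roughly twice that size and needs correspondingly more. Latency increases to 5-15 seconds per query versus 1-2 seconds for API calls, but alert data stays internal. Use vLLM for better throughput if processing multiple alerts concurrently.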
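A minimal sketch of calling a local Ollama server from the alert handler follows, using only the standard library. It assumes Ollama is listening on its default port 11434 and that the model has already been pulled; the function names and the 60-second timeout are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "llama3.1:70b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_locally(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to Ollama and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Setting `"stream": False` returns one JSON object instead of a stream of chunks, which keeps the handler simple at the cost of waiting for the full completion.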
