AI Safety
How to Moderate AI-Generated Text
Catch unsafe completions before users see them.
Pre-screen prompts and post-screen outputs in real time.
What it detects
- Hate speech in completions
- PII leakage
- Policy violations
- Jailbreak success indicators
- Hallucinated unsafe content
- Custom rules
Why developers choose Vettly
- Streaming SDK for real-time output screening
- Pre-built llm-output policy
- Same API works on inputs and outputs
- Audit trail for every blocked completion
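To illustrate what using the same check on inputs and outputs might look like in practice, here is a minimal TypeScript sketch. The `Verdict`, `Moderate`, and `Generate` types, the `guardedChat` function, and the refusal text are all hypothetical stand-ins chosen for this example; they are not part of the Vettly SDK, and the moderation and generation calls are injected so any client could be plugged in.

```typescript
// Hypothetical types for illustration only; NOT the Vettly SDK's API.
type Verdict = { safe: boolean; action: string };
type Moderate = (text: string) => Promise<Verdict>;
type Generate = (prompt: string) => Promise<string>;

// Assumed placeholder text shown when either check fails.
const REFUSAL = "Sorry, I can't help with that.";

// Screen the prompt, generate only if it passes, then screen the completion.
async function guardedChat(
  prompt: string,
  generate: Generate,
  moderate: Moderate,
): Promise<string> {
  // Pre-screen: an unsafe prompt never reaches the model.
  const promptVerdict = await moderate(prompt);
  if (!promptVerdict.safe) return REFUSAL;

  const completion = await generate(prompt);

  // Post-screen: the completion is checked before the user sees it.
  const outputVerdict = await moderate(completion);
  return outputVerdict.safe ? completion : REFUSAL;
}
```

Passing the moderator in as a function keeps the sketch independent of any particular client, so the same wrapper could sit in front of a streaming or a request/response integration.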
Example request
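import { createStreamingClient } from '@vettly/sdk';

const streaming = createStreamingClient('YOUR_KEY');

// Open a realtime connection bound to a moderation policy.
const ws = streaming.connectRealtime({
  policyId: 'chat-policy',
  onResult: (result) => {
    // showMessage and logBlocked are your application's own handlers.
    if (result.safe) showMessage(result);
    else logBlocked(result);
  }
});

await ws.connect();
// message is the prompt or completion text to screen.
const result = await ws.moderate(message);
Example response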
{
  "safe": true,
  "action": "allow",
  "categories": {
    "harassment": 0.02,
    "spam": 0.01
  },
  "latency_ms": 47
}
Compared to relying on model safety alone
Model-side guardrails can be evaded, for example by jailbreak prompts. An independent check after generation catches unsafe completions that get past them.
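As a rough sketch of such a post-generation check, the TypeScript below gates a completion on a result shaped like the example response above. The `ModerationResult` interface is an assumption modeled on that example rather than an official SDK type, and `topCategory`, `gateCompletion`, and the fallback text are hypothetical names introduced here for illustration.

```typescript
// Hypothetical shape modeled on the example response; not an official SDK type.
interface ModerationResult {
  safe: boolean;
  action: string;
  categories: Record<string, number>;
  latency_ms: number;
}

// Return the highest-scoring category, or undefined if there are none.
// Handy when recording why a completion was blocked.
function topCategory(result: ModerationResult): string | undefined {
  let best: string | undefined;
  let bestScore = -Infinity;
  for (const [name, score] of Object.entries(result.categories)) {
    if (score > bestScore) {
      best = name;
      bestScore = score;
    }
  }
  return best;
}

// Show the completion only when the independent check allows it;
// otherwise log the block and return a fallback message.
function gateCompletion(
  completion: string,
  result: ModerationResult,
  fallback = "This response was withheld.",
): string {
  if (result.safe && result.action === "allow") return completion;
  console.warn(`blocked completion (top category: ${topCategory(result) ?? "unknown"})`);
  return fallback;
}
```

Requiring both `safe` and an `allow` action is a deliberately conservative choice for this sketch: anything else, including a `review` action, is withheld until a person or a stricter rule decides.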
Keep exploring
Vettly Family
Safety Inbox, Vettly Coach, and visible Companion pairing for teen families.
Content Moderation API
One endpoint for text, image, and video moderation.
Image Moderation API
Policy-driven image checks with clear allow, review, and block actions.
Content Moderation in Next.js
Add content moderation to a Next.js App Router project in minutes. Server-side API routes, React Server Components, and edge runtime examples.
Get an API key
Start making decisions in minutes with a Developer plan and clear upgrade paths.