AI Safety

How to Moderate AI-Generated Text

Catch unsafe completions before users see them.

Pre-screen prompts and post-screen outputs in real time.

What it detects

• Hate speech in completions
• PII leakage
• Policy violations
• Jailbreak success indicators
• Hallucinated unsafe content
• Custom rules

Why developers choose Vettly

• Streaming SDK for real-time output screening
• Pre-built llm-output policy
• Same API works on inputs and outputs
• Audit trail for every blocked completion

Example request

typescript

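import { createStreamingClient } from '@vettly/sdk';

// Replace 'YOUR_KEY' with your Vettly API key.
const streaming = createStreamingClient('YOUR_KEY');

// Open a realtime connection that screens text against the 'chat-policy' policy.
const ws = streaming.connectRealtime({
  policyId: 'chat-policy',
  // Called with each moderation result.
  onResult: (result) => {
    if (result.safe) {
      showMessage(result); // your app's function: render the approved message
    } else {
      logBlocked(result); // your app's function: record the blocked completion
    }
  }
});

await ws.connect();

// `message` is the prompt or completion text to screen.
const result = await ws.moderate(message);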

Example response

json

{
  "safe": true,
  "action": "allow",
  "categories": {
    "harassment": 0.02,
    "spam": 0.01
  },
  "latency_ms": 47
}
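
A response of this shape can be condensed into a single log line, as in the minimal sketch below. The `ModerationResult` interface mirrors only the four fields in the sample above, and `topCategory` and `summarize` are hypothetical helpers written for this illustration; none of these names come from the Vettly SDK.

```typescript
// Shape inferred from the sample response above; a real response may carry more fields.
interface ModerationResult {
  safe: boolean;
  action: string; // "allow" in the sample; other values are not shown there
  categories: { [name: string]: number };
  latency_ms: number;
}

// Hypothetical helper: return the highest-scoring category, or null if none were reported.
function topCategory(result: ModerationResult): { name: string; score: number } | null {
  let top: { name: string; score: number } | null = null;
  for (const name of Object.keys(result.categories)) {
    const score = result.categories[name];
    if (top === null || score > top.score) {
      top = { name, score };
    }
  }
  return top;
}

// Hypothetical helper: build a one-line summary suitable for an application log.
function summarize(result: ModerationResult): string {
  const top = topCategory(result);
  const detail = top ? `${top.name}=${top.score.toFixed(2)}` : 'no categories';
  return `${result.action} (safe=${result.safe}, top: ${detail}, ${result.latency_ms} ms)`;
}

// The sample response from above, used as test data.
const sample: ModerationResult = {
  safe: true,
  action: 'allow',
  categories: { harassment: 0.02, spam: 0.01 },
  latency_ms: 47
};

console.log(summarize(sample)); // prints "allow (safe=true, top: harassment=0.02, 47 ms)"
```

Logging the top category next to the action makes it easier to see why a completion was allowed or blocked when reviewing decisions later.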

Compared to relying on model safety alone

Guardrails built into the model can be evaded, for example by jailbreak prompts. An independent check that runs after generation catches unsafe completions the model lets through.
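
The idea of an independent check around the model can be sketched roughly as follows. This is an illustration under stated assumptions, not the Vettly SDK: `guardedCompletion`, `Generate`, `Screen`, and `Outcome` are hypothetical names, and the model call and the moderation call are passed in as plain functions so the flow stays independent of any particular client.

```typescript
// Minimal result shape: only the `safe` flag from a moderation response is used here.
interface ScreenResult {
  safe: boolean;
}

// Hypothetical function types: plug in your model call and your moderation call.
type Generate = (prompt: string) => Promise<string>;
type Screen = (text: string) => Promise<ScreenResult>;

type Outcome =
  | { status: 'ok'; text: string }
  | { status: 'blocked'; stage: 'prompt' | 'completion' };

// Screen the prompt, generate, then screen the completion before returning it.
async function guardedCompletion(
  prompt: string,
  generate: Generate,
  screen: Screen
): Promise<Outcome> {
  // Pre-screen: skip the model call entirely if the prompt is unsafe.
  if (!(await screen(prompt)).safe) {
    return { status: 'blocked', stage: 'prompt' };
  }

  const text = await generate(prompt);

  // Post-screen: the independent check that runs after generation.
  if (!(await screen(text)).safe) {
    return { status: 'blocked', stage: 'completion' };
  }

  return { status: 'ok', text };
}

// Stand-in functions for local experiments; replace with real model and moderation calls.
const fakeGenerate: Generate = async (p) => `echo: ${p}`;
const fakeScreen: Screen = async (t) => ({ safe: t.indexOf('forbidden') === -1 });
```

Returning the stage at which a request was blocked keeps rejected prompts and rejected completions distinguishable in logs, since the two usually call for different follow-up.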

Get an API key

Start making decisions in minutes with a Developer plan and clear upgrade paths.
