Enforcement & Policies
Configure what happens when Rivaro detects a violation — from observation mode (detect and log) to full enforcement (block, redact, quarantine).
Observation Mode vs Enforcement Mode
By default, with no policies configured, Rivaro runs in observation mode:
- All traffic passes through to the AI provider and back
- Detections are logged (PII found, prompt injection detected, etc.)
- Nothing is blocked or modified
- Results appear in the dashboard
This is useful for understanding what your AI traffic looks like before deciding what to enforce.
Enforcement mode activates when you configure policy rules. Rules map detections to actions: "when you find PII_SSN in egress traffic, block it."
Policy Actions
When a detection matches a policy rule, Rivaro applies one of these actions:
| Action | What happens | Developer sees |
|---|---|---|
| ALLOW | Traffic passes through unchanged | Normal response |
| LOG | Traffic passes through, violation is recorded | Normal response (violation visible in dashboard) |
| REDACT | Sensitive content is masked before forwarding | Response with [REDACTED] replacing sensitive text |
| BLOCK | Request is rejected, AI provider is never called | Error response with finish_reason: "content_filter" |
| REDACT_AND_ALERT | Content redacted and notification sent | Redacted response + admin notification |
| QUARANTINE | Actor is quarantined, all subsequent requests blocked | 403 on this and future requests until admin review |
| STEP_UP | Request held pending human approval | Request paused until approved |
| REMEDIATE | Automated remediation applied | Varies by remediation action |
| AUTO_REMEDIATE | System automatically applies fix | Varies by remediation action |
What blocking looks like to developers
When a request is blocked, the response format matches the AI provider's format so SDKs handle it gracefully:
OpenAI / Azure:
```json
{
  "choices": [{
    "message": {"role": "assistant", "content": "Content blocked due to policy violations"},
    "finish_reason": "content_filter"
  }]
}
```
Anthropic / Bedrock (Claude):
```json
{
  "content": [{"type": "text", "text": "Content blocked due to policy violations"}],
  "stop_reason": "content_filtered"
}
```
Streaming:
```
data: {"blocked":true,"message":"Content blocked due to policy violations"}
```
Developers can check for finish_reason: "content_filter" (OpenAI) or stop_reason: "content_filtered" (Anthropic) to detect enforcement blocks programmatically.
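That check can be written once for both provider formats. This is a minimal sketch operating on the parsed JSON body; the field names come from the response shapes shown above, and `was_blocked` is an illustrative helper, not part of any SDK:

```python
def was_blocked(response: dict) -> bool:
    """Return True if a non-streaming chat response was blocked by policy.

    Covers both formats: OpenAI/Azure set finish_reason on each choice,
    Anthropic/Bedrock set a top-level stop_reason.
    """
    for choice in response.get("choices", []):
        if choice.get("finish_reason") == "content_filter":
            return True
    return response.get("stop_reason") == "content_filtered"

# The OpenAI-format block response shown above:
blocked = {
    "choices": [{
        "message": {"role": "assistant", "content": "Content blocked due to policy violations"},
        "finish_reason": "content_filter",
    }]
}
assert was_blocked(blocked)
assert not was_blocked({"choices": [{"finish_reason": "stop"}]})
```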
What redaction looks like
When content is redacted, the sensitive text is replaced with a mask before the request is forwarded to the AI provider (ingress) or before the response is returned to the developer (egress). The original content is preserved in the detection record for audit.
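The masking step has this general shape. This is an illustration only (Rivaro's actual detectors run server-side and cover many more patterns); the SSN regex and `redact` helper here are assumptions for the example:

```python
import re

# Illustrative SSN pattern; real detection is done by Rivaro's detectors.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str, mask: str = "[REDACTED]") -> str:
    """Replace each sensitive match with a mask before forwarding."""
    return SSN_PATTERN.sub(mask, text)

assert redact("My SSN is 123-45-6789.") == "My SSN is [REDACTED]."
```

The original (unmasked) text never reaches the AI provider or the developer on the redacted path; it survives only in the detection record.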
Policy Rules
A policy rule maps a detection condition to an enforcement action.
Rule structure
| Field | Description |
|---|---|
| Detection type | Specific detection to match (e.g. PII_SSN, SECURITY_PROMPT_INJECTION) |
| Risk category | Broader match — applies to ALL detection types in the category |
| Action | What to do when matched (BLOCK, REDACT, LOG, etc.) |
| Lifecycle | When to apply: INGRESS, EGRESS, DEPLOYMENT, TRAINING |
| Enabled | Toggle rule on/off without deleting |
Rule matching hierarchy
When a detection occurs, Rivaro resolves which policy rule applies using this priority order (most specific wins):
1. Detection-type-level custom rule — A rule targeting a specific detection type (e.g. "BLOCK PII_SSN"). If one exists for this detection type, it wins.
2. Risk-category-level custom rule — A rule targeting the detection's risk category (e.g. "REDACT all EXTERNAL_DATA_EXFILTRATION"). Applied when no detection-type-level rule exists.
3. Template default — If the AppContext uses a policy template (e.g. healthcare, financial services), the template's default action for this risk category applies.
4. Fallback — If nothing else matches, the action is LOG (observe and record, don't enforce).
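The resolution order can be sketched as a short lookup chain. The data shapes here (`rules` and `template_defaults` as plain dicts keyed by detection type or risk category) are illustrative, not Rivaro's internal schema:

```python
def resolve_action(detection_type: str, risk_category: str,
                   rules: dict, template_defaults: dict) -> str:
    """Resolve the enforcement action for a detection, most specific first."""
    if detection_type in rules:              # 1. detection-type-level custom rule
        return rules[detection_type]
    if risk_category in rules:               # 2. risk-category-level custom rule
        return rules[risk_category]
    if risk_category in template_defaults:   # 3. policy template default
        return template_defaults[risk_category]
    return "LOG"                             # 4. fallback: observe, don't enforce

rules = {"PII_SSN": "BLOCK", "EXTERNAL_DATA_EXFILTRATION": "REDACT"}
assert resolve_action("PII_SSN", "PII", rules, {}) == "BLOCK"
assert resolve_action("PII_EMAIL", "PII", rules, {}) == "LOG"
```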
Rule scoping
Rules can be scoped to different levels:
| Scope | Description |
|---|---|
| AppContext-specific | Applies only to traffic through one AppContext |
| Organization-wide | Applies to all traffic across the organization |
AppContext-specific rules take priority over organization-wide rules.
Enforcement Pipeline
When a request flows through the proxy, enforcement happens in phases:
Ingress (before calling the AI provider)
- Anomaly detection — rate limits, actor status checks
- Content analysis — all enabled detectors scan the input
- Policy evaluation — each detection is matched against policy rules
- Decision — ALLOW, LOG, REDACT, or BLOCK
If the decision is BLOCK, the AI provider is never called. The developer gets a block response immediately.
If the decision is REDACT, sensitive content is masked in the request before it's forwarded to the AI provider.
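When several detections fire on one request, their actions have to collapse into a single ingress decision. A minimal sketch, assuming a severity-style precedence (BLOCK over REDACT over LOG) that matches the behavior described above but is not stated by Rivaro explicitly:

```python
def ingress_decision(detections: list, resolve) -> str:
    """Collapse per-detection policy actions into one ingress decision.

    BLOCK dominates (the provider is never called), then REDACT, then LOG;
    with no matching detections the request is simply allowed.
    `resolve` maps a detection to its policy action (illustrative).
    """
    actions = {resolve(d) for d in detections}
    for decisive in ("BLOCK", "REDACT", "LOG"):
        if decisive in actions:
            return decisive
    return "ALLOW"

policy = {"PII_SSN": "BLOCK", "PII_EMAIL": "REDACT"}.get
assert ingress_decision(["PII_SSN", "PII_EMAIL"], policy) == "BLOCK"
assert ingress_decision([], policy) == "ALLOW"
```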
Egress (after the AI provider responds)
- Content analysis — detectors scan the response
- Policy evaluation — detections matched against rules
- Decision — LOG, REDACT, or flag for governance action
For streaming responses, egress detection runs on the accumulated full response after the stream completes.
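The streaming behavior above means chunks reach the client in real time while the proxy buffers a copy for post-stream analysis. A minimal sketch (function names are illustrative):

```python
def stream_with_egress_scan(chunks, scan):
    """Pass stream chunks through while buffering them; run egress
    detection once on the accumulated full response after the stream ends."""
    buffered = []
    for chunk in chunks:
        buffered.append(chunk)
        yield chunk                 # client sees each chunk immediately
    scan("".join(buffered))         # detectors run on the full response

scanned = []
assert list(stream_with_egress_scan(["Hel", "lo"], scanned.append)) == ["Hel", "lo"]
assert scanned == ["Hello"]
```

One consequence of this design: a policy violation in a streamed response is detected and logged, but the chunks have already been delivered by the time detection runs.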
Agent Governance
Beyond per-request policy enforcement, Rivaro tracks actor behavior over time and can automatically escalate responses for repeat offenders.
Trust scores
Every actor (agent, user, API key) has a trust score (0–100). The score decreases as violations accumulate and recovers over time.
| Factor | Impact |
|---|---|
| Detection severity | LOW: 10, MEDIUM: 30, HIGH: 60, CRITICAL: 100 |
| Violation count | More violations = higher risk (capped) |
| Recency | Recent violations weighted more heavily |
| Session context | Accessing credentials or sensitive data increases risk |
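The severity impacts in the table give the general shape of a score update. The exact weighting, recency decay, and caps are internal to Rivaro; this sketch only illustrates that violations lower a 0-100 score, floored at 0 (the `weight` parameter is an assumption):

```python
SEVERITY_IMPACT = {"LOW": 10, "MEDIUM": 30, "HIGH": 60, "CRITICAL": 100}

def apply_violation(trust: float, severity: str, weight: float = 0.5) -> float:
    """Reduce a 0-100 trust score by a weighted severity impact, floored at 0."""
    return max(0.0, trust - weight * SEVERITY_IMPACT[severity])

score = 100.0
score = apply_violation(score, "HIGH")      # 100 - 0.5 * 60  = 70.0
score = apply_violation(score, "CRITICAL")  # 70  - 0.5 * 100 = 20.0
assert score == 20.0
```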
Automatic escalation
Based on risk level, Rivaro can automatically escalate:
| Risk Level | Trigger | Action |
|---|---|---|
| MINIMAL | Low risk score, high trust | Normal operation |
| ELEVATED | Moderate violations, trust declining | WARN — violation logged with elevated visibility |
| HIGH | Significant violations, low trust | RATE_LIMIT — actor throttled to 10–20 req/min |
| CRITICAL | Severe violations or very low trust | QUARANTINE — all requests blocked until admin review |
| CRITICAL + repeat | Critical risk with violations above termination threshold | TERMINATE — actor permanently blocked |
Quarantine
When an actor is quarantined:
- All proxy requests from that actor are immediately blocked (403)
- The actor appears in the dashboard's quarantine queue
- An administrator must review and either release or terminate the actor
- On release, violation counts reset and the trust score begins recovering
Termination
When an actor is terminated:
- All proxy requests are permanently blocked
- The actor cannot be reactivated through normal flows
- This is reserved for severe, repeated policy violations
Admin controls
Automatic quarantine and termination can be enabled or disabled per organization. When disabled, Rivaro still calculates risk levels and trust scores but only warns — it doesn't take automatic action.
Next steps
- Understanding Detections — What Rivaro detects and how it's classified
- Configuration Guide — Set up AppContexts and detection keys
- Error Handling — How enforcement appears to developers