
Enforcement & Policies

Configure what happens when Rivaro detects a violation — from observation mode (detect and log) to full enforcement (block, redact, quarantine).

Observation Mode vs Enforcement Mode

By default, with no policies configured, Rivaro runs in observation mode:

  • All traffic passes through to the AI provider and back
  • Detections are logged (PII found, prompt injection detected, etc.)
  • Nothing is blocked or modified
  • Results appear in the dashboard

This is useful for understanding what your AI traffic looks like before deciding what to enforce.

Enforcement mode activates when you configure policy rules. Rules map detections to actions: "when you find PII_SSN in egress traffic, block it."

Policy Actions

When a detection matches a policy rule, Rivaro applies one of these actions:

| Action | What happens | Developer sees |
| --- | --- | --- |
| ALLOW | Traffic passes through unchanged | Normal response |
| LOG | Traffic passes through, violation is recorded | Normal response (violation visible in dashboard) |
| REDACT | Sensitive content is masked before forwarding | Response with [REDACTED] replacing sensitive text |
| BLOCK | Request is rejected, AI provider is never called | Error response with finish_reason: "content_filter" |
| REDACT_AND_ALERT | Content redacted and notification sent | Redacted response + admin notification |
| QUARANTINE | Actor is quarantined, all subsequent requests blocked | 403 on this and future requests until admin review |
| STEP_UP | Request held pending human approval | Request paused until approved |
| REMEDIATE | Automated remediation applied | Varies by remediation action |
| AUTO_REMEDIATE | System automatically applies fix | Varies by remediation action |

What blocking looks like to developers

When a request is blocked, the response format matches the AI provider's format so SDKs handle it gracefully:

OpenAI / Azure:

```json
{
  "choices": [{
    "message": {"role": "assistant", "content": "Content blocked due to policy violations"},
    "finish_reason": "content_filter"
  }]
}
```

Anthropic / Bedrock (Claude):

```json
{
  "content": [{"type": "text", "text": "Content blocked due to policy violations"}],
  "stop_reason": "content_filtered"
}
```

Streaming:

```
data: {"blocked":true,"message":"Content blocked due to policy violations"}
```

Developers can check for finish_reason: "content_filter" (OpenAI) or stop_reason: "content_filtered" (Anthropic) to detect enforcement blocks programmatically.
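For example, a minimal check with the OpenAI Python SDK might look like the sketch below. The proxy URL, API key, and model name are placeholders; only the finish_reason / stop_reason values come from the formats above.

```python
from openai import OpenAI

# Placeholder values: point the client at your Rivaro proxy endpoint and use
# whatever model your AppContext routes to.
client = OpenAI(base_url="https://rivaro-proxy.example.com/v1", api_key="...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this customer record..."}],
)

choice = response.choices[0]
if choice.finish_reason == "content_filter":
    # The request or response was blocked by a Rivaro policy rule.
    print("Blocked by policy:", choice.message.content)
else:
    print(choice.message.content)

# For Anthropic-compatible clients, check message.stop_reason == "content_filtered" instead.
```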

What redaction looks like

When content is redacted, the sensitive text is replaced with a mask before the request is forwarded to the AI provider (ingress) or before the response is returned to the developer (egress). The original content is preserved in the detection record for audit.
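For example, an ingress redaction of a detected SSN might look like this. The prompt and SSN below are invented for illustration; only the [REDACTED] mask comes from the actions table above.

```python
# Illustrative values only: the SSN is fake and the prompt is invented.
original_prompt  = "Customer SSN is 123-45-6789; draft a follow-up email."
forwarded_prompt = "Customer SSN is [REDACTED]; draft a follow-up email."  # what the AI provider receives

# On egress the same masking applies to the model's response before it reaches
# the developer, so downstream code can check for the mask if needed.
was_redacted = "[REDACTED]" in forwarded_prompt
```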

Policy Rules

A policy rule maps a detection condition to an enforcement action.

Rule structure

| Field | Description |
| --- | --- |
| Detection type | Specific detection to match (e.g. PII_SSN, SECURITY_PROMPT_INJECTION) |
| Risk category | Broader match — applies to ALL detection types in the category |
| Action | What to do when matched (BLOCK, REDACT, LOG, etc.) |
| Lifecycle | When to apply: INGRESS, EGRESS, DEPLOYMENT, TRAINING |
| Enabled | Toggle rule on/off without deleting |
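Conceptually, a rule is just a record with those fields. The snippet below mirrors the table; the key names are illustrative, not Rivaro's literal configuration schema.

```python
# Conceptual shape of a policy rule. Field names follow the table above and
# are illustrative rather than a literal Rivaro schema.
ssn_rule = {
    "detection_type": "PII_SSN",   # match this specific detection
    "risk_category": None,         # unset because the rule targets a detection type
    "action": "BLOCK",             # what to do on a match
    "lifecycle": "EGRESS",         # when to apply the rule
    "enabled": True,               # can be toggled off without deleting
}
```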

Rule matching hierarchy

When a detection occurs, Rivaro resolves which policy rule applies using this priority order (most specific wins):

  1. Detection-type-level custom rule — A rule targeting a specific detection type (e.g. "BLOCK PII_SSN"). If one exists for this detection type, it wins.
  2. Risk-category-level custom rule — A rule targeting the detection's risk category (e.g. "REDACT all EXTERNAL_DATA_EXFILTRATION"). Applied when no detection-type-level rule exists.
  3. Template default — If the AppContext uses a policy template (e.g. healthcare, financial services), the template's default action for this risk category applies.
  4. Fallback — If nothing else matches, the action is LOG (observe and record, don't enforce).
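In code, that resolution order amounts to a small cascade. The sketch below is illustrative only: the lookup helpers (rules.get_by_detection_type, rules.get_by_risk_category, template.default_action) are hypothetical, not Rivaro APIs, and only the priority order comes from the list above.

```python
def resolve_action(detection, rules, template):
    """Return the enforcement action for a detection, most specific rule first."""
    # Hypothetical helpers throughout; only the priority order is from the docs.

    # 1. A detection-type-level custom rule wins outright.
    rule = rules.get_by_detection_type(detection.detection_type)
    if rule and rule.enabled:
        return rule.action

    # 2. Otherwise, a risk-category-level custom rule applies.
    rule = rules.get_by_risk_category(detection.risk_category)
    if rule and rule.enabled:
        return rule.action

    # 3. Otherwise, fall back to the AppContext's policy template default.
    if template is not None:
        default = template.default_action(detection.risk_category)
        if default is not None:
            return default

    # 4. Nothing matched: observe and record, don't enforce.
    return "LOG"
```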

Rule scoping

Rules can be scoped to different levels:

| Scope | Description |
| --- | --- |
| AppContext-specific | Applies only to traffic through one AppContext |
| Organization-wide | Applies to all traffic across the organization |

AppContext-specific rules take priority over organization-wide rules.

Enforcement Pipeline

When a request flows through the proxy, enforcement happens in phases:

Ingress (before calling the AI provider)

  1. Anomaly detection — rate limits, actor status checks
  2. Content analysis — all enabled detectors scan the input
  3. Policy evaluation — each detection is matched against policy rules
  4. Decision — ALLOW, LOG, REDACT, or BLOCK

If the decision is BLOCK, the AI provider is never called. The developer gets a block response immediately.

If the decision is REDACT, sensitive content is masked in the request before it's forwarded to the AI provider.
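A condensed sketch of those ingress phases follows. The helper functions (run_detectors, redact, forward_to_provider, and the responses) are hypothetical, and the rule for combining multiple detections is an assumption; only the phase order comes from the steps above.

```python
def handle_ingress(request, actor, rules, template):
    """Hypothetical sketch of the ingress pipeline; names are illustrative."""
    # 1. Anomaly detection: actor status checks (quarantined actors get a 403)
    #    and rate limits.
    if actor.quarantined:
        return error_response(403, "actor quarantined")

    # 2. Content analysis: all enabled detectors scan the input.
    detections = run_detectors(request.body)

    # 3. Policy evaluation: match each detection against policy rules
    #    (resolve_action is the resolution sketch from "Rule matching hierarchy").
    actions = [resolve_action(d, rules, template) for d in detections]

    # 4. Decision. Assumption: when multiple detections fire, the most
    #    restrictive action is applied.
    if "BLOCK" in actions:
        return block_response()                            # provider is never called
    if "REDACT" in actions:
        request.body = redact(request.body, detections)    # mask before forwarding
    return forward_to_provider(request)
```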

Egress (after the AI provider responds)

  1. Content analysis — detectors scan the response
  2. Policy evaluation — detections matched against rules
  3. Decision — LOG, REDACT, or flag for governance action

For streaming responses, egress detection runs on the accumulated full response after the stream completes.
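In practice that means buffering chunks as they are relayed and scanning the joined text once the stream ends. A minimal sketch, with hypothetical relay and detector helpers:

```python
def relay_stream(upstream_chunks, send_to_client):
    """Hypothetical sketch: chunks are relayed live, then scanned as one document."""
    buffered = []
    for chunk in upstream_chunks:
        send_to_client(chunk)          # the client still sees tokens as they arrive
        buffered.append(chunk.text)

    # Egress detection runs on the accumulated full response after the stream completes.
    full_response = "".join(buffered)
    detections = run_detectors(full_response)
    record_detections(detections)      # violations are logged / flagged for governance
```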

Agent Governance

Beyond per-request policy enforcement, Rivaro tracks actor behavior over time and can automatically escalate responses for repeat offenders.

Trust scores

Every actor (agent, user, API key) has a trust score (0–100). The score decreases as violations accumulate and recovers over time.

| Factor | Impact |
| --- | --- |
| Detection severity | LOW: 10, MEDIUM: 30, HIGH: 60, CRITICAL: 100 |
| Violation count | More violations = higher risk (capped) |
| Recency | Recent violations weighted more heavily |
| Session context | Accessing credentials or sensitive data increases risk |
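The exact scoring formula isn't documented here; the toy model below only illustrates the direction of each factor. The severity weights come from the table, while the violation-count cap, session multiplier, and recovery rate are invented for illustration.

```python
# Severity weights from the table; everything else (cap, multiplier, decay) is illustrative.
SEVERITY_IMPACT = {"LOW": 10, "MEDIUM": 30, "HIGH": 60, "CRITICAL": 100}

def apply_violation(trust_score, severity, recent_violations, sensitive_session=False):
    """Toy model: lower trust on violations, weight repeat and sensitive activity more."""
    penalty = SEVERITY_IMPACT[severity]
    penalty += min(recent_violations * 5, 30)   # more violations = higher risk, capped
    if sensitive_session:                       # e.g. actor is touching credentials
        penalty *= 1.5
    return max(0, trust_score - penalty)

def recover(trust_score, days_since_last_violation):
    """Trust recovers gradually over time (recovery rate is illustrative)."""
    return min(100, trust_score + 2 * days_since_last_violation)
```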

Automatic escalation

Based on risk level, Rivaro can automatically escalate:

| Risk Level | Trigger | Action |
| --- | --- | --- |
| MINIMAL | Low risk score, high trust | Normal operation |
| ELEVATED | Moderate violations, trust declining | WARN — violation logged with elevated visibility |
| HIGH | Significant violations, low trust | RATE_LIMIT — actor throttled to 10–20 req/min |
| CRITICAL | Severe violations or very low trust | QUARANTINE — all requests blocked until admin review |
| CRITICAL + repeat | Critical risk with violations above termination threshold | TERMINATE — actor permanently blocked |
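The table composes into a simple decision function. In the sketch below, the risk levels and actions come from the table, while the termination-threshold comparison is an assumption about how "CRITICAL + repeat" is evaluated.

```python
def escalation_action(risk_level, violation_count, termination_threshold):
    """Map a risk level to an automatic escalation action (per the table above)."""
    if risk_level == "CRITICAL":
        # Assumption: repeat offenders above the termination threshold are terminated.
        if violation_count > termination_threshold:
            return "TERMINATE"      # actor permanently blocked
        return "QUARANTINE"         # all requests blocked until admin review
    if risk_level == "HIGH":
        return "RATE_LIMIT"         # throttle to roughly 10-20 req/min
    if risk_level == "ELEVATED":
        return "WARN"               # log with elevated visibility
    return None                     # MINIMAL: normal operation
```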

Quarantine

When an actor is quarantined:

  • All proxy requests from that actor are immediately blocked (403)
  • The actor appears in the dashboard's quarantine queue
  • An administrator must review and either release or terminate the actor
  • On release, violation counts reset and the trust score begins recovering

Termination

When an actor is terminated:

  • All proxy requests are permanently blocked
  • The actor cannot be reactivated through normal flows
  • This is reserved for severe, repeated policy violations

Admin controls

Automatic quarantine and termination can be enabled or disabled per organization. When disabled, Rivaro still calculates risk levels and trust scores but only warns — it doesn't take automatic action.

Next steps