Enforcement & Policies
Configure what happens when Rivaro detects a violation — from observation mode (detect and log) to full enforcement (block, redact, quarantine).
Observation Mode vs Enforcement Mode
By default, with no policies configured, Rivaro runs in observation mode:
- All traffic passes through to the AI provider and back
- Detections are logged (PII found, prompt injection detected, etc.)
- Nothing is blocked or modified
- Results appear in the dashboard
This is useful for understanding what your AI traffic looks like before deciding what to enforce.
Enforcement mode activates when you configure policy rules. Rules map detections to actions: "when you find PII_SSN in egress traffic, block it."
Policy Actions
When a detection matches a policy rule, Rivaro applies one of these actions:
| Action | What happens | Developer sees |
|---|---|---|
| ALLOW | Traffic passes through unchanged | Normal response |
| LOG | Traffic passes through, violation is recorded | Normal response (violation visible in dashboard) |
| REDACT | Sensitive content is masked before forwarding | Response with [REDACTED] replacing sensitive text |
| BLOCK | Request is rejected, AI provider is never called | Error response with finish_reason: "content_filter" |
| REDACT_AND_ALERT | Content redacted and notification sent | Redacted response + admin notification |
| QUARANTINE | Actor is quarantined, all subsequent requests blocked | 403 on this and future requests until admin review |
| STEP_UP | Request held pending human approval | Request paused until approved |
| REMEDIATE | Automated remediation applied | Varies by remediation action |
| AUTO_REMEDIATE | System automatically applies fix | Varies by remediation action |
What blocking looks like to developers
When a request is blocked, the response format matches the AI provider's format so SDKs handle it gracefully:
OpenAI / Azure:
```json
{
  "choices": [{
    "message": {"role": "assistant", "content": "Content blocked due to policy violations"},
    "finish_reason": "content_filter"
  }]
}
```
Anthropic / Bedrock (Claude):
```json
{
  "content": [{"type": "text", "text": "Content blocked due to policy violations"}],
  "stop_reason": "content_filtered"
}
```
Streaming:
```
data: {"blocked":true,"message":"Content blocked due to policy violations"}
```
Developers can check for finish_reason: "content_filter" (OpenAI) or stop_reason: "content_filtered" (Anthropic) to detect enforcement blocks programmatically.
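That check can be written once for both provider formats. This is a minimal sketch operating on the parsed JSON body; the field names come from the response shapes shown above, and `was_blocked` is an illustrative helper, not part of any SDK:

```python
def was_blocked(response: dict) -> bool:
    """Return True if a non-streaming chat response was blocked by policy.

    Covers both formats: OpenAI/Azure set finish_reason on each choice,
    Anthropic/Bedrock set a top-level stop_reason.
    """
    for choice in response.get("choices", []):
        if choice.get("finish_reason") == "content_filter":
            return True
    return response.get("stop_reason") == "content_filtered"

# The OpenAI-format block response shown above:
blocked = {
    "choices": [{
        "message": {"role": "assistant", "content": "Content blocked due to policy violations"},
        "finish_reason": "content_filter",
    }]
}
assert was_blocked(blocked)
assert not was_blocked({"choices": [{"finish_reason": "stop"}]})
```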
What redaction looks like
When content is redacted, the sensitive text is replaced with a mask before the request is forwarded to the AI provider (ingress) or before the response is returned to the developer (egress). The original content is preserved in the detection record for audit.
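The masking step has this general shape. This is an illustration only (Rivaro's actual detectors run server-side and cover many more patterns); the SSN regex and `redact` helper here are assumptions for the example:

```python
import re

# Illustrative SSN pattern; real detection is done by Rivaro's detectors.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str, mask: str = "[REDACTED]") -> str:
    """Replace each sensitive match with a mask before forwarding."""
    return SSN_PATTERN.sub(mask, text)

assert redact("My SSN is 123-45-6789.") == "My SSN is [REDACTED]."
```

The original (unmasked) text never reaches the AI provider or the developer on the redacted path; it survives only in the detection record.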
Policy Rules
A policy rule maps a detection condition to an enforcement action.
Rule structure
| Field | Description |
|---|---|
| Detection type | Specific detection to match (e.g. PII_SSN, SECURITY_PROMPT_INJECTION) |
| Risk category | Broader match — applies to ALL detection types in the category |
| Action | What to do when matched (BLOCK, REDACT, LOG, etc.) |
| Lifecycle | When to apply: INGRESS, EGRESS, DEPLOYMENT, TRAINING |
| Enabled | Toggle rule on/off without deleting |
Rule matching hierarchy
When a detection occurs, Rivaro resolves which policy rule applies using this priority order (most specific wins):
1. Detection-type-level custom rule — A rule targeting a specific detection type (e.g. "BLOCK PII_SSN"). If one exists for this detection type, it wins.
2. Risk-category-level custom rule — A rule targeting the detection's risk category (e.g. "REDACT all EXTERNAL_DATA_EXFILTRATION"). Applied when no detection-type-level rule exists.
3. Template default — If the AppContext uses a policy template (e.g. healthcare, financial services), the template's default action for this risk category applies.
4. Fallback — If nothing else matches, the action is LOG (observe and record, don't enforce).
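The resolution order can be sketched as a short lookup chain. The data shapes here (`rules` and `template_defaults` as plain dicts keyed by detection type or risk category) are illustrative, not Rivaro's internal schema:

```python
def resolve_action(detection_type: str, risk_category: str,
                   rules: dict, template_defaults: dict) -> str:
    """Resolve the enforcement action for a detection, most specific first."""
    if detection_type in rules:              # 1. detection-type-level custom rule
        return rules[detection_type]
    if risk_category in rules:               # 2. risk-category-level custom rule
        return rules[risk_category]
    if risk_category in template_defaults:   # 3. policy template default
        return template_defaults[risk_category]
    return "LOG"                             # 4. fallback: observe, don't enforce

rules = {"PII_SSN": "BLOCK", "EXTERNAL_DATA_EXFILTRATION": "REDACT"}
assert resolve_action("PII_SSN", "PII", rules, {}) == "BLOCK"
assert resolve_action("PII_EMAIL", "PII", rules, {}) == "LOG"
```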
Rule scoping
Rules can be scoped to different levels:
| Scope | Description |
|---|---|
| AppContext-specific | Applies only to traffic through one AppContext |
| Organization-wide | Applies to all traffic across the organization |
AppContext-specific rules take priority over organization-wide rules.
Enforcement Pipeline
When a request flows through the proxy, enforcement happens in phases:
Ingress (before calling the AI provider)
- Anomaly detection — rate limits, actor status checks
- Content analysis — all enabled detectors scan the input
- Policy evaluation — each detection is matched against policy rules
- Decision — ALLOW, LOG, REDACT, or BLOCK
If the decision is BLOCK, the AI provider is never called. The developer gets a block response immediately.
If the decision is REDACT, sensitive content is masked in the request before it's forwarded to the AI provider.
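When several detections fire on one request, their actions have to collapse into a single ingress decision. A minimal sketch, assuming a severity-style precedence (BLOCK over REDACT over LOG) that matches the behavior described above but is not stated by Rivaro explicitly:

```python
def ingress_decision(detections: list, resolve) -> str:
    """Collapse per-detection policy actions into one ingress decision.

    BLOCK dominates (the provider is never called), then REDACT, then LOG;
    with no matching detections the request is simply allowed.
    `resolve` maps a detection to its policy action (illustrative).
    """
    actions = {resolve(d) for d in detections}
    for decisive in ("BLOCK", "REDACT", "LOG"):
        if decisive in actions:
            return decisive
    return "ALLOW"

policy = {"PII_SSN": "BLOCK", "PII_EMAIL": "REDACT"}.get
assert ingress_decision(["PII_SSN", "PII_EMAIL"], policy) == "BLOCK"
assert ingress_decision([], policy) == "ALLOW"
```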
Egress (after the AI provider responds)
- Content analysis — detectors scan the response
- Policy evaluation — detections matched against rules
- Decision — LOG, REDACT, or flag for governance action
For streaming responses, egress detection runs on the accumulated full response after the stream completes.
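The streaming behavior above means chunks reach the client in real time while the proxy buffers a copy for post-stream analysis. A minimal sketch (function names are illustrative):

```python
def stream_with_egress_scan(chunks, scan):
    """Pass stream chunks through while buffering them; run egress
    detection once on the accumulated full response after the stream ends."""
    buffered = []
    for chunk in chunks:
        buffered.append(chunk)
        yield chunk                 # client sees each chunk immediately
    scan("".join(buffered))         # detectors run on the full response

scanned = []
assert list(stream_with_egress_scan(["Hel", "lo"], scanned.append)) == ["Hel", "lo"]
assert scanned == ["Hello"]
```

One consequence of this design: a policy violation in a streamed response is detected and logged, but the chunks have already been delivered by the time detection runs.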
Agent Governance
Beyond per-request policy enforcement, Rivaro tracks actor behavior over time and can automatically escalate responses for repeat offenders.
Trust scores
Every actor (agent, user, API key) has a trust score (0–100). The score decreases as violations accumulate and recovers over time.
| Factor | Impact |
|---|---|
| Detection severity | LOW: 10, MEDIUM: 30, HIGH: 60, CRITICAL: 100 |
| Violation count | More violations = higher risk (capped) |
| Recency | Recent violations weighted more heavily |
| Session context | Accessing credentials or sensitive data increases risk |
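The severity impacts in the table give the general shape of a score update. The exact weighting, recency decay, and caps are internal to Rivaro; this sketch only illustrates that violations lower a 0-100 score, floored at 0 (the `weight` parameter is an assumption):

```python
SEVERITY_IMPACT = {"LOW": 10, "MEDIUM": 30, "HIGH": 60, "CRITICAL": 100}

def apply_violation(trust: float, severity: str, weight: float = 0.5) -> float:
    """Reduce a 0-100 trust score by a weighted severity impact, floored at 0."""
    return max(0.0, trust - weight * SEVERITY_IMPACT[severity])

score = 100.0
score = apply_violation(score, "HIGH")      # 100 - 0.5 * 60  = 70.0
score = apply_violation(score, "CRITICAL")  # 70  - 0.5 * 100 = 20.0
assert score == 20.0
```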
Automatic escalation
Based on risk level, Rivaro can automatically escalate:
| Risk Level | Trigger | Action |
|---|---|---|
| MINIMAL | Low risk score, high trust | Normal operation |
| ELEVATED | Moderate violations, trust declining | WARN — violation logged with elevated visibility |
| HIGH | Significant violations, low trust | RATE_LIMIT — actor throttled to 10–20 req/min |
| CRITICAL | Severe violations or very low trust | QUARANTINE — all requests blocked until admin review |
| CRITICAL + repeat | Critical risk with violations above termination threshold | TERMINATE — actor permanently blocked |
Quarantine
When an actor is quarantined:
- All proxy requests from that actor are immediately blocked (403)
- The actor appears in the dashboard's quarantine queue
- An administrator must review and either release or terminate the actor
- On release, violation counts reset and the trust score begins recovering
Termination
When an actor is terminated:
- All proxy requests are permanently blocked
- The actor cannot be reactivated through normal flows
- This is reserved for severe, repeated policy violations
Admin controls
Automatic quarantine and termination can be enabled or disabled per organization. When disabled, Rivaro still calculates risk levels and trust scores but only warns — it doesn't take automatic action.
Next steps
- Understanding Detections — What Rivaro detects and how it's classified
- Configuration Guide — Set up AppContexts and detection keys
- Error Handling — How enforcement appears to developers