Discovery & Shadow AI
Automatically map your entire AI attack surface: cloud services, running agents, MCP servers, source code, and direct browser-based AI usage. You need to see it before you can govern it.
Overview
Discovery runs continuously across your infrastructure, finding AI assets you may not know exist. Every discovered asset enters an approval workflow before it can be used by governed agents. Shadow AI detection catches direct AI usage (ChatGPT, Claude, etc.) happening outside your proxy.
Discovery works through channels — configured integrations with your infrastructure. Each channel type uses a different collection method and targets a different part of your environment.
Discovery Channels
Rivaro supports 10 discovery channel types. Configure them in Settings > Discovery Sources.
| Channel Type | Mode | What it finds |
|---|---|---|
| CLOUD_AI_SERVICES | Scheduled | AWS SageMaker/Bedrock, GCP Vertex AI, Azure ML: endpoints, models, and AI-specific security risks |
| SOURCE_CODE | Scheduled | GitHub, GitLab, Bitbucket: AI libraries, hardcoded API keys, risky code patterns |
| CONTAINER_REGISTRY | Scheduled | Docker Hub, ECR, GCR, ACR: AI and MCP containers, vulnerability findings |
| API_GATEWAY | Scheduled | AWS API Gateway, Kong, Nginx: AI and MCP API endpoints |
| COLLABORATION_PLATFORM | Scheduled | Slack, Teams, Google Workspace: unauthorized AI bots, plugins, integrations |
| IDENTITY_ACCESS | Scheduled | Okta, Azure AD, AWS IAM: AI access patterns, service accounts with AI permissions |
| LOG_ANALYSIS | Hybrid | CloudWatch, Stackdriver, Splunk: shadow AI usage patterns from logs (API query or log forwarding agent) |
| NETWORK_ENDPOINT | Agent callback | Running MCP servers, shadow AI agents, exposed endpoints; requires a deployed network scanner agent |
| AGENT_DATA | Agent callback | Pre-collected data from client-side agents |
| MANUAL_ENTRY | Manual | Admin-created assets, auto-approved on creation |
Channel configuration
Each channel has these common fields:
| Field | Description |
|---|---|
| name | Display name for this channel |
| channelType | One of the types above |
| active | Whether the channel runs on its schedule |
| pollingIntervalSeconds | How often to run (scheduled channels) |
| configuration | Channel-specific settings (non-sensitive) |
| lastRunAt | Timestamp of most recent scan |
| lastRunStatus | SUCCESS, FAILED, or RUNNING |
| lastRunAssetCount | Assets found in last run |
| lastRunRiskCount | Risk findings in last run |
Sensitive credentials (API keys, tokens, secrets) are stored separately in an encrypted credential store — never in the channel configuration JSON.
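As an illustration of the fields above, here is a minimal Python sketch of a channel record. The field names come from the table; the example values, the `sensitive_keys` check, and the `is_due` scheduling rule are assumptions made for illustration, not Rivaro's actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical hints for keys that belong in the credential store instead.
SENSITIVE_KEY_HINTS = ("key", "token", "secret", "password")


@dataclass
class DiscoveryChannel:
    name: str
    channelType: str
    active: bool
    pollingIntervalSeconds: int
    configuration: dict = field(default_factory=dict)
    lastRunAt: Optional[datetime] = None
    lastRunStatus: Optional[str] = None  # SUCCESS, FAILED, or RUNNING

    def sensitive_keys(self) -> list:
        """Return configuration keys that look like credentials."""
        return [k for k in self.configuration
                if any(hint in k.lower() for hint in SENSITIVE_KEY_HINTS)]

    def is_due(self, now: datetime) -> bool:
        """Assumed rule: an active channel is due when it has never run,
        or when pollingIntervalSeconds have elapsed since lastRunAt."""
        if not self.active or self.lastRunStatus == "RUNNING":
            return False
        if self.lastRunAt is None:
            return True
        elapsed = now - self.lastRunAt
        return elapsed >= timedelta(seconds=self.pollingIntervalSeconds)


# Example values are made up for illustration.
channel = DiscoveryChannel(
    name="GitHub org scan",
    channelType="SOURCE_CODE",
    active=True,
    pollingIntervalSeconds=3600,
    configuration={"provider": "github", "organization": "example-org"},
    lastRunAt=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    lastRunStatus="SUCCESS",
)
```

A check like `sensitive_keys()` can serve as a pre-save guard: if it returns anything, the value likely belongs in the credential store rather than in `configuration`.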
Network scanner agent
For NETWORK_ENDPOINT channels, Rivaro generates a downloadable Python agent. The agent embeds your detection key, scans your internal network, and POSTs results back to /api/admin/discovery/agent/results. Deploy it anywhere with network access to your internal AI infrastructure.
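The callback described above can be pictured roughly as follows. This is a sketch only: the results path comes from this page, but the JSON body shape, the `X-Detection-Key` header name, and the host are hypothetical. The agent Rivaro generates defines the real contract.

```python
import json
import urllib.request

RESULTS_PATH = "/api/admin/discovery/agent/results"  # path from this page


def build_results_request(base_url, detection_key, endpoints):
    """Build (but do not send) the POST that reports scan results.

    Assumptions: the body shape and the X-Detection-Key header are
    hypothetical placeholders, not the documented wire format.
    """
    body = json.dumps({"endpoints": endpoints}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + RESULTS_PATH,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Detection-Key": detection_key,  # hypothetical header name
        },
    )


request = build_results_request(
    "https://rivaro.example.com",  # placeholder host
    "dk_example",                  # placeholder detection key
    [{"host": "10.0.0.12", "port": 8080, "kind": "mcp_server"}],
)
# urllib.request.urlopen(request) would send it; omitted in this sketch.
```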
What Gets Discovered
Each discovered asset is classified by type and category:
| Category | Examples |
|---|---|
| AI_SERVICE | OpenAI, Anthropic, Vertex AI endpoints in use |
| AI_MODEL | Deployed models, fine-tuned versions, model registries |
| DATA_STORAGE | Vector databases, embedding stores, training data repositories |
| ML_PIPELINE | Training pipelines, fine-tuning jobs, MLflow experiments |
| SOURCE_CODE | Repositories with AI dependencies or hardcoded keys |
| CONTAINER | Docker images with AI/MCP packages |
| IDENTITY_ACCESS | Service accounts, roles with AI service permissions |
| USAGE_PATTERN | Patterns of AI API calls detected in logs |
Asset risk findings
Each discovered asset can have associated findings — specific security or compliance issues detected during scanning:
| Finding field | Description |
|---|---|
| detectionType | e.g. CREDENTIAL_EXPOSURE, MISCONFIGURATION, INFRASTRUCTURE_MCP_PUBLIC_ENDPOINT |
| severity | CRITICAL, HIGH, MEDIUM, LOW |
| status | ACTIVE, RESOLVED, IGNORED |
| description | Human-readable description of the finding |
| detectedContent | What was found (masked in UI) |
| remediationStatus | NONE → PLAN_AVAILABLE → EXECUTION_IN_PROGRESS → SUCCESS |
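To make the finding fields concrete, the sketch below models the remediation progression and a simple triage order. The enum values and severity names come from the table; the `next_status` and `triage` helpers and the sample findings are hypothetical, and the page does not say how a failed remediation is represented, so none is modeled here.

```python
from enum import IntEnum


class RemediationStatus(IntEnum):
    NONE = 0
    PLAN_AVAILABLE = 1
    EXECUTION_IN_PROGRESS = 2
    SUCCESS = 3


SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def next_status(status):
    """Advance one stage along the progression; SUCCESS is terminal."""
    if status is RemediationStatus.SUCCESS:
        return status
    return RemediationStatus(status + 1)


def triage(findings):
    """Return ACTIVE findings, most severe first."""
    active = [f for f in findings if f["status"] == "ACTIVE"]
    return sorted(active, key=lambda f: SEVERITY_ORDER.index(f["severity"]))


# Sample data is invented for illustration.
findings = [
    {"detectionType": "MISCONFIGURATION", "severity": "MEDIUM", "status": "ACTIVE"},
    {"detectionType": "CREDENTIAL_EXPOSURE", "severity": "CRITICAL", "status": "ACTIVE"},
    {"detectionType": "INFRASTRUCTURE_MCP_PUBLIC_ENDPOINT", "severity": "HIGH", "status": "RESOLVED"},
]
queue = triage(findings)
```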
Asset Approval Workflow
Rivaro defaults to zero-trust / default-deny: every newly discovered asset starts as PENDING_APPROVAL, and no agent can use an unapproved asset. The one exception is MANUAL_ENTRY assets, which are auto-approved on creation.
Approval lifecycle
| Status | Meaning |
|---|---|
| PENDING_APPROVAL | Discovered, awaiting security team review |
| APPROVED | Reviewed and explicitly approved for use |
| BLOCKED | Reviewed and denied; agents cannot access |
| ACTIVE | Approved and currently in use by governed agents |
| PROMOTED | Graduated to a governed entity (agent, data source, model) |
| REMOVED | Asset no longer detected in environment |
| ARCHIVED | Deprecated, kept for audit history |
The approval request includes a riskScore (0–100) calculated from the asset's findings. Reviewers can add notes before approving or denying.
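The default-deny behavior can be sketched as an allow-list check. The status names come from the table above; which statuses count as usable is an assumption here (APPROVED and ACTIVE only, with PROMOTED assets assumed to be governed through their promoted entity), and both function names are hypothetical.

```python
# Assumption: only these statuses mean "an agent may use this asset".
USABLE_STATUSES = frozenset({"APPROVED", "ACTIVE"})


def initial_status(channel_type):
    """Manual entries are auto-approved; everything else awaits review."""
    return "APPROVED" if channel_type == "MANUAL_ENTRY" else "PENDING_APPROVAL"


def agent_may_use(asset_status):
    """Default-deny: anything not explicitly allowed is refused,
    including unknown or misspelled statuses."""
    return asset_status in USABLE_STATUSES
```

An allow-list rather than a block-list means a status added later is denied until someone decides otherwise, which matches the default-deny stance.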
Promoting an asset
Approved assets can be promoted — graduated into a fully governed entity with an AppContext, detection key, and full enforcement. This is how shadow infrastructure becomes official, monitored infrastructure.
| Promoted entity type | What it becomes |
|---|---|
| AGENT | A registered agent identity with trust score tracking |
| DATA_SOURCE | A governed data source with access controls |
| MODEL | An approved model with allowed-model list enforcement |
| INTEGRATION | A governed integration with policy enforcement |
| SERVICE | An approved AI service endpoint |
Multi-Source Correlation
The same asset may be discovered by multiple channels. Rivaro deduplicates using an externalId fingerprint — the same fingerprint from two channels links to one asset, with confidence increasing with each additional source.
| Observation type | Confidence | How it's detected |
|---|---|---|
| DISCOVERED | SUSPECTED → INFERRED | Found by a scanner/channel scan |
| RUNTIME_USAGE | CONFIRMED | Seen in live agent traffic through the proxy |
| CODE_REFERENCE | INFERRED | Found in source code as an import or API call |
| IAM_POLICY | INFERRED | Service account has permission to access it |
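One way to picture the correlation step is the sketch below. It groups observations by `externalId` and keeps the strongest confidence seen. The ordering SUSPECTED < INFERRED < CONFIRMED and the rule that a second channel lifts a scan-only asset from SUSPECTED to INFERRED are assumptions read from the table, and the observation dictionary shape is hypothetical.

```python
from collections import defaultdict

# Assumed ordering of confidence levels, weakest to strongest.
CONFIDENCE_RANK = {"SUSPECTED": 0, "INFERRED": 1, "CONFIRMED": 2}

# Starting confidence per observation type, per the table above.
BASE_CONFIDENCE = {
    "DISCOVERED": "SUSPECTED",
    "CODE_REFERENCE": "INFERRED",
    "IAM_POLICY": "INFERRED",
    "RUNTIME_USAGE": "CONFIRMED",
}


def correlate(observations):
    """Group observations by externalId and derive one confidence per asset."""
    by_asset = defaultdict(list)
    for obs in observations:
        by_asset[obs["externalId"]].append(obs)

    assets = {}
    for external_id, group in by_asset.items():
        channels = {obs["channel"] for obs in group}
        levels = [BASE_CONFIDENCE[obs["type"]] for obs in group]
        best = max(levels, key=CONFIDENCE_RANK.__getitem__)
        # Assumed rule: a scan-only asset moves SUSPECTED -> INFERRED
        # once a second channel reports the same fingerprint.
        if best == "SUSPECTED" and len(channels) > 1:
            best = "INFERRED"
        assets[external_id] = {"sources": sorted(channels), "confidence": best}
    return assets


# Invented observations for illustration.
assets = correlate([
    {"externalId": "a", "channel": "CLOUD_AI_SERVICES", "type": "DISCOVERED"},
    {"externalId": "a", "channel": "API_GATEWAY", "type": "DISCOVERED"},
    {"externalId": "b", "channel": "LOG_ANALYSIS", "type": "DISCOVERED"},
    {"externalId": "c", "channel": "SOURCE_CODE", "type": "CODE_REFERENCE"},
    {"externalId": "c", "channel": "PROXY", "type": "RUNTIME_USAGE"},
])
```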
Shadow AI Detection
Shadow AI is direct use of AI services (ChatGPT, Claude, Perplexity, etc.) that bypasses your proxy — typically via a browser. The Rivaro Shadow AI browser extension monitors this activity and applies your policies in real time.
How it works
- Install the Chrome extension and configure it with your organization's detection key
- The extension monitors supported AI domains: chatgpt.com, claude.ai, bard.google.com, bing.com/chat, poe.com, perplexity.ai, and more
- When a user types a prompt and submits it, the extension captures the content and sends it to Rivaro's detection engine (/v1/shadow)
- Rivaro runs the same detection pipeline as the proxy: PII, PHI, credentials, prompt injection, etc.
- The response action is applied directly in the browser
Shadow AI policy actions
| Action | What the user sees |
|---|---|
| BLOCK | Modal appears, submission is prevented |
| REDACT | Modal shows sanitized version; user can copy and resubmit |
| LOG | Submission proceeds, violation is logged in the dashboard |
| ALLOW | No action, submission proceeds normally |
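The table above maps naturally onto a lookup from action to browser behavior. The sketch below is illustrative Python, not the extension's code: `BrowserOutcome` and `resolve` are hypothetical names, treating REDACT as holding the original submission is a reading of the table, and falling back to BLOCK for an unrecognized action is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrowserOutcome:
    submission_proceeds: bool
    modal_shown: bool
    note: str


OUTCOMES = {
    "BLOCK": BrowserOutcome(False, True, "Submission is prevented"),
    "REDACT": BrowserOutcome(
        False, True, "Sanitized version shown; user can copy and resubmit"),
    "LOG": BrowserOutcome(True, False, "Violation is logged in the dashboard"),
    "ALLOW": BrowserOutcome(True, False, "No action"),
}


def resolve(action):
    """Look up the outcome for a policy action.

    Assumption: an unrecognized action fails closed and is treated as BLOCK.
    """
    return OUTCOMES.get(action, OUTCOMES["BLOCK"])
```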
Shadow AI analytics
The Shadow AI dashboard tracks:
- Session trends — daily session counts and week-over-week change
- Violations by severity — CRITICAL / HIGH / MEDIUM / LOW breakdown
- Compliance rate — percentage of sessions with no violations
- Risk users — top users by risk score and violation count
- Cost exposure — estimated API cost of shadow usage, productivity hours
- Compliance by framework — HIPAA, GDPR, and other framework-level metrics
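Two of these metrics are simple ratios. The sketch below shows how they could be computed, assuming a hypothetical session record with a `violations` count; the dashboard's exact formulas are not documented on this page.

```python
def compliance_rate(sessions):
    """Percentage of sessions with no violations, or None if there are none."""
    if not sessions:
        return None
    clean = sum(1 for s in sessions if s["violations"] == 0)
    return 100.0 * clean / len(sessions)


def week_over_week_change(this_week, last_week):
    """Percentage change in session count versus the previous week."""
    if last_week == 0:
        return None  # undefined without a baseline
    return 100.0 * (this_week - last_week) / last_week


# Invented sessions: three clean, one with two violations.
sessions = [{"violations": 0}, {"violations": 2}, {"violations": 0}, {"violations": 0}]
rate = compliance_rate(sessions)            # 75.0
change = week_over_week_change(120, 100)    # 20.0
```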
Zero Trust inventory
Shadow AI detection surfaces an unverified asset inventory including:
- Agent runtimes — LangChain, AutoGen, CrewAI instances running without governance
- MCP servers — unauthenticated or public MCP endpoints
- AI bots — Slack/Teams bots with excessive AI access
- Public endpoints — ML infrastructure exposed to the internet
Dashboard
The Discovery dashboard (Discover in the nav) has seven tabs:
| Tab | What's there |
|---|---|
| Explore | AI attack surface visualization graph |
| Findings | Grouped risk findings across all assets, filterable by severity and source |
| Recommended Actions | AI-generated remediation strategies prioritized by risk |
| Assets | Full asset inventory, filterable by category, status, and risk level |
| Sources | Discovery channels with status, last run, and asset counts |
| Scan History | Past discovery runs with results |
| Asset Risk Posture | Charts and analytics across the asset inventory |
Next steps
- Asset Management — Manage the approved asset inventory
- Agent Management — Register and govern agents
- Remediation — Fix discovery findings with AI-powered plans
- Compliance Reporting — Use discovery data in compliance reports