Discovery & Shadow AI

Automatically map your entire AI attack surface (cloud services, running agents, MCP servers, source code, and direct browser-based AI usage) so that you can govern it.

Overview

Discovery runs continuously across your infrastructure, finding AI assets you may not know exist. Every discovered asset enters an approval workflow before it can be used by governed agents. Shadow AI detection catches direct AI usage (ChatGPT, Claude, etc.) happening outside your proxy.

Discovery works through channels — configured integrations with your infrastructure. Each channel type uses a different collection method and targets a different part of your environment.

Discovery Channels

Rivaro supports 10 discovery channel types. Configure them in Settings > Discovery Sources.

| Channel type | Mode | What it finds |
|---|---|---|
| CLOUD_AI_SERVICES | Scheduled | AWS SageMaker/Bedrock, GCP Vertex AI, Azure ML: endpoints, models, and AI-specific security risks |
| SOURCE_CODE | Scheduled | GitHub, GitLab, Bitbucket: AI libraries, hardcoded API keys, risky code patterns |
| CONTAINER_REGISTRY | Scheduled | Docker Hub, ECR, GCR, ACR: AI and MCP containers, vulnerability findings |
| API_GATEWAY | Scheduled | AWS API Gateway, Kong, Nginx: AI and MCP API endpoints |
| COLLABORATION_PLATFORM | Scheduled | Slack, Teams, Google Workspace: unauthorized AI bots, plugins, integrations |
| IDENTITY_ACCESS | Scheduled | Okta, Azure AD, AWS IAM: AI access patterns, service accounts with AI permissions |
| LOG_ANALYSIS | Hybrid | CloudWatch, Stackdriver, Splunk: shadow AI usage patterns from logs (API query or log forwarding agent) |
| NETWORK_ENDPOINT | Agent callback | Running MCP servers, shadow AI agents, exposed endpoints; requires a deployed network scanner agent |
| AGENT_DATA | Agent callback | Pre-collected data from client-side agents |
| MANUAL_ENTRY | Manual | Admin-created assets; auto-approved on creation |

Channel configuration

Each channel has these common fields:

| Field | Description |
|---|---|
| name | Display name for this channel |
| channelType | One of the types above |
| active | Whether the channel runs on its schedule |
| pollingIntervalSeconds | How often to run (scheduled channels) |
| configuration | Channel-specific settings (non-sensitive) |
| lastRunAt | Timestamp of most recent scan |
| lastRunStatus | SUCCESS, FAILED, or RUNNING |
| lastRunAssetCount | Assets found in last run |
| lastRunRiskCount | Risk findings in last run |
Note: Sensitive credentials (API keys, tokens, secrets) are stored separately in an encrypted credential store, never in the channel configuration JSON.
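
As a rough illustration of the fields above, the sketch below models a channel as a Python dataclass and flags configuration keys that look like credentials. The field names come from the table; the default values, the `SENSITIVE_HINTS` list, and the `sensitive_keys` helper are hypothetical and not part of Rivaro's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical substrings used to spot secrets placed in the wrong field.
SENSITIVE_HINTS = ("key", "token", "secret", "password")


@dataclass
class DiscoveryChannel:
    """Mirror of the common channel fields (defaults are assumptions)."""
    name: str
    channelType: str
    active: bool = True
    pollingIntervalSeconds: int = 3600
    configuration: Dict[str, Any] = field(default_factory=dict)
    lastRunAt: Optional[str] = None
    lastRunStatus: Optional[str] = None  # SUCCESS, FAILED, or RUNNING
    lastRunAssetCount: int = 0
    lastRunRiskCount: int = 0


def sensitive_keys(channel: DiscoveryChannel) -> List[str]:
    """Return configuration keys whose names look like credentials."""
    return sorted(
        key for key in channel.configuration
        if any(hint in key.lower() for hint in SENSITIVE_HINTS)
    )


channel = DiscoveryChannel(
    name="GitHub org scan",
    channelType="SOURCE_CODE",
    configuration={"organization": "example-org", "apiToken": "should-not-be-here"},
)
print(sensitive_keys(channel))  # ['apiToken']
```

A check like this could run before a channel is saved, so that anything resembling a secret is routed to the credential store instead of the configuration JSON.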

Network scanner agent

For NETWORK_ENDPOINT channels, Rivaro generates a downloadable Python agent. The agent embeds your detection key, scans your internal network, and POSTs results back to /api/admin/discovery/agent/results. Deploy it anywhere with network access to your internal AI infrastructure.
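
The callback step might look something like the following sketch, which builds (but does not send) the POST request. Only the results path is taken from this page; the base URL, the `X-Detection-Key` header name, and the payload shape are assumptions made for illustration, and the generated agent may differ on all three.

```python
import json
import urllib.request

RESULTS_PATH = "/api/admin/discovery/agent/results"  # path documented above


def build_results_request(base_url, detection_key, findings):
    """Build the POST request that would carry scan results back to Rivaro."""
    # Hypothetical payload shape: a JSON object wrapping a list of findings.
    body = json.dumps({"findings": findings}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + RESULTS_PATH,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Detection-Key": detection_key,  # hypothetical header name
        },
    )


req = build_results_request(
    "https://rivaro.example.com",  # placeholder host
    "dk_example",                  # placeholder detection key
    [{"host": "10.0.0.12", "port": 8080, "kind": "mcp_server"}],
)
print(req.get_method(), req.full_url)
# Sending is omitted here; urllib.request.urlopen(req) would perform the POST.
```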

What Gets Discovered

Each discovered asset is classified by type and category:

| Category | Examples |
|---|---|
| AI_SERVICE | OpenAI, Anthropic, Vertex AI endpoints in use |
| AI_MODEL | Deployed models, fine-tuned versions, model registries |
| DATA_STORAGE | Vector databases, embedding stores, training data repositories |
| ML_PIPELINE | Training pipelines, fine-tuning jobs, MLflow experiments |
| SOURCE_CODE | Repositories with AI dependencies or hardcoded keys |
| CONTAINER | Docker images with AI/MCP packages |
| IDENTITY_ACCESS | Service accounts, roles with AI service permissions |
| USAGE_PATTERN | Patterns of AI API calls detected in logs |

Asset risk findings

Each discovered asset can have associated findings — specific security or compliance issues detected during scanning:

| Finding field | Description |
|---|---|
| detectionType | e.g. CREDENTIAL_EXPOSURE, MISCONFIGURATION, INFRASTRUCTURE_MCP_PUBLIC_ENDPOINT |
| severity | CRITICAL, HIGH, MEDIUM, LOW |
| status | ACTIVE, RESOLVED, IGNORED |
| description | Human-readable description of the finding |
| detectedContent | What was found (masked in UI) |
| remediationStatus | NONE → PLAN_AVAILABLE → EXECUTION_IN_PROGRESS → SUCCESS |

Asset Approval Workflow

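Rivaro defaults to zero-trust, default-deny: every newly discovered asset starts as PENDING_APPROVAL, and no agent can use an asset until it has been approved. The one exception is MANUAL_ENTRY assets, which are auto-approved on creation.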

Approval lifecycle

| Status | Meaning |
|---|---|
| PENDING_APPROVAL | Discovered, awaiting security team review |
| APPROVED | Reviewed and explicitly approved for use |
| BLOCKED | Reviewed and denied; agents cannot access it |
| ACTIVE | Approved and currently in use by governed agents |
| PROMOTED | Graduated to a governed entity (agent, data source, model) |
| REMOVED | Asset no longer detected in environment |
| ARCHIVED | Deprecated, kept for audit history |
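
A default-deny check over these statuses could be sketched as an allowlist, as below. The status names come from the table; the `USABLE` set and the `agent_may_use` helper are hypothetical. This page does not say whether a PROMOTED asset remains directly usable, so the sketch leaves it out, which is the choice default-deny would make.

```python
from enum import Enum


class AssetStatus(Enum):
    PENDING_APPROVAL = "PENDING_APPROVAL"
    APPROVED = "APPROVED"
    BLOCKED = "BLOCKED"
    ACTIVE = "ACTIVE"
    PROMOTED = "PROMOTED"
    REMOVED = "REMOVED"
    ARCHIVED = "ARCHIVED"


# Assumption: only explicitly approved assets (idle or in use) are usable.
USABLE = frozenset({AssetStatus.APPROVED, AssetStatus.ACTIVE})


def agent_may_use(status: AssetStatus) -> bool:
    """Default-deny: anything not on the allowlist is refused."""
    return status in USABLE


for status in AssetStatus:
    print(f"{status.value}: {'allow' if agent_may_use(status) else 'deny'}")
```

Using an allowlist rather than a blocklist means that any status added later is denied until someone deliberately adds it to `USABLE`.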

The approval request includes a riskScore (0–100) calculated from the asset's findings. Reviewers can add notes before approving or denying.
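
This page does not document how riskScore is computed. Purely to make the idea concrete, the sketch below sums per-severity weights over an asset's ACTIVE findings and caps the result at 100. The weights and the decision to ignore RESOLVED and IGNORED findings are assumptions, not Rivaro's formula.

```python
# Hypothetical weights per severity; the real scoring is not documented here.
SEVERITY_WEIGHTS = {"CRITICAL": 40, "HIGH": 25, "MEDIUM": 10, "LOW": 3}


def risk_score(findings):
    """Sum weights of ACTIVE findings and clamp to the 0-100 range."""
    total = sum(
        SEVERITY_WEIGHTS[f["severity"]]
        for f in findings
        if f["status"] == "ACTIVE"  # assumption: resolved/ignored findings don't count
    )
    return min(100, total)


findings = [
    {"detectionType": "CREDENTIAL_EXPOSURE", "severity": "CRITICAL", "status": "ACTIVE"},
    {"detectionType": "MISCONFIGURATION", "severity": "HIGH", "status": "ACTIVE"},
    {"detectionType": "MISCONFIGURATION", "severity": "MEDIUM", "status": "RESOLVED"},
]
print(risk_score(findings))  # 65
```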

Promoting an asset

Approved assets can be promoted — graduated into a fully governed entity with an AppContext, detection key, and full enforcement. This is how shadow infrastructure becomes official, monitored infrastructure.

| Promoted entity type | What it becomes |
|---|---|
| AGENT | A registered agent identity with trust score tracking |
| DATA_SOURCE | A governed data source with access controls |
| MODEL | An approved model with allowed-model list enforcement |
| INTEGRATION | A governed integration with policy enforcement |
| SERVICE | An approved AI service endpoint |

Multi-Source Correlation

The same asset may be discovered by multiple channels. Rivaro deduplicates using an externalId fingerprint — the same fingerprint from two channels links to one asset, with confidence increasing with each additional source.

| Observation type | Confidence | How it's detected |
|---|---|---|
| DISCOVERED | SUSPECTED → INFERRED | Found by a scanner/channel scan |
| RUNTIME_USAGE | CONFIRMED | Seen in live agent traffic through the proxy |
| CODE_REFERENCE | INFERRED | Found in source code as an import or API call |
| IAM_POLICY | INFERRED | Service account has permission to access it |
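
One way to picture the correlation is the sketch below, which groups observations by externalId and derives a confidence level per asset. The observation types and confidence labels follow the table; the record shape, the `source` field, and the exact escalation rule (a second source lifts SUSPECTED to INFERRED) are assumptions about behavior this page only describes in outline.

```python
from collections import defaultdict


def correlate(observations):
    """Merge observations that share an externalId into one asset each."""
    grouped = defaultdict(list)
    for obs in observations:
        grouped[obs["externalId"]].append(obs)

    assets = {}
    for external_id, group in grouped.items():
        sources = {o["source"] for o in group}
        types = {o["type"] for o in group}
        if "RUNTIME_USAGE" in types:
            confidence = "CONFIRMED"   # seen in live proxy traffic
        elif len(sources) > 1 or types & {"CODE_REFERENCE", "IAM_POLICY"}:
            confidence = "INFERRED"    # corroborated, or inferred from code/IAM
        else:
            confidence = "SUSPECTED"   # a single scan hit
        assets[external_id] = {"sources": sorted(sources), "confidence": confidence}
    return assets


observations = [
    {"externalId": "svc:openai", "source": "SOURCE_CODE", "type": "CODE_REFERENCE"},
    {"externalId": "svc:openai", "source": "proxy", "type": "RUNTIME_USAGE"},
    {"externalId": "mcp:10.0.0.12:8080", "source": "NETWORK_ENDPOINT", "type": "DISCOVERED"},
    {"externalId": "model:vertex-1", "source": "CLOUD_AI_SERVICES", "type": "DISCOVERED"},
    {"externalId": "model:vertex-1", "source": "LOG_ANALYSIS", "type": "DISCOVERED"},
]
for external_id, asset in correlate(observations).items():
    print(external_id, asset["confidence"], asset["sources"])
```

Five observations collapse into three assets here, because two pairs share a fingerprint.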

Shadow AI Detection

Shadow AI is direct use of AI services (ChatGPT, Claude, Perplexity, etc.) that bypasses your proxy — typically via a browser. The Rivaro Shadow AI browser extension monitors this activity and applies your policies in real time.

How it works

  1. Install the Chrome extension and configure it with your organization's detection key
  2. The extension monitors supported AI domains: chatgpt.com, claude.ai, bard.google.com, bing.com/chat, poe.com, perplexity.ai, and more
  3. When a user types a prompt and submits it, the extension captures the content and sends it to Rivaro's detection engine (/v1/shadow)
  4. Rivaro runs the same detection pipeline as the proxy — PII, PHI, credentials, prompt injection, etc.
  5. The response action is applied directly in the browser

Shadow AI policy actions

| Action | What the user sees |
|---|---|
| BLOCK | Modal appears, submission is prevented |
| REDACT | Modal shows sanitized version; user can copy and resubmit |
| LOG | Submission proceeds, violation is logged in the dashboard |
| ALLOW | No action, submission proceeds normally |
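
The four actions can be read as a small dispatch over the detection response. The extension itself runs in the browser, so the Python sketch below only models the decision logic. The `BrowserOutcome` type, the `apply_action` helper, and the modal wording are hypothetical; the actions and their effects follow the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrowserOutcome:
    submitted: bool            # does the original prompt go through?
    modal_text: Optional[str]  # what the modal shows, if one appears


def apply_action(action: str, sanitized: Optional[str] = None) -> BrowserOutcome:
    """Map a policy action to what happens in the browser."""
    if action == "BLOCK":
        return BrowserOutcome(submitted=False, modal_text="Submission blocked by policy.")
    if action == "REDACT":
        # The user sees the sanitized text and may copy and resubmit it.
        return BrowserOutcome(submitted=False, modal_text=sanitized)
    if action in ("LOG", "ALLOW"):
        # LOG additionally records a violation in the dashboard (not modeled here).
        return BrowserOutcome(submitted=True, modal_text=None)
    raise ValueError(f"unknown action: {action}")


print(apply_action("REDACT", sanitized="My SSN is [REDACTED]"))
```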

Shadow AI analytics

The Shadow AI dashboard tracks:

  • Session trends — daily session counts and week-over-week change
  • Violations by severity — CRITICAL / HIGH / MEDIUM / LOW breakdown
  • Compliance rate — percentage of sessions with no violations
  • Risk users — top users by risk score and violation count
  • Cost exposure — estimated API cost of shadow usage, productivity hours
  • Compliance by framework — HIPAA, GDPR, and other framework-level metrics
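
Two of these metrics are simple ratios, and a worked sketch may help. Compliance rate follows the definition above (share of sessions with no violations). The week-over-week formula, the function names, and the choice to return `None` when there is nothing to compare are assumptions.

```python
def compliance_rate(violations_per_session):
    """Percentage of sessions with zero violations, or None if no sessions."""
    if not violations_per_session:
        return None
    clean = sum(1 for count in violations_per_session if count == 0)
    return 100.0 * clean / len(violations_per_session)


def week_over_week(this_week, last_week):
    """Percentage change in session count, or None if last week had none."""
    if last_week == 0:
        return None
    return 100.0 * (this_week - last_week) / last_week


print(compliance_rate([0, 0, 2, 0]))  # 75.0
print(week_over_week(120, 100))       # 20.0
```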

Zero Trust inventory

Shadow AI detection surfaces an unverified asset inventory including:

  • Agent runtimes — LangChain, AutoGen, CrewAI instances running without governance
  • MCP servers — unauthenticated or public MCP endpoints
  • AI bots — Slack/Teams bots with excessive AI access
  • Public endpoints — ML infrastructure exposed to the internet

Dashboard

The Discovery dashboard (Discover in the nav) has seven tabs:

| Tab | What's there |
|---|---|
| Explore | AI attack surface visualization graph |
| Findings | Grouped risk findings across all assets, filterable by severity and source |
| Recommended Actions | AI-generated remediation strategies prioritized by risk |
| Assets | Full asset inventory, filterable by category, status, and risk level |
| Sources | Discovery channels with status, last run, and asset counts |
| Scan History | Past discovery runs with results |
| Asset Risk Posture | Charts and analytics across the asset inventory |

Next steps