Data loss prevention

Gateway’s DLP layer inspects every inbound prompt and outbound completion against a configurable rule set, then logs, redacts, or blocks based on what it finds. Use it to keep PII out of vendor logs, to comply with a customer DPA, or to prevent secrets from leaking back to end users.

Detection layers

DLP runs three categories of rules in parallel on the same request:

Category	Source	What it catches
Global	Presidio recognizers built into Gateway	Names, emails, phone numbers, URLs, IPs, credit cards, IBAN codes, crypto addresses, dates, and locations
USA	US-specific Presidio recognizers	SSN, ITIN, passport numbers, driver’s license numbers, US bank account numbers
Custom	Per-org rules you create	Anything you can describe with regex or a keyword list, such as internal IDs, project codenames, partner names, or API key formats

Built-in entity types

The seeded rule set covers 15 entity types by default. Each one has a default action (redact, log, or block) that you can override per org.

Category	Entity	Default action
Global	`CREDIT_CARD`	`redact`
Global	`CRYPTO`	`redact`
Global	`DATE_TIME`	`log`
Global	`EMAIL_ADDRESS`	`redact`
Global	`IBAN_CODE`	`redact`
Global	`IP_ADDRESS`	`log`
Global	`LOCATION`	`log`
Global	`PERSON`	`log`
Global	`PHONE_NUMBER`	`redact`
Global	`URL`	`log`
USA	`US_BANK_NUMBER`	`redact`
USA	`US_DRIVER_LICENSE`	`redact`
USA	`US_ITIN`	`redact`
USA	`US_PASSPORT`	`redact`
USA	`US_SSN`	`redact`

Seeded rules can be disabled or have their action changed, but their detection pattern is immutable because it is tied to the upstream Presidio recognizer.

Custom rules

Custom rules let you catch organization-specific patterns. Each custom rule is built from two matchers, and you can use either one or both:

Matcher	Description	Limit
`pattern`	A Python-compatible regex	500 characters max
`keywords`	A list of case-insensitive literal strings	50 keywords, 100 characters each

A custom rule must include at least one of the two. Patterns are validated with re.compile at save time, so invalid regex is rejected before reaching the data plane.
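The save-time checks described above can be sketched as follows. This is a minimal illustration, not Gateway's actual validator: the function name `validate_custom_rule` and the error strings are hypothetical, and only the limits and the use of re.compile come from the text.

```python
import re
from typing import List, Optional

# Limits taken from the matcher table above
MAX_PATTERN_CHARS = 500
MAX_KEYWORDS = 50
MAX_KEYWORD_CHARS = 100


def validate_custom_rule(pattern: Optional[str] = None,
                         keywords: Optional[List[str]] = None) -> List[str]:
    """Return validation errors; an empty list means the rule is acceptable.

    Hypothetical helper: the name and messages are illustrative only.
    """
    errors: List[str] = []
    keywords = keywords or []

    # A rule needs at least one of the two matchers
    if not pattern and not keywords:
        errors.append("rule needs a pattern, keywords, or both")

    if pattern:
        if len(pattern) > MAX_PATTERN_CHARS:
            errors.append("pattern exceeds 500 characters")
        try:
            re.compile(pattern)  # reject invalid regex before it is stored
        except re.error as exc:
            errors.append(f"invalid regex: {exc}")

    if len(keywords) > MAX_KEYWORDS:
        errors.append("more than 50 keywords")
    for kw in keywords:
        if len(kw) > MAX_KEYWORD_CHARS:
            errors.append(f"keyword longer than 100 characters: {kw[:20]}...")

    return errors
```

Running a check like this locally before saving a rule gives the same early feedback the dashboard provides, under the assumption that your local Python regex engine matches the one Gateway uses.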

Each rule has a name (entity_type), a description, and an action. Names must be unique within your org.

Rule actions

Every rule, seeded or custom, runs with one of three actions:

Action	Effect
`log`	Record the match in the request log only. The prompt is passed through unchanged.
`redact`	Replace the matched span with a placeholder before the prompt reaches the vendor. The model sees redacted text; you can configure whether downstream callers see the original or the redacted version.
`block`	Reject the request with HTTP 400. Use sparingly: it is the strictest action, and the caller sees the failure.
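To make the three actions concrete, here is a rough sketch of how a list of hits might be applied to a prompt. Everything named here is an assumption for illustration: the `Hit` type, the `BlockedByDLP` exception, and the `<ENTITY_TYPE>` placeholder format are hypothetical, and the sketch assumes hits do not overlap.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    entity_type: str
    action: str  # "log", "redact", or "block"
    start: int   # inclusive character offset
    end: int     # exclusive character offset


class BlockedByDLP(Exception):
    """Hypothetical error; per the table, Gateway answers with HTTP 400."""


def apply_actions(text: str, hits: List[Hit]) -> Tuple[str, List[Hit]]:
    """Return (text to forward, hits to record); raise if any hit blocks."""
    if any(h.action == "block" for h in hits):
        raise BlockedByDLP("request rejected by a block rule")

    out = text
    # Replace from the end of the string so earlier offsets stay valid
    for h in sorted(hits, key=lambda h: h.start, reverse=True):
        if h.action == "redact":
            # Placeholder format is an assumption, not Gateway's actual output
            out = out[:h.start] + f"<{h.entity_type}>" + out[h.end:]
        # "log" hits leave the text unchanged and are only recorded

    return out, hits
```

Note that `log` hits pass through untouched, which mirrors the table: only `redact` changes what the vendor receives, and only `block` stops the request.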

Two rules can match the same span. When they do, Gateway keeps the higher-priority entity (for example, CREDIT_CARD wins over DATE_TIME) so you don’t get duplicate findings on the same span.
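One way to picture this de-duplication is a priority-ranked pass over the findings. The sketch below is an assumption-laden illustration: the numeric `PRIORITY` values are invented (only the CREDIT_CARD over DATE_TIME ordering comes from the text), and it generalizes "same span" to any overlapping span, which may differ from what Gateway actually does.

```python
from typing import Dict, List, NamedTuple


class Finding(NamedTuple):
    entity_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


# Hypothetical ranking: larger number wins. Unknown entities default to 0.
PRIORITY: Dict[str, int] = {"CREDIT_CARD": 100, "DATE_TIME": 10}


def resolve_overlaps(findings: List[Finding]) -> List[Finding]:
    """Keep the highest-priority finding among any that overlap."""
    ranked = sorted(findings,
                    key=lambda f: PRIORITY.get(f.entity_type, 0),
                    reverse=True)
    kept: List[Finding] = []
    for f in ranked:
        # Accept f only if it does not overlap anything already kept
        if all(f.end <= k.start or f.start >= k.end for k in kept):
            kept.append(f)
    return sorted(kept, key=lambda f: f.start)
```

For example, a 16-digit number flagged as both CREDIT_CARD and DATE_TIME would yield a single CREDIT_CARD finding, while a separate date elsewhere in the text is kept.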

Testing rules

The dashboard includes a built-in tester so you can validate rule behavior without sending a real request. Paste sample text and you’ll get a list of hits with their entity type, action, matched substring, and character offsets.

The tester runs a deterministic regex-based simulator. It does not call the live sidecar, which keeps the response under a millisecond and means you can iterate on rules quickly. Input is capped at 50,000 characters.
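If you want to prototype a pattern outside the dashboard, a regex-only simulator along the same lines is easy to approximate. The sketch below is not the tester's implementation: the `RULES` list, the simplified SSN pattern, and the `PROJECT_CODENAME` rule are hypothetical, and only the output fields and the 50,000-character cap come from the description above.

```python
import re
from typing import Dict, List

MAX_INPUT_CHARS = 50_000  # cap stated for the tester

# Hypothetical rules: (entity_type, action, compiled pattern).
# The SSN regex is a simplification, not the Presidio recognizer.
RULES = [
    ("US_SSN", "redact", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PROJECT_CODENAME", "log", re.compile(r"\bPRJ-\d{4}\b")),
]


def simulate(text: str) -> List[Dict[str, object]]:
    """Return hits with entity type, action, matched substring, and offsets."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("input exceeds 50,000 characters")
    hits = []
    for entity_type, action, pattern in RULES:
        for m in pattern.finditer(text):
            hits.append({
                "entity_type": entity_type,
                "action": action,
                "match": m.group(0),
                "start": m.start(),  # inclusive
                "end": m.end(),      # exclusive
            })
    return sorted(hits, key=lambda h: h["start"])
```

Because the same input always produces the same hits, a helper like this also works well inside unit tests for your custom patterns.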

The tester is the fastest way to develop a custom rule. Iterate until the hits match what you expect, then enable the rule in your org settings.

Configuring DLP in the dashboard

Open Security → DLP in the Merge Gateway dashboard. The page lists seeded and custom rules side by side, with toggles for enabled and a dropdown for action. The same page exposes the tester so you can validate changes without leaving the screen.

Creating, editing, and deleting custom rules requires the Manage organization settings permission. Every change emits an audit log event.

Security alerts

Every DLP detection, whether the rule’s action is log, redact, or block, generates a security alert under Security → Alerts. Each alert includes:

Timestamp and request ID
Entity type and category
Action taken
The triggering segment (subject to your log-payloads setting)
Customer, project, and API key context

Alerts share the same store as prompt injection (PI) protection events, so you can filter, sort, and triage both signal types together.

Performance considerations

DLP runs synchronously on the request and response path:

Latency: the sidecar adds a few milliseconds per request. Presidio’s NER models are the heaviest component; if you don’t need PERSON / LOCATION detection, disable those rules to save the most time.
Redact vs. block: redaction adds a small constant cost (replacing spans in the payload). Blocking returns immediately, which is faster but customer-visible.
Custom rules: each regex is compiled at startup and cached. Avoid patterns prone to catastrophic backtracking, and use the tester to check that your patterns run efficiently.
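The compile-once behavior and the backtracking advice can be illustrated with a short sketch. The names `compiled`, `SAFE_TOKEN`, and `find_tokens` are hypothetical, as is the `sk_` key format; the caching shown here uses Python's `functools.lru_cache` and is only an analogy for whatever Gateway does internally.

```python
import re
from functools import lru_cache
from typing import List


@lru_cache(maxsize=None)
def compiled(pattern: str) -> re.Pattern:
    """Compile each distinct pattern once; later calls return the cached object."""
    return re.compile(pattern)


# Risky shape: nested quantifiers such as (a+)+$ can take exponential time
# on inputs that almost match. Prefer a single bounded quantifier instead.
SAFE_TOKEN = r"\bsk_[A-Za-z0-9]{32}\b"  # hypothetical API key format


def find_tokens(text: str) -> List[str]:
    """Return every substring that matches the bounded token pattern."""
    return [m.group(0) for m in compiled(SAFE_TOKEN).finditer(text)]
```

A fixed-length character class like the one above does a bounded amount of work per position, which is the property to aim for when writing custom patterns.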

FAQ

Does DLP run on both prompts and completions?

Yes. Inbound prompts are scanned before they reach the vendor, and outbound completions are scanned before they reach your application. Both directions use the same rule set.

Can I delete seeded rules?

No. Seeded rules can be disabled or have their action changed, but they can’t be deleted. This is intentional, so we can add new recognizers in future releases without breaking existing configurations.

How is DLP different from prompt injection protection?

DLP looks for structured sensitive data (PII, secrets, customer-defined patterns). Prompt injection protection looks for adversarial intent (prompts trying to manipulate the model). They run independently on the same request and either one can block, redact, or alert.

What happens when the DLP sidecar is unreachable?

By default, DLP fails open: if the sidecar is unreachable, the request proceeds without being scanned. Operations are responsible for sidecar uptime; if it goes down, alerts appear in the platform health dashboard.
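Fail-open behavior can be sketched as a thin wrapper around the scan call. This is an illustration only: `scan_or_fail_open`, the `scan` callable, and the `on_outage` hook are hypothetical names, and the exception types stand in for whatever error an unreachable sidecar would actually raise.

```python
from typing import Callable, List, Tuple


def scan_or_fail_open(text: str,
                      scan: Callable[[str], List[dict]],
                      on_outage: Callable[[Exception], None]) -> Tuple[List[dict], bool]:
    """Return (hits, scanned). On a sidecar error, report it and let the request proceed."""
    try:
        return scan(text), True
    except (ConnectionError, TimeoutError) as exc:
        on_outage(exc)    # e.g. surface the outage to a health dashboard
        return [], False  # fail open: no hits, request continues unscanned
```

Returning the `scanned` flag alongside the hits lets the caller record which requests went through without inspection, which is useful when reviewing an outage afterwards.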

Are redacted prompts logged in their original or redacted form?

The redacted version is what the vendor receives and what gets stored in the request log. The original is never persisted by Gateway unless you’ve enabled payload logging on the org.

Next steps

Prompt injection protection: Detect and block prompt-injection attempts before they reach the model.
Customer blocklist: Block or pin per-customer combinations of providers and models.
Zero data retention: Restrict routing to vendors with zero data retention agreements.