Rule Tester
The Rule Tester lets you paste a sample input and see exactly which rules match, which entities are detected, and what each match’s confidence score is. Use it before saving any new rule and any time you change a threshold or context keyword on an existing one. Most rule mistakes look correct in the editor and only show up once real traffic hits.
Open Security → Rule Tester.
How to use it
- Paste a sample of the data you want to test against. A real tool argument or tool result, copied from the Tool Call Logs, is the best source.
- Pick which rules to run: every active rule by default, or just the subset you're tuning.
- Click Run.
The tester returns:
- A list of every match, with the entity type, the matched substring, the confidence score, and the rule that produced it.
- Whether each match would clear the threshold (and therefore fire its action) or be suppressed.
- For matches near context keywords, which keywords contributed to the score.
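As a rough sketch, you can think of each match entry as a record plus a threshold check. The field names and threshold below are hypothetical, not the tester's actual schema; the point is the fire/suppress decision:

```python
# Hypothetical shape of a tester match entry; your deployment's actual
# field names may differ.
matches = [
    {"entity": "credit_card", "text": "4111 1111 1111 1111",
     "score": 0.92, "rule": "pci-card-number", "keywords": ["card"]},
    {"entity": "credit_card", "text": "4111-0001",
     "score": 0.31, "rule": "pci-card-number", "keywords": []},
]

THRESHOLD = 0.60  # hypothetical per-rule threshold

verdicts = []
for m in matches:
    # A match fires its action only if it clears the rule's threshold.
    verdict = "fires" if m["score"] >= THRESHOLD else "suppressed"
    verdicts.append(verdict)
    print(f'{m["rule"]}: {m["entity"]} @ {m["score"]:.2f} -> {verdict}')
```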
What to test
Run every new rule through three cases.
Positive cases
The data the rule is supposed to catch. Confirm the score lands above your threshold.
Negative cases
Data that looks similar but shouldn’t match. Confirm the score lands below your threshold (or doesn’t match at all).
Edge cases
The fuzzy boundaries. Words near context keywords. Different casings. Whitespace variants.
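One way to keep these three cases at hand is as a small regression suite you re-run after every rule change. The snippet below is only a sketch: `score_sample` is a stand-in for pasting each sample into the Rule Tester and reading back the score it reports.

```python
THRESHOLD = 0.6  # illustrative threshold

def score_sample(text):
    # Stand-in scorer for illustration only; in practice, use the score
    # the Rule Tester reports for this sample.
    return 0.9 if "4111 1111 1111 1111" in text else 0.2

cases = [
    # (kind, sample, should the rule fire?)
    ("positive", "card: 4111 1111 1111 1111", True),
    ("negative", "invoice id 4111-0001", False),
    ("edge", "CARD 4111 1111 1111 1111 thx", True),
]

results = []
for kind, text, should_fire in cases:
    fired = score_sample(text) >= THRESHOLD
    results.append("PASS" if fired == should_fire else "FAIL")
    print(f"{results[-1]} [{kind}] {text!r}")
```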
Tuning thresholds with the tester
The score the tester returns is the score the rule will see in production. To dial in the threshold:
- Run a representative sample of legitimate (false-positive) matches. Note the highest score they produce.
- Run a representative sample of real (true-positive) matches. Note the lowest score they produce.
- Set the threshold above the false-positive max and below the true-positive min.
If those two numbers cross (false positives score higher than true positives), your rule needs better disambiguation: a more specific pattern, more context keywords, or a different approach entirely.
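The tuning steps above can be sketched as a small helper. `pick_threshold` is a hypothetical name for illustration, not part of the product:

```python
def pick_threshold(fp_scores, tp_scores):
    """Pick a threshold above the highest false-positive score and below
    the lowest true-positive score; return None if the ranges cross."""
    fp_max = max(fp_scores)  # highest score a legitimate sample produced
    tp_min = min(tp_scores)  # lowest score a real match produced
    if fp_max >= tp_min:
        return None  # ranges cross: the rule needs better disambiguation
    return (fp_max + tp_min) / 2  # midpoint leaves margin on both sides

print(pick_threshold([0.35, 0.42, 0.48], [0.61, 0.70, 0.88]))
print(pick_threshold([0.35, 0.72], [0.61, 0.70]))
```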
Testing context keywords
The tester shows which context keywords fired on each match. Useful when you can’t tell why a rule is or isn’t matching:
- Score lower than expected? Your context keywords aren’t appearing in the test sample.
- Score higher than expected? Common words near matches are inflating scores. Tighten the keyword list.
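To build intuition for how keywords move the score, here is a toy model of context-keyword boosting (not the product's actual scoring formula): a base score for the pattern match, plus a fixed boost for each keyword found within a window around it.

```python
import re

BASE, BOOST, WINDOW = 0.5, 0.25, 40  # illustrative values only
KEYWORDS = ("ssn", "social security")

def score(text, pattern=r"\b\d{3}-\d{2}-\d{4}\b"):
    m = re.search(pattern, text)
    if not m:
        return 0.0
    # Look for keywords within WINDOW characters on either side of the match.
    ctx = text[max(0, m.start() - WINDOW): m.end() + WINDOW].lower()
    boost = sum(BOOST for kw in KEYWORDS if kw in ctx)
    return min(1.0, BASE + boost)

print(score("employee SSN: 078-05-1120"))  # keyword nearby lifts the score
print(score("tracking id 078-05-1120"))    # bare pattern, base score only
```

A common word in the keyword list (say, "number") would boost nearly every match, which is exactly the score-inflation problem described above.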
After deploying
The tester is for pre-flight validation. The Alerts dashboard is for post-flight observation. After you push a rule live, watch the breakdown for a few hours. If the volume looks wrong, come back to the tester with real samples from the violation log.
Next
Watch what your rules catch in production via Violations and alerts.