Rule Tester
The Rule Tester lets you paste a sample input and see exactly which rules match, which entities are detected, and what each match’s confidence score is. Use it before saving any new rule and any time you change a threshold or context keyword on an existing one. Most rule mistakes look correct in the editor and only show up once real traffic hits.
Open Security → Rule Tester.
How to use it
- Paste a sample of the data you want to test against. A real tool argument or tool result, copied from the Tool Call Logs, is the best source.
- Pick which rules to run: every active rule by default, or just the subset you're tuning.
- Click Run.
The tester returns:
- A list of every match, with the entity type, the matched substring, the confidence score, and the rule that produced it.
- Whether each match would clear the threshold (and therefore fire its action) or be suppressed.
- For matches near context keywords, which keywords contributed to the score.
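As a rough sketch, you can think of each match entry as a record plus a threshold check. The field names and threshold below are hypothetical, not the tester's actual schema; the point is the fire/suppress decision:

```python
# Hypothetical shape of a tester match entry; your deployment's actual
# field names may differ.
matches = [
    {"entity": "credit_card", "text": "4111 1111 1111 1111",
     "score": 0.92, "rule": "pci-card-number", "keywords": ["card"]},
    {"entity": "credit_card", "text": "4111-0001",
     "score": 0.31, "rule": "pci-card-number", "keywords": []},
]

THRESHOLD = 0.60  # hypothetical per-rule threshold

verdicts = []
for m in matches:
    # A match fires its action only if it clears the rule's threshold.
    verdict = "fires" if m["score"] >= THRESHOLD else "suppressed"
    verdicts.append(verdict)
    print(f'{m["rule"]}: {m["entity"]} @ {m["score"]:.2f} -> {verdict}')
```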
What to test
Run every new rule through three cases.
Positive cases
The data the rule is supposed to catch. Confirm the score lands above your threshold.
Negative cases
Data that looks similar but shouldn’t match. Confirm the score lands below your threshold (or doesn’t match at all).
Edge cases
The fuzzy boundaries. Words near context keywords. Different casings. Whitespace variants.
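One way to keep these three cases at hand is as a small regression suite you re-run after every rule change. The snippet below is only a sketch: `score_sample` is a stand-in for pasting each sample into the Rule Tester and reading back the score it reports.

```python
THRESHOLD = 0.6  # illustrative threshold

def score_sample(text):
    # Stand-in scorer for illustration only; in practice, use the score
    # the Rule Tester reports for this sample.
    return 0.9 if "4111 1111 1111 1111" in text else 0.2

cases = [
    # (kind, sample, should the rule fire?)
    ("positive", "card: 4111 1111 1111 1111", True),
    ("negative", "invoice id 4111-0001", False),
    ("edge", "CARD 4111 1111 1111 1111 thx", True),
]

results = []
for kind, text, should_fire in cases:
    fired = score_sample(text) >= THRESHOLD
    results.append("PASS" if fired == should_fire else "FAIL")
    print(f"{results[-1]} [{kind}] {text!r}")
```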
Tuning thresholds with the tester
The score the tester returns is the score the rule will see in production. To dial in the threshold:
- Run a representative sample of legitimate (false-positive) matches. Note the highest score they produce.
- Run a representative sample of real (true-positive) matches. Note the lowest score they produce.
- Set the threshold above the false-positive max and below the true-positive min.
If those two numbers cross (false positives score higher than true positives), your rule needs better disambiguation: a more specific pattern, more context keywords, or a different approach entirely.
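The tuning steps above can be sketched as a small helper. `pick_threshold` is a hypothetical name for illustration, not part of the product:

```python
def pick_threshold(fp_scores, tp_scores):
    """Pick a threshold above the highest false-positive score and below
    the lowest true-positive score; return None if the ranges cross."""
    fp_max = max(fp_scores)  # highest score a legitimate sample produced
    tp_min = min(tp_scores)  # lowest score a real match produced
    if fp_max >= tp_min:
        return None  # ranges cross: the rule needs better disambiguation
    return (fp_max + tp_min) / 2  # midpoint leaves margin on both sides

print(pick_threshold([0.35, 0.42, 0.48], [0.61, 0.70, 0.88]))
print(pick_threshold([0.35, 0.72], [0.61, 0.70]))
```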
Testing context keywords
The tester shows which context keywords fired on each match. Useful when you can’t tell why a rule is or isn’t matching:
- Score lower than expected? Your context keywords aren’t appearing in the test sample.
- Score higher than expected? Common words near matches are inflating scores. Tighten the keyword list.
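To build intuition for how keywords move the score, here is a toy model of context-keyword boosting (not the product's actual scoring formula): a base score for the pattern match, plus a fixed boost for each keyword found within a window around it.

```python
import re

BASE, BOOST, WINDOW = 0.5, 0.25, 40  # illustrative values only
KEYWORDS = ("ssn", "social security")

def score(text, pattern=r"\b\d{3}-\d{2}-\d{4}\b"):
    m = re.search(pattern, text)
    if not m:
        return 0.0
    # Look for keywords within WINDOW characters on either side of the match.
    ctx = text[max(0, m.start() - WINDOW): m.end() + WINDOW].lower()
    boost = sum(BOOST for kw in KEYWORDS if kw in ctx)
    return min(1.0, BASE + boost)

print(score("employee SSN: 078-05-1120"))  # keyword nearby lifts the score
print(score("tracking id 078-05-1120"))    # bare pattern, base score only
```

A common word in the keyword list (say, "number") would boost nearly every match, which is exactly the score-inflation problem described above.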
After deploying
The tester is for pre-flight validation. The Alerts dashboard is for post-flight observation. After you push a rule live, watch the breakdown for a few hours. If the volume looks wrong, come back to the tester with real samples from the violation log.
Next
Watch what your rules catch in production via Violations and alerts.