01
Input layer
Filters and classifiers on incoming prompts. Detect prompt injection, off-topic requests, and abusive content before they reach the model.
Framework 02
Specific, named mechanisms RSUA uses to make AI systems safe to deploy and easy to defend.

A production AI system has guardrails at five layers: input, retrieval, generation, action, and audit. Each layer catches different failure modes. Skip any layer and the others can be defeated.
01
Filters and classifiers on incoming prompts. Detect prompt injection, off-topic requests, and abusive content before they reach the model.
02
Access controls on what data the AI can see. Row-level permissions, document-level redaction, and tenant isolation enforced at retrieval.
03
Constraints on what the model produces. Output schemas, citation requirements, and refusal patterns for out-of-scope queries.
04
Gates on what the AI can actually do. Confidence thresholds, human approval for high-stakes actions, and hard limits on irreversible operations.
05
A complete record of every meaningful decision: input, retrieval, model used, output, action taken, and reviewer.
Red teaming is not a launch milestone. It is a recurring practice. Every meaningful change to prompts, tools, or data surfaces deserves adversarial review.

Continue Reading
Framework 03