Aug 1, 2024

AI Guardrails: Best Practices

Essential best practices for implementing AI guardrails to enhance security and performance in your AI applications

To ensure the safety and integrity of AI interactions, implementing guardrails is crucial. These mechanisms protect both users and developers from potential risks, particularly when dealing with sensitive information and external systems. This discussion focuses on two main types: input guardrails and output guardrails.

Input Guardrails

Input guardrails are essential for preventing the exposure of private data and guarding against harmful prompts that could compromise system integrity.

When using external model APIs, there’s always a risk of inadvertently sharing sensitive information. For instance, an employee might unknowingly input confidential company data into an AI prompt, risking exposure. A notable example involved Samsung employees unintentionally leaking proprietary information via ChatGPT, leading to a significant policy change in 2023.

Although it’s challenging to completely prevent such leaks when using third-party services, deploying guardrails can significantly reduce the risk. Tools that detect sensitive data—such as personal identifiers, images, or proprietary terms—are highly effective. These tools, often powered by proprietary ML models, can identify and flag potential risks, allowing developers to either block the query or redact sensitive elements before processing.
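As a rough illustration, a first line of defense can be as simple as regex-based redaction applied before the prompt ever leaves your infrastructure. The patterns below are illustrative assumptions; production systems typically rely on dedicated PII-detection models rather than hand-written regexes:

```python
import re

# Illustrative patterns only; real deployments use dedicated
# PII-detection models, not hand-written regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(prompt: str) -> str:
    """Replace detected sensitive spans with placeholders
    before the prompt is sent to an external API."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED_{label}]", prompt)
    return prompt

print(redact("Contact jane.doe@acme.com, SSN 123-45-6789"))
# -> "Contact [REDACTED_EMAIL], SSN [REDACTED_SSN]"
```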

Jailbreaking AI models has become a popular online activity, with users trying to manipulate AI into making inappropriate or harmful statements. While provoking an AI into controversy might be entertaining for some, it poses significant risks for customer-facing applications: a compromised support chatbot, for instance, can damage your brand’s reputation.

To mitigate these risks, it’s essential to implement robust guardrails that prevent the execution of harmful actions. For instance, no SQL command that modifies data should be executed without human oversight. Though this added layer of security might slow down your system, it is vital for safeguarding your data. We’ve seen AI system designs that run input guardrails in parallel with the main query and wait until the guardrails give a green light before returning the LLM response to the user.
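Here is a minimal sketch of that parallel pattern, with hypothetical `check_input_guardrails` and `call_llm` coroutines standing in for your real guardrail and model calls:

```python
import asyncio

async def check_input_guardrails(prompt: str) -> bool:
    """Hypothetical guardrail check (PII scan, jailbreak classifier, ...).
    Returns True ("green light") when the prompt is safe to answer."""
    await asyncio.sleep(0.1)  # stand-in for real classifier latency
    return "DROP TABLE" not in prompt.upper()

async def call_llm(prompt: str) -> str:
    """Hypothetical call to the main model."""
    await asyncio.sleep(0.5)  # stand-in for real API latency
    return f"Answer to: {prompt}"

async def answer(prompt: str) -> str:
    # Run the guardrail and the main query concurrently; only release
    # the LLM response once the guardrail gives a green light.
    green_light, response = await asyncio.gather(
        check_input_guardrails(prompt), call_llm(prompt)
    )
    if not green_light:
        return "Sorry, I can't help with that request."
    return response

print(asyncio.run(answer("How do I reset my password?")))
```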

Additionally, to ensure your AI doesn’t engage in inappropriate discourse, restrict it from addressing certain topics, such as politics. Implement filters to exclude inputs containing specific sensitive phrases or employ advanced AI algorithms to classify and block out-of-scope queries.
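One illustrative way to combine both ideas (the blocklist entries, model choice, and threshold below are assumptions, not recommendations) is a two-stage filter: a cheap phrase check followed by a zero-shot topic classifier:

```python
from transformers import pipeline  # assumes the `transformers` package

BLOCKED_PHRASES = {"election", "political party"}  # illustrative only

# Zero-shot classification flags off-topic queries without
# training a dedicated model.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

def is_in_scope(query: str, threshold: float = 0.7) -> bool:
    # Stage 1: cheap literal phrase filter.
    lowered = query.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return False
    # Stage 2: reject when the top label is a restricted topic
    # with high confidence.
    result = classifier(query,
                        candidate_labels=["politics", "customer support"])
    return not (result["labels"][0] == "politics"
                and result["scores"][0] >= threshold)
```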

Output Guardrails

Given the probabilistic nature of AI models, their outputs can sometimes be unreliable. Implementing output guardrails is essential to enhance your application’s reliability. These guardrails serve two primary purposes:

  1. Assess the quality of each generated output.

  2. Define policies to handle various failure modes.

Output Quality Measurement

To ensure outputs meet your standards, it is crucial to recognize potential failures. By identifying common failure modes and implementing strategies to detect them, you can maintain high-quality, reliable AI-generated outputs. Common failure modes include malformatted responses and toxic outputs. For instance, a response expected in JSON format might be missing a closing bracket; regex checks and JSON validators can detect such issues. Toxic responses, such as racist or sexist remarks, can be identified using various toxicity detection tools.

Basic retry logic mitigates many of these failures: simply resend the query until a valid response is received. The tradeoff is latency and cost, since every retry adds a full extra API call on top of the original one.
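A minimal sketch of that retry loop, with a hypothetical `call_llm` function standing in for your model call:

```python
import json

def get_valid_json(prompt: str, call_llm, max_retries: int = 3) -> dict:
    """Retry until the model returns parseable JSON.

    Each attempt is a full extra API call, so cap the retries
    to bound latency and cost."""
    last_error = None
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            return json.loads(raw)  # malformed JSON raises here
        except json.JSONDecodeError as err:
            last_error = err
    raise ValueError(f"No valid JSON after {max_retries} attempts") from last_error
```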

To reduce that latency, you can process requests in parallel: send several copies of the same query simultaneously and take the first (or best) valid response, so retries no longer happen sequentially. Additionally, for complex queries, fallback mechanisms, such as transferring to human operators based on key phrases or sentiment analysis, help ensure a seamless user experience.
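Here is one way such parallel sampling could look, again with hypothetical `call_llm` and `is_valid` stand-ins:

```python
import asyncio

async def first_valid_response(prompt, call_llm, is_valid, n: int = 3):
    """Fire n identical requests concurrently and return the first
    response that passes validation, cancelling the rest."""
    tasks = [asyncio.create_task(call_llm(prompt)) for _ in range(n)]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                response = await finished
            except Exception:
                continue  # one failed call shouldn't sink the batch
            if is_valid(response):
                return response
        raise ValueError("All parallel responses failed validation")
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
```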

Guardrail Tradeoffs

Implementing guardrails involves balancing reliability and latency. While some teams prioritize speed and opt out of guardrails to avoid latency increases, most recognize that the potential risks outweigh the added latency costs.

Output guardrails pose particular challenges in stream completion mode, where responses are shown token by token to reduce wait times. This method can bypass guardrails, allowing unsafe content to reach users before being flagged; one possible mitigation is sketched at the end of this section.

Choosing between self-hosting models and using third-party APIs also involves tradeoffs. Self-hosting minimizes data exposure but requires comprehensive guardrail implementation. Conversely, third-party APIs offer built-in guardrails, simplifying the process but necessitating careful data handling. You might want to speak to your compliance team here :)
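For the streaming problem mentioned above, one common mitigation (sketched here under assumed interfaces, not a prescribed design) is to buffer the stream into chunks and release each chunk only after the output guardrail clears it:

```python
async def guarded_stream(token_stream, is_safe, chunk_size: int = 20):
    """Buffer streamed tokens and only release a chunk once the
    output guardrail clears it. Trades a little latency for safety."""
    buffer = []
    async for token in token_stream:
        buffer.append(token)
        if len(buffer) >= chunk_size:
            chunk = "".join(buffer)
            if not is_safe(chunk):
                yield "[response withheld by guardrail]"
                return
            yield chunk
            buffer = []
    tail = "".join(buffer)
    if tail and is_safe(tail):
        yield tail
```

Larger chunks give the guardrail more context but make the stream feel less responsive; the right size depends on your latency budget.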
