Large Language Models: Site Unreliability

One of the key lessons I learned as a junior engineer was to always ask, “What’s the SLA?” during design reviews. This question might seem basic, but it’s fundamental to understanding the requirements of any service. If you don’t know the performance and reliability standards you need to meet, how can you effectively build or measure your system?

SLI, SLO, and SLA: Clearing Up the Confusion 

I often see these terms used interchangeably, so I like to borrow the definitions from the Google SRE Book:

  • Service Level Indicators (SLI) - Metrics used to measure some aspect of your service. A good example here is latency (ms).
  • Service Level Objectives (SLO) - Boundaries on a given SLI, e.g. I would like my service to have 100 ms (or less) latency 95% of the time. These are often just performance goals.
  • Service Level Agreements (SLA) - This is perhaps the most misunderstood term. SLAs are boundaries on your indicators that carry penalties under a prior agreement or contract. That sounds similar to an SLO, but the Google SRE book summarizes the difference well: "What happens if the SLOs aren't met? If there is no explicit consequence, then you are almost certainly looking at an SLO."

People often treat SLOs as SLAs, but in reality an SLA carries a consequence. That's the key difference: SLAs are about putting money (or something else of value) on the line.
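The latency SLO above ("100 ms or less, 95% of the time") is easy to check mechanically: measure the SLI over a window of requests and compare it to the objective. A minimal sketch, with illustrative names rather than any particular monitoring library:

```python
# Sketch: evaluating a latency SLO against a window of recorded
# request latencies. The SLI is "fraction of requests at or under
# the threshold"; the SLO is a target on that fraction.

def slo_compliance(latencies_ms, threshold_ms=100.0):
    """Return the fraction of requests at or under the latency threshold."""
    if not latencies_ms:
        return 1.0  # no traffic in the window, nothing violated
    good = sum(1 for lat in latencies_ms if lat <= threshold_ms)
    return good / len(latencies_ms)

# Hypothetical window of request latencies in milliseconds.
window = [42, 87, 103, 65, 250, 91, 78, 99, 110, 60]
sli = slo_compliance(window)   # 7 of 10 requests were "good" -> 0.7
meets_slo = sli >= 0.95        # the SLO target: 95% of requests under 100 ms
```

An SLA would attach an explicit penalty (e.g. service credits) to `meets_slo` being false over a contractual measurement period; the measurement itself is the same.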

LLM applications

Today, every company seems to be incorporating some type of LLM or generative AI experience into its products. This can take countless forms:

  • FAQ assistants - helping people on landing pages find the most relevant information
  • Recommendations - providing potential actions customers might take on their data
  • Endless other generative AI ideas - from content generation to conversational interfaces.

As we continue to see rapid advancements in AI (A16Z even listed a few among their “Big Ideas of 2025”), reliability becomes critical. Why? Because users—and the businesses serving them—expect consistent performance.

“So, What’s the SLA on my LLM?”

If an LLM sits in your product's main workflow, you need to understand what happens when that LLM is down or underperforming. First-party LLM providers don't necessarily meet traditional cloud-provider SLAs. For example, when I checked OpenAI's uptime recently, it was 99.57%, which translates to just over three hours of downtime per 30-day month. Three hours may not sound like much, but if you're counting on this service to drive critical revenue, three hours of downtime is significant.
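The arithmetic behind that figure is worth making explicit, since uptime percentages are easy to misread. A quick sketch:

```python
# Sketch: converting an observed uptime percentage into expected
# downtime per 30-day month (as in the 99.57% example above).

def monthly_downtime_hours(uptime_pct, hours_in_month=30 * 24):
    """Expected hours of downtime per month for a given uptime percentage."""
    return (1 - uptime_pct / 100) * hours_in_month

monthly_downtime_hours(99.57)  # ~3.1 hours/month
monthly_downtime_hours(99.99)  # ~4.3 minutes/month, a typical "four nines" SLA
```

Note how steep the curve is: the difference between 99.57% and 99.99% is the difference between three hours and a few minutes of monthly downtime.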

Putting Money on the Line

Many organizations want to monetize new AI products and features. Some already have established revenue streams, so they’re looking to AI to enhance productivity or user satisfaction. But before putting an AI feature on the revenue-critical path, it’s crucial to ask:

“Can we afford this service to be down for 3 hours a month?”

If not, you might need an SLA with serious guarantees, rather than just an informal SLO.

Making AI Reliable: Self-Hosting and Open-Source LLMs

One solution is turning to open-source LLMs. Thanks to models like Llama (and many others), performance can be comparable to the more well-known hosted solutions, depending on the use case. Several providers now let you host these models on your own infrastructure (on-premise or on dedicated cloud servers) with a contractual SLA—meaning you’ll receive credits or a penalty payout if they fail to meet their obligations.

Self-hosting (or going with a provider that offers stringent SLAs) can be the key to delivering revenue-critical AI products. Of course, there are trade-offs:

  • Cost & Complexity – Managing your own infrastructure or paying for dedicated hosting can be more expensive or operationally complex.
  • Expertise – Tuning and scaling open-source LLMs require in-house ML skill sets that not every team has.

Still, if reliability is non-negotiable for your use case, these investments can be worthwhile.
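Even short of fully self-hosting, a common reliability pattern is to keep a fallback model behind your primary provider. The sketch below assumes hypothetical `primary` and `fallback` callables standing in for your hosted-provider client and your self-hosted deployment; the failover pattern, not the names, is the point.

```python
# Sketch: failing over from a primary LLM provider to a self-hosted
# fallback model. `primary` and `fallback` are hypothetical callables
# that take a prompt and return a completion.

def complete_with_fallback(prompt, primary, fallback, retries=2):
    """Try the primary provider a few times, then fall back."""
    for _attempt in range(retries):
        try:
            return primary(prompt)
        except Exception:
            continue  # transient error: retry the primary
    return fallback(prompt)  # primary exhausted: use the self-hosted model
```

In practice you would retry only on transient errors (timeouts, 5xx responses) and add backoff, but even this shape turns "the provider is down for three hours" into a degraded mode rather than an outage.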

Conclusion

Whether you’re deploying a simple FAQ bot or an intricate personalization system, think of SLAs, SLOs, and SLIs as the foundation of service reliability. As AI continues to evolve, the expectations for uptime and performance will only rise. By asking the right questions early—“What’s our SLA?”—and choosing the infrastructure (and provider) that can support those goals, you set your product up for success in the rapidly advancing world of AI.