SLOs and Error Budgets in Practice: Rolling Windows Beat Monthly Resets
February 27, 2026
Three terms get used interchangeably and they should not. An SLI is a measurement: for example, the fraction of requests in the last five minutes that returned a 2xx in under 300ms. An SLO is a target on that measurement: over the compliance window, 99.9 percent of requests must count as good under the SLI. An SLA is a contract with a customer, usually with money attached, and almost always set looser than the internal SLO so you have headroom before refunds.
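To make the SLI and SLO distinction concrete, here is a minimal sketch in Python. The `Request` type and the function names are hypothetical, and treating an empty window as fully good is an assumption you may want to change; the 2xx, 300ms, and 99.9 percent figures are the ones from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    status: int        # HTTP status code
    latency_ms: float  # latency in milliseconds


def is_good(req: Request, latency_threshold_ms: float = 300.0) -> bool:
    """A request is good if it returned a 2xx in under the latency threshold."""
    return 200 <= req.status < 300 and req.latency_ms < latency_threshold_ms


def sli(requests: list[Request]) -> float:
    """The SLI: fraction of good requests in the measured window.

    Assumption: a window with no traffic counts as fully good.
    """
    if not requests:
        return 1.0
    return sum(is_good(r) for r in requests) / len(requests)


def meets_slo(sli_value: float, target: float = 0.999) -> bool:
    """The SLO: a target the SLI must reach."""
    return sli_value >= target


# Four requests: one too slow, one server error, two good.
window = [Request(200, 120), Request(200, 450), Request(503, 80), Request(204, 10)]
```

Here `sli(window)` is 0.5, far below the 0.999 target. The SLA does not appear in the code at all, which is the point: it is a contract, not a measurement.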
The error budget is the slack the SLO gives you. If your SLO is 99.9 percent over a 30 day window, you get 0.1 percent of the window as failure runway. That works out to about 43 minutes of downtime per month. Move the target to 99.99 percent and you are down to 4.3 minutes. The single nine that looks like a small tightening on a slide costs you 90 percent of the budget you had to work with.
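The arithmetic is easy to check. A short sketch, assuming downtime is measured in whole minutes of the window and using a hypothetical helper name:

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of full downtime the SLO allows over the window."""
    window_minutes = window_days * 24 * 60  # 43,200 minutes in 30 days
    return (1.0 - slo_target) * window_minutes


for target in (0.999, 0.9999):
    print(f"{target:.2%} -> {error_budget_minutes(target):.1f} minutes")
```

For a 30 day window this gives roughly 43.2 minutes at 99.9 percent and 4.3 minutes at 99.99 percent, so the extra nine leaves one tenth of the original budget.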
The window choice matters more than the target. A calendar month window resets on the first, which sounds tidy but quietly creates a perverse cycle. A team I worked with had a 99.9 percent SLO measured per calendar month. Every month went the same way. A bad deploy in week one would chew through the budget. Deploys froze for the rest of the month. Tech debt piled up. On the first of the next month, the team would ship the accumulated backlog in one giant push. That push had no canary headroom because nobody had been deploying. Outages clustered at the start of every month, and the SLO chart looked fine until day eight.
The fix had two parts. First, switch to a rolling 30 day window so the budget has no step-function reset. Second, add multi-window burn-rate alerts so the team gets paged before the budget is gone, not after. Burn rate is how fast the budget is being spent relative to the pace that would use it up exactly at the end of the window. One workable pair is: if the service burns 10 percent of the 30 day budget in a single hour (a burn rate of 72), page now; if it burns 5 percent over six hours (a burn rate of 6), ticket for the next business day. The first alert catches active outages, the second catches slow regressions that would otherwise hide in the noise.
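The two alert thresholds can be sketched as a small decision function. This is an illustration under stated assumptions, not a production alerting rule: the function names are hypothetical, the error ratios for the last hour and the last six hours are assumed to come from your monitoring system, and the window is the rolling 30 days described above.

```python
WINDOW_HOURS = 30 * 24  # rolling 30 day SLO window


def burn_rate_threshold(budget_fraction: float, alert_window_hours: float) -> float:
    """Burn rate at which `budget_fraction` of the budget is spent
    within `alert_window_hours`."""
    return budget_fraction * WINDOW_HOURS / alert_window_hours


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows.

    A burn rate of 1 spends the budget exactly over the full window.
    """
    return error_ratio / (1.0 - slo_target)


def decide(err_1h: float, err_6h: float, slo_target: float = 0.999) -> str:
    """Return 'page', 'ticket', or 'ok' for the given error ratios."""
    # Fast burn: 10 percent of the budget in one hour.
    if burn_rate(err_1h, slo_target) >= burn_rate_threshold(0.10, 1):
        return "page"
    # Slow burn: 5 percent of the budget over six hours.
    if burn_rate(err_6h, slo_target) >= burn_rate_threshold(0.05, 6):
        return "ticket"
    return "ok"
```

With a 99.9 percent SLO, an 8 percent error ratio over the last hour is a burn rate of about 80 and pages. A steady 0.7 percent error ratio is a burn rate of about 7: too slow to page, but enough to open a ticket.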
The discipline that makes SLOs useful is treating the budget as a real currency. When it is full, you ship. When it is empty, you stop and harden. The two states should feel different to the team, and the alerting should make that obvious before the dashboards do.
An SLO without a rolling window and burn-rate alerts quietly trains teams to ship unsafe deploys after every freeze. Rolling windows and multi-window burn-rate alerts keep the control loop honest.
Originally posted on LinkedIn.