clusterrace

Blog · 2026-05-08

Why we measure platforms by enablement, not uptime

Most platform teams measure themselves on uptime. It's the wrong number.

Uptime tells you the platform didn't break. It doesn't tell you whether anyone got faster because of it. A platform with five-nines availability whose teams ship one new service per quarter is failing: quietly, expensively, in a way that doesn't page anyone.

What to measure instead

Two numbers, neither of them about the platform itself:

  1. Time-to-first-deploy for a new service. From git init to production traffic. If it's longer than half a day, your golden path is decoration.
  2. Time-to-fix for a platform-induced incident. When the platform causes pain, how long does it take to make that pain go away? Measure it for the engineer who hit the problem, not for the end user.
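
Both numbers are simple subtractions once you have the timestamps. Here is a minimal sketch in Python, assuming you can export one record per new service and one per platform-induced incident. The record types, field names, and the summarize helper are hypothetical, not part of any real tool, and "half a day" is read here as 12 hours; a team might reasonably read it as half a working day instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

# Assumption: "half a day" is taken as 12 wall-clock hours.
HALF_DAY = timedelta(hours=12)


@dataclass
class ServiceLaunch:
    """Hypothetical record: one row per new service."""
    name: str
    repo_created: datetime        # stand-in for "git init"
    first_prod_traffic: datetime  # first request served in production


@dataclass
class PlatformIncident:
    """Hypothetical record: one row per platform-induced incident."""
    ident: str
    engineer_blocked: datetime    # when the engineer hit the problem
    engineer_unblocked: datetime  # when that engineer could work again


def time_to_first_deploy(launch: ServiceLaunch) -> timedelta:
    return launch.first_prod_traffic - launch.repo_created


def time_to_fix(incident: PlatformIncident) -> timedelta:
    return incident.engineer_unblocked - incident.engineer_blocked


def _median_hours(durations: list[timedelta]) -> Optional[float]:
    """Median duration in hours, or None when there is no data."""
    if not durations:
        return None
    return median(d.total_seconds() / 3600 for d in durations)


def summarize(launches: list[ServiceLaunch],
              incidents: list[PlatformIncident]) -> dict:
    """Report both enablement numbers plus the launches over the threshold."""
    return {
        "median_hours_to_first_deploy": _median_hours(
            [time_to_first_deploy(launch) for launch in launches]),
        "median_hours_to_fix": _median_hours(
            [time_to_fix(incident) for incident in incidents]),
        "slow_launches": [launch.name for launch in launches
                          if time_to_first_deploy(launch) > HALF_DAY],
    }
```

The sketch reports medians rather than means so that one pathological launch doesn't hide a working golden path, or the reverse. The list of slow launches is the part worth reading with the engineers who did them.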

These numbers are unflattering. That's the point.

The trap

Teams measure uptime because uptime is easy to graph. Enablement is messier — it requires talking to engineers, watching them work, accepting that "the platform is fine" sometimes means "the platform is in the way."

The platforms we've shipped that worked weren't the most reliable ones. They were the ones that made the next service trivially easy to spin up, instrument, and forget.

That's the bar.