Most platform teams measure themselves on uptime. It's the wrong number.
Uptime tells you the platform didn't break. It doesn't tell you whether anyone got faster because of it. A platform with five-nines availability whose teams ship one new service per quarter is failing: quietly, expensively, and in a way that doesn't page anyone.
What to measure instead
Two numbers, neither of them about the platform itself:
- Time-to-first-deploy for a new service. From `git init` to production traffic. If it's longer than half a day, your golden path is decoration.
- Time-to-fix for a platform-induced incident. When the platform causes pain, how long does it take to make that pain go away? Not for the user, but for the engineer who hit it.
These numbers are unflattering. That's the point.
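If you want to start tracking both, here is a minimal sketch in Python. It's a rough starting point, not a prescribed implementation. Everything named here is hypothetical: the function names, the `HALF_DAY` constant, and the idea that you already have timestamps for repo creation, first production traffic, and when an engineer hit and cleared a platform problem. It also reads "half a day" as 12 wall-clock hours, which is an assumption; half a working day would be a tighter limit.

```python
from datetime import datetime, timedelta
from statistics import median

# Assumption: "half a day" is read as 12 wall-clock hours.
HALF_DAY = timedelta(hours=12)


def time_to_first_deploy(repo_created: datetime, first_prod_traffic: datetime) -> timedelta:
    """Elapsed time from `git init` to the first production traffic."""
    if first_prod_traffic < repo_created:
        raise ValueError("first production traffic cannot precede repo creation")
    return first_prod_traffic - repo_created


def golden_path_ok(ttfd: timedelta, limit: timedelta = HALF_DAY) -> bool:
    """True when a new service reached production within the limit."""
    return ttfd <= limit


def median_time_to_fix(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Median time from an engineer hitting a platform-induced problem
    to that engineer being unblocked.

    Each incident is a (hit_at, unblocked_at) pair, measured from the
    engineer's side rather than from the end user's.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    seconds = [(unblocked - hit).total_seconds() for hit, unblocked in incidents]
    return timedelta(seconds=median(seconds))
```

The median is used instead of the mean so that one incident that dragged on for days doesn't hide how long the typical fix takes. Look at the worst case separately.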
The trap
Teams measure uptime because uptime is easy to graph. Enablement is messier — it requires talking to engineers, watching them work, accepting that "the platform is fine" sometimes means "the platform is in the way."
The platforms we've shipped that worked weren't the most reliable ones. They were the ones that made the next service trivially easy to spin up, instrument, and forget.
That's the bar.