Books and Long-Form Writing

The Original

“Google’s SREs have done our industry an enormous service by writing up the principles, practices and patterns—architectural and cultural—that enable their teams to combine continuous delivery with world-class reliability at ludicrous scale. You owe it to yourself and your organization to read this book and try out these ideas for yourself.”

—Jez Humble

“I like this book a lot. If you care about building reliable systems, reading through this book and seeing what the teams around you don’t do seems like a good exercise. […] Even including the downsides, I’d say that this is the most valuable technical book I’ve read in the past year.”

—Dan Luu

The Next Generation

“In 2016, Google dropped Site Reliability Engineering on the operations world, and that world was never the same. For the first time people had access to over 500 pages of distilled information on what Google does to run its planet-wide infrastructure. Most people liked the book, a handful didn’t, but nobody ignored it. It became a seminal work and an important touchstone for how people thought about SRE (especially the Google implementation of it) from that point on. […] Now in 2018, Google returns to fill in a crucial piece of the puzzle: in their first volume they described what they do, but that didn’t help those who couldn’t see themselves in Google’s story. This book aims to demonstrate how Google does SRE— and how you can do it, too.”

—David N. Blank-Edelman

Authored chapter on SLO Monitoring and Alerting; general editorial support.

Authored chapter (a polemic, really) about on-call and what can be done about it.