“Release It!”

A while ago I wrote a post on Learning to Fail, inspired largely by Michael T. Nygard’s book “Release It!”. Now it’s time to review the book itself.

As the subtitle says, the book is all about designing and deploying production-ready software. It opens with a great introduction on why it really matters: Because software is often critical to business. Because its reliability and performance are really our job and a matter of professionalism. Finally (if that’s not enough), because its behavior in production will have a huge impact on our quality of life as well – a matter of choosing between panic attacks and the phone ringing at 4 AM, or software Just Working by itself, letting you enjoy a healthy life and do more fun stuff at work. That’s the center of mass here, by the way: More on development and operations, less on management and business.

The book is divided into four main areas. Each starts with a bit of theoretical introduction and/or an anecdote, followed by discussion of concrete phenomena, problems and solutions. Even though it might appear as a collection of patterns and antipatterns, it’s much more than that. Patterns and antipatterns are just a form, but it’s really about setting the focus for a few pages and naming the problem. Anyway, the “pattern” and “antipattern” concept is gone by the middle of the book.

The first part talks about stability, and how it’s impacted by error propagation, lack of timeouts, all kinds of poor error handling, weak links etc. Then it shows solutions: How to stop errors from propagating. How to be paranoid, expect failure at each integration point (third-party or not), and deal with it. How to fail fast. And so on.
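To make “stop errors from propagating” and “fail fast” a bit more tangible, here is a minimal sketch of a circuit breaker, one well-known way to guard an integration point. It is my own illustration, not code taken from the book; the names `CircuitBreaker`, `CircuitOpenError`, `max_failures` and `reset_after` are made up for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without even trying it."""


class CircuitBreaker:
    """Hypothetical wrapper around calls to one integration point."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast: do not touch the broken dependency at all.
                raise CircuitOpenError("integration point is marked as down")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success: close the circuit and forget earlier failures.
        self.failures = 0
        self.opened_at = None
        return result
```

Callers would wrap each remote call in something like `breaker.call(fetch_prices, timeout=2)` (with `fetch_prices` standing in for whatever your integration point is) and treat `CircuitOpenError` as a signal to degrade gracefully instead of queuing up behind a dead dependency.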

The second part talks about capacity: Dealing with load, understanding constraints and making predictions. Impact from seasonal phenomena or ad campaigns. Strange and non-obvious usage patterns – users hitting the “refresh” button, web scrapers etc. Finally, dealing with those issues with proper use of caching, pooling, precomputing content and tuning garbage collection.

The third part is a bag with all kinds of design issues: networking, security, availability (understanding and defining requirements, followed by load balancing and clustering), testing and administration.

The last part is all about operations: logging, monitoring, transparency, releasing, that kind of stuff. How to organize it so that routine maintenance will be less of a pain, monitoring will let us detect issues early, and finally, after or during an issue, we will have enough information to diagnose it.

Some problems are discussed from a bird’s-eye view. Most problems are more down-to-earth, providing detailed discussion of an issue with a sketch of a solution and its weak and strong points. Finally, when applicable, the author rolls up his sleeves and is ready to talk about concrete code, SQL, heap dumps, scripting etc.

The book is actually full of real war stories, anecdotes, code samples, tool descriptions, case studies, and all kinds of concrete content. There are a few larger stories that go on like this: On this project the team did this, this and that in order to mitigate such and such risks. When marketing sent an advert, or when the system was launched, or during routine maintenance, this and this broke and started causing problems. We did heap dumps, monitored traffic and contents, read or decompiled the code etc. and discovered problems there and there. Finally, we solved them with this and that. And here comes the detailed list of trouble spots and ways to mitigate them. It’s really a complete view – from the business perspective and needs, down to nitpicking about a particular piece of code or discussing popular tools.

Apart from being a great collection of real problems and tricks, there is one longer lasting, recurring aspect that may be the most valuable lesson here. Michael T. Nygard regularly shows (and makes you feel it deep in your guts, especially if you did some maintenance in production) that you really should be expecting failure everywhere and every time. You should try and predict, and mitigate, as many issues as possible, as early as possible. You should be paranoid. More than that, embrace the fact that you will fail to predict everything, and design so that even random unpredictable failures won’t take you down and may be easier to solve.

All the time it’s very concrete and complete. It also feels very professional, genuine and even inspiring.

Highly recommended.
