Back at university, when I dealt with much low-level problem solving and very basic libararies and constructs, I learned to pay attention to what can possibly go wrong. A lot. Implementing reliable, hang-proof communication over plain sockets? I remember it today, a trivial loop of “core logic” and a ton of guards around it.

Now I suspect I am not the only person who got used to all the convenient higher-level abstractions so much that he began to forget this approach. Thing is, real software is a little bit more complex, and the fact that our libraries kind of deal with most low-level problems for us doesn’t mean there are no reasons to fail.

Software

As I’m reading “Release It!” by Michael T. Nygard, I keep nodding in agreement: Been there, done this, suffered that. I’ve just started, but it’s already shown quite a few interesting examples of failure and error handling.

Michael describes a spectacular outage of an airline software. Its experienced designers expected many kinds of failures and avoided many obvious issues. There was a nice layered architecture, with proper redundancy on every level from clients and terminals, through servers, through database. All was well, yet on a routine maintenance in database the entire system just hung. It did not kill anyone, but delayed flights and serious financial losses have an impact too.

The root cause turned out to be one swallowed exception on servers talking to the database, thrown by JDBC driver when the virtual IP of the database server was remapped. If you don’t have proper handling for such situations, one such leakage can lock the entire server as all of its threads wait for the connection or for each other. Since there were no proper timeouts anywhere in the server or above, eventually everything hung.

Now it’s easy to say: It’s obvious, thou shalt not swallow exceptions, you moron, and walk on. Or is it?

The thing is, an unexpected or improperly handled error can always happen. In hardware. Or a third party component. Or core library of your programming language. Or even you or your colleague can screw up and fail to predict something. It. Just. Happens.

Real Life

Let’s take a look at two examples from real life.

Everyone gets in the car thinking: I’m an awesome driver, accidents happen but not to me. Yet somehow we are grateful for having airbags, carefully designed crumple zones, and all kinds of automatic systems that prevent or mitigate effects of accidents.

If you were offered two cars at the same cost, which would you choose? One is in pimp-my-ride style with extremely comfortable seats, sat TV, bright pink wheels and whatever unessential features. But it breaks down every so often based on its mood or the moon cycle, and would certainly kill you if you hit a hedgehog. The other is just comfortable enough, completely boring, no cool features to show off at all. But it will serve you 500,000 kilometers without a single breakdown and save your life when you hit a tree. Obvious, right?

Another example. My brother-in-law happens to be a construction manager at a pretty big power plant. He recently took me on a trip and explained some basics on how it works, and one thing really struck me.

The power station consists of a dozen separate generation units and is designed to survive all kinds of failures. I was impressed, and still am, that in power plant business it’s normal to say stuff like: If this block goes dark, this and this happens, that one takes over, whatever. No big deal. Let’s put it in a perspective. A damn complicated piece of engineering that can detect any potentially dangerous conditions, alarm, shut down and fail over just like that. From small and trivial things like variations in pressure or temperature, through conditions that could blow the whole thing up. And it is so reliable that when people talk about such conditions, rare and severe as they are, they say it in the same tone as “in case of rain the picnic will be held at Ms. Johnson’s”.

Software Again

In his “After the Disaster” post, Uncle Bob asked: “How many times per day do you put your life in the hands of an ‘if’ statement written by some twenty-two year old at three in the morning, while strung out on vodka and redbull?”

I wish it was a rhetorical question.

We are pressed hard to focus on adding shiny new features, as fast as possible. That’s what makes our bosses and their bosses shine and what brings money to the table. But not only them, even we (the developers) naturally take most pride in all those features and find them the most exciting part of our work.

Remember that we’re here to serve. While pumping out features is fun, remember that those people simply rely on you. Even if you don’t directly cause death or injury, your outages can still affects lives. Think more like a car or power station designer, your position is really closer to theirs than to a lone hippie who’s building a little wobbly shack for himself.

When an outage happens and also causes financial loss, you will be to blame. If that reasoning does not work, do it for yourself – pay attention now to avoid pain in future, be it regular panic calls at 3 AM or your boss yelling at you.

More Stuff

Michael T. Nygard ends that airline example with a very valuable advice. Obvious as it may seem, it feels different if you realize it and engrave it deep in your mind. Expect failure everywhere, and plan for it. Even if your tools handle some failures, they can’t do everything for you. Even if you have at least two of each thing (no single point of failure), you can still suffer from bad design. Be paranoid. Place crumple zones on every integration point with other systems, and even different components of your system, in order to prevent cracks from propagating. Optimistic monoliths fail hard.

Want something more concrete? Go read “Release It!”, it’s full of great and concrete examples. There’s a reason why it fits in a book and not in a blog post.

Learning to Fail

Software

Real Life

Software Again

More Stuff

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112