Whatever Happened to the Worst Case?

I’m dismayed by sloppy attitudes to design. I suspect they’re creeping across from consumer-focused software, where raw average speed is everything and an occasional glitch doesn’t matter. Well, I’m sorry, but as that kind of thinking spreads towards the vehicles I may travel in, or possibly the new nuclear power plant to be commissioned up the road from here, I feel the need to voice my concerns.

Yes, the examples I mention above come from a section of our industry which we can broadly classify as safety-critical, and the people working there really do take such matters as robust software very seriously. But all their procedures, documents, standards, reviews and testing will be gradually undermined as their designers and programmers, who ought to consider the worst case at every stage and are implicitly trusted to do so, increasingly fail to do it. Some engineers seem completely oblivious to the worst-case principle and some seem to have heard about it but decided to ignore it. Those in the former category need to be enlightened, while those in the latter should, to put it kindly, be redeployed in some other capacity, where they can do no damage.

As a young electronics engineer, I learned very quickly that everything had to be designed with the worst case in mind. This was essential in order to avoid undervoltage, overcurrent, edge races, various other bad things and – ultimately – unreliable products. The calculations were tedious and sometimes it was difficult even to work out what the worst case was! But it all had to be done. Sadly, even a perfect hardware design, perfectly implemented and manufactured (a tall order), will not guarantee a perfectly reliable product, because all electronic hardware is subject to electrical noise and physical ageing effects, as well as being vulnerable to damage from various unpredictable causes. Hence the concept of “mean time between failures” (MTBF), commonly used to quantify reliability.

With software, we are better placed. For the moment, I will assert, without justification, that it is theoretically possible to design and build a perfect software system. For a non-trivial system, this is not easy, and defects are common, as we know. But these defects are all down to human error. To say such things as “bugs are inevitable”, as if bugs were malevolent creatures which sneak into our software at night and chew at our code, is disingenuous. We can rise above this. We can and should strive for perfection, while nevertheless acknowledging that getting it right first time might take more time than we initially have available. To have any reasonable chance of achieving perfection, we must, as a matter of course, using all the skills, time and resources we can summon, apply the worst-case principle to our designs. If we do not, we are designing for failure.

This series of articles will be continued. Your contributions, by way of comments, are welcome at any stage.

Categories: Software Design
  1. Dan
    August 26th, 2011 at 17:26 | #1

    Peter,

    Hear, hear! I wish more of our peers would take software quality seriously. Over the years, I’ve learned a lot by working with people who design firmware that is life-critical – people’s lives are saved when it works, and people die when it malfunctions. I try to apply the same principles to every system I work on, even if lives aren’t at stake.

    I’ve found that shops that “test bugs out of the system” tend to have low quality. By “test bugs out of the system”, I mean using the test phase to find bugs that they pretty much “know” are in there, because of deficiencies in the specification / design / implementation phases. In other words, they enter the test phase *expecting* to find bugs. If the test phase were treated more like a verification phase, the expected outcome would be different.

    On probably 3 or 4 occasions in my career, I’ve encountered someone who acknowledged a design deficiency, but basically said “unlikely to happen, and too painful to design in prevention.” Two of the cases I can recall were race conditions. In one case, the window was approximately 20 nsec, and the engineer said something to the effect of, “It’s so unlikely to happen, I’m not going to go to the trouble to prevent it.” I said to him, “If this firmware were in a product that your mother was using, and her life depended on it, would you have that same attitude?” Naturally, he decided it actually *was* worthwhile…

    One other thing – you mentioned your early career as an electronics engineer & the worst case. About 3 years ago I was brought in to find & fix a very strange problem. The customer thought it was firmware; in fact, the hardware manager said (literally) “There is absolutely no possibility that this is a hardware problem.” It turned out to be hardware. It was a complicated circuit with many resistors, with 10% tolerance parts used on the board, and guess what? The tolerances lined up in the circuit (almost +10% here, almost -10% there) such that readings & behavior were intermittently erratic. The original designer assumed a 470K resistor meant a 470K resistor, not possibly a 430K resistor.

    Last thing – and I don’t mean to steal your thunder, since you’re going to be writing a series, but in addition to good design & implementation, I think process can have a huge beneficial impact on quality. I’m not a process fanboy; I’m in the “just enough to make sense” camp, but coding standards, code reviews, static analysis, version control, regression testing, etc. can all have huge impacts on the product’s shipping quality. It seems silly to even have to mention these activities; they should almost be a foregone conclusion, but sadly I can say that is *definitely* not the case.

  2. Peter Bushell
    August 26th, 2011 at 18:03 | #2

    Thank you for your comment, Dan. I’m not a great fan of procedures, either, and part of keeping those we do need (and we do need some!) as servants rather than masters is to build in more quality by design, rather than engaging in futile attempts to beat it in later via a test harness!

  3. Susan
    August 30th, 2011 at 22:06 | #3

    It is so refreshing to read this article and the following comment by Dan. I heartily agree with you both and it is nice to know there are engineers out there like the 2 of you.

  4. Dave Banham
    August 31st, 2011 at 12:12 | #4

    I totally agree!

    In my experience, the one thing more than any other that contributes to poor quality is when management insist on development timescales that are too short, so that the job is rushed and botched through. The very fact that we are using software in an embedded system means that the problem is more complex than can easily or cost-effectively be solved by hardware alone. Therefore this complexity needs a commensurate amount of engineering time! The only way that I can see to deal with this problem is the practice of software estimation, even though this is just as difficult as Peter’s experience with high-reliability hardware design and often just as tedious. However, without it the software engineering team has no stick with which to bat back the big stick of management!

  5. Peter Bushell
    August 31st, 2011 at 14:05 | #5

    Thanks to all!

    Dave (and others), you might find this post interesting, though my comment underneath it is more relevant to your point about timescales:

    http://embeddedgurus.com/stack-overflow/2011/08/rabbit-patches-and-embedded-systems/

  6. Ian Johns
    August 31st, 2011 at 16:34 | #6

    I agree with the sentiment that companies’ rush to market will be the death of engineers and customers.

  7. May 31st, 2013 at 11:03 | #7

    Peter, have just found this posting – agree with you 100%. But having spent many years promoting the virtues of quality software engineering, all I can say is ‘I failed’. Sadly, dismal but true.

  8. Peter Bushell
    June 11th, 2013 at 10:36 | #8

    I accidentally trashed a relevant comment from Juha Aaltonen. Sorry, Juha! However, I was able to retrieve it from the notification email:

    Too hard for today’s engineers. The push for a lot of cheap labor has paid off.
    To get less capable people graduated (and better results from the point of view of university administrations), the level must have been lowered.

    “There are no crimes if even murder is legal”.
