[nSLUG] XEN & Heartbeat

Michael Crawford mdcrawford at gmail.com
Wed Jun 3 21:37:15 ADT 2009

On Wed, Jun 3, 2009 at 4:59 PM, Michael Crawford <mdcrawford at gmail.com> wrote:
> On Wed, Jun 3, 2009 at 4:27 PM, Hatem Nassrat <hnassrat at gmail.com> wrote:
>> I have briefly looked at fault tolerance during my MACS at Dalhousie
>> and I remember reading about systems which can even handle hardware
>> failures by having standby replicas of the hardware within the machine
>> itself.

This sort of thing is common with computers used for space flight.

My memory is a little hazy about the exact number of computers, but
the Space Shuttle has several that vote on every output they produce.
So if one computer fails and starts trying to screw up the controls,
it is outvoted by the others.
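A minimal sketch of the voting idea, assuming each redundant computer hands its output to a voter that accepts a value only when a strict majority agrees. The function name `majority_vote` and the sample values are hypothetical, and the sketch ignores timing, fault isolation and everything else a real flight voter has to handle:

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def majority_vote(outputs: Sequence[T]) -> Optional[T]:
    """Return the value reported by a strict majority of the redundant
    computers, or None if no single value reaches a strict majority."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Strict majority: more than half of all voters must agree.
    return value if count * 2 > len(outputs) else None


# Four redundant computers; one has failed and emits a bogus command,
# so the other three outvote it.
print(majority_vote([12, 12, 99, 12]))
# A 2-2 split has no strict majority, so nothing is accepted.
print(majority_vote([1, 2, 1, 2]))
```

Returning `None` on a tie, rather than picking an arbitrary winner, is a design choice here: a split vote is itself evidence that something is wrong and should be handled explicitly.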

But all these computers run the same program, so to guard against a
bug in that program there is also a computer that runs software
developed completely independently from the same specification.
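Why the independent version matters can be shown with a toy example. This is a hypothetical sketch, not a model of the Shuttle's actual switchover logic: `primary_version`, `backup_version` and `fly` are made-up names, and the "spec" is simply "output twice the input". Because every primary computer runs the same buggy code, they vote unanimously for the wrong answer; only the independently written version exposes the disagreement:

```python
from collections import Counter
from typing import Tuple


def primary_version(x: int) -> int:
    """Hypothetical control law run by every primary computer.
    Spec: output twice the input. Bug: the sign is lost for negative x."""
    return abs(x) * 2


def backup_version(x: int) -> int:
    """Hypothetical independently written implementation of the same spec."""
    return x + x


def fly(x: int, n_primary: int = 4) -> Tuple[int, bool]:
    """Return the primary set's consensus and whether the backup agrees."""
    outputs = [primary_version(x) for _ in range(n_primary)]
    # Identical copies of the same program always agree, so the vote is
    # unanimous even when the answer is wrong: voting cannot catch this.
    consensus, _votes = Counter(outputs).most_common(1)[0]
    backup = backup_version(x)
    return consensus, consensus == backup


print(fly(3))   # primaries and backup agree
print(fly(-3))  # primaries agree with each other, backup disagrees
```

What a system should do when the backup disagrees (alert the crew, switch control over, and so on) is a policy question this sketch deliberately leaves out.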

There remains a problem, though. It has been found that, while
difficult and expensive, it is possible to implement fault-tolerant,
real-time, safety-critical software completely to spec, without any
bugs whatsoever. What is sometimes discovered with such software is
that the original spec itself is incorrect, and independent
implementations of a wrong spec will faithfully agree on the wrong
behavior.

Michael David Crawford
mdcrawford at gmail dot com

   GoingWare's Bag of Programming Tricks
