[nSLUG] XEN & Heartbeat

Ian Campbell ian at slu.ms
Thu Jun 4 04:02:55 ADT 2009

On Wed, Jun 03, 2009 at 04:59:59PM -0700, Michael Crawford wrote:
> On Wed, Jun 3, 2009 at 4:27 PM, Hatem Nassrat <hnassrat at gmail.com> wrote:
> > I have briefly looked at fault tolerance during my MACS at dalhousie
> > and I remember reading about systems which can even handle hardware
> > failures by having standby replicas of the hardware within the machine
> > itself.
> While it's quite likely an urban legend, it was said that one could
> fire a shotgun through a Tandem box without it going down.
> Tandem specialized in redundant, fault-tolerant hardware.  But they
> were also very expensive, so I don't think they're around anymore.

In theory you can do that with a lot of the $$$$ Sun hardware.

For anyone still at Dal, torch/flame fits that bill. Sun's 'medium
iron' gear (and above) is meant to survive significant hardware
failures... in theory you can happily hotswap RAM, CPUs, etc.
without any issue... although in practice you'll likely find yourself
restarting for a kernel patch or something.

That's not standby though, that's routing around damage.

Anything running on the affected processor/RAM will crash, of course,
but the machine should stay up.

