[nSLUG] linux home or workplace automation and Universal Powerline Bus

George N. White III gnwiii at gmail.com
Sun Sep 21 13:48:59 ADT 2008


On Sat, Sep 20, 2008 at 9:15 PM, Daniel Morrison <draker at gmail.com> wrote:

> 2008/9/20 George N. White III <gnwiii at gmail.com>:
>
>> We have unexplained network outages way too
>> often.  The system logs have 1-2 "ethernet cable
>> unplugged" events a week, sometimes followed
>> by reconnecting with half-duplex, so things bog
>> down until I run "sudo ethtool -s eth0 autoneg off
>> duplex full speed 100".  We have been setting
>> the ports on the switch to full-100, but Apple
>> wants us to change that to half-100 because
>> that is the default when autonegotiation
>> fails.
>
> Just a note on this point.  100BaseT can autonegotiate, or it can be
> manually fixed, but not both.  By this I mean that BOTH ENDS must be
> configured the same way: both autonegotiate, or both explicitly set, e.g.
> 100Mbit full-duplex.

We have been setting both ends to full-100, but Apple wants us to
use half-100.

> If you set one side manually, and leave the other to autonegotiate, then
> the negotiation will fail, and the device will fall back to its default.
> If this happens to match what you set on the other end you're lucky -- if
> it doesn't, you get a duplex mismatch.

One of the problems with setting ports to full-100 was that equipment
moves around a lot and the Windows drivers for the most common PC
interface (Intel) didn't work with fixed settings until last January's update.
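
The fallback described above can be written down as a toy model.  This is
only a sketch under simplifying assumptions: speed is taken to be 100 Mbit
on both ends, two autonegotiating ports are assumed to agree on full
duplex, and an autonegotiating port facing a fixed port is assumed to fall
back to half duplex.  The names Port, resolved_duplex and link_state are
made up for illustration.

```python
from typing import NamedTuple


class Port(NamedTuple):
    autoneg: bool          # True if the port autonegotiates
    duplex: str = "full"   # forced duplex, used only when autoneg is off


def resolved_duplex(local: Port, peer: Port) -> str:
    """Duplex that `local` ends up using in this simplified model."""
    if not local.autoneg:
        return local.duplex   # a fixed port uses whatever it was told
    if peer.autoneg:
        return "full"         # both negotiate: assume best common mode
    return "half"             # peer does not negotiate: assumed fallback


def link_state(a: Port, b: Port) -> str:
    """Describe the link as seen from both ends."""
    da, db = resolved_duplex(a, b), resolved_duplex(b, a)
    if da == db:
        return "ok (%s)" % da
    return "MISMATCH (%s vs %s)" % (da, db)
```

In this model link_state(Port(False, "full"), Port(True)) is a mismatch,
while link_state(Port(False, "half"), Port(True)) is not, which is the
reasoning behind the half-100 recommendation: a fixed half-duplex port
still matches a peer that has dropped back to autonegotiation.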

> So if you do:
>
>   ethtool -s eth0 autoneg off duplex full speed 100
>
> make sure you do it on the switch side also, and make sure it's committed
> on the switch and added to the saved configuration on the system.

This is what is supposed to be done, but sometimes the PC still ends up
set for half-duplex, and sometimes the switches get set for autonegotiation.
I suspect the fixed settings may be lost after updates or maybe just reboots
of the switches.

> I think the best option is to set auto-negotiation on everywhere, unless
> there is a specific bug with some vendor's equipment, in which case set
> the best rate at both ends, and document it!

I think that is true if you have lots of systems (Win XP) that don't handle
fixed settings, but in principle fixed settings everywhere should
be more reliable.  We tried that, but in practice we keep finding
duplex mismatches, so Apple may be right that if we use fixed settings
they have to be half-duplex.  Within the past month I have caught two
different Linux distros and Mac OS X (all using Intel interfaces)
connected at half-duplex despite having been configured for full.  On
the SGI machines we build a custom kernel with fixed-100 set, and never
find them using half, so it looks like using ethtool is not enough --
something that happens on the switch must be causing the interface to
reset to half duplex.  Maybe when the switch configuration is updated the
port goes to auto-neg temporarily and then back to the fixed setting
(when we check, it is always fixed).
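
One way to catch the host side dropping to half duplex is to poll the
interface from cron.  The following is a minimal sketch, assuming a Linux
kernel that exposes the "speed" and "duplex" files under
/sys/class/net/<iface>/; the function names and the sysfs_root parameter
(there only so the code can be tried against a scratch directory) are
made up for illustration.

```python
from pathlib import Path


def link_settings(iface: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Read speed and duplex for `iface`; 'unknown' if unreadable."""
    base = Path(sysfs_root) / iface
    result = {}
    for name in ("speed", "duplex"):
        try:
            result[name] = (base / name).read_text().strip()
        except OSError:
            # Missing interface, or link down (the read can fail then).
            result[name] = "unknown"
    return result


def is_degraded(settings: dict, want_speed: str = "100",
                want_duplex: str = "full") -> bool:
    """True if the link is not running at the expected speed/duplex."""
    return (settings["speed"] != want_speed
            or settings["duplex"] != want_duplex)
```

Calling is_degraded(link_settings("eth0")) every few minutes and logging
a timestamp when it returns True would at least show whether the drops
to half duplex line up with switch updates or reboots.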

> If you rely on the failed auto-negotiation default, you may find
> unexpected problems as vendors change their 'defaults'.
>
> Another idea: be wary of network loops.  If you have Cisco equipment, then
> STP (spanning tree protocol) is on by default.  This is good, because it
> will avoid taking down your network if you have a loop.  But it's bad
> because it makes each port take 30-40 seconds to come up after it's
> plugged in.

We started out with Cisco, but there was a massive upgrade in April
so some new switches may be from other vendors.

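The 30-40 second figure is close to what the classic IEEE 802.1D default
timers would give.  A rough sketch of the arithmetic, assuming those
defaults (forward delay 15 s, max age 20 s) are actually in use, which
is an assumption and not something checked on the switches here:

```python
FORWARD_DELAY_S = 15  # 802.1D default; assumed, not read from a switch
MAX_AGE_S = 20        # 802.1D default; assumed, not read from a switch


def port_up_delay(forward_delay: int = FORWARD_DELAY_S) -> int:
    """Seconds a newly connected port spends in listening + learning."""
    return 2 * forward_delay


def indirect_failure_delay(max_age: int = MAX_AGE_S,
                           forward_delay: int = FORWARD_DELAY_S) -> int:
    """Worst case when a remote link fails: wait for the stored
    configuration to age out, then go through listening + learning."""
    return max_age + 2 * forward_delay
```

With the defaults this gives 30 seconds for a freshly plugged port and
up to 50 seconds when the failure is somewhere else in the tree.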

> Should, for example, an employee occasionally, during the working day,
> feel the need to 'plug in' a switch on their desk that actually is already
> plugged in, then there may be a 30-40 second 'pause' in some areas of your
> network as the STP sorts things out.

What would this do for other machines on that switch -- could it explain
"ethernet disconnected" messages?  Does anybody know what causes
those (when the cable has not been touched, at least on the user end)?

> Or, if for some reason (power glitch?) some intermediate switch or
> transceiver flickers out for a moment, it may take the devices on either
> side 30-40 seconds to re-enable the link.

I think that is what we get, plus the load from all those PCs booting at once.

> So STP is a bit like auto-negotiate -- best if left on everywhere, but
> reasonable to turn off in particular circumstances, e.g. a special fibre
> link to another location, in which there'll never (did I say never?) be a
> loop, but which might be subject to power flickers (e.g. a repeater
> somewhere).
>
> (STP is also good for failover.  If you have a stack of switches in a
> closet with a backbone structure, make sure STP is enabled on those
> backbone ports, and then loop the top one back to the bottom one.  STP
> will automatically find the optimum configuration, and down one of the
> links.  Now if one switch dies the previously downed link opens up, and
> everyone (except those on the failed switch, of course!) still has
> connectivity to everywhere else).
>
> Hope this lends a few thoughts...

Thanks.  It certainly gives me some ideas for questions to ask, but in the
end it will take a lot to make me comfortable with relying on switched
ethernet for time-critical control functions during power "events".
STP should increase the chances that the systems that run on generator
will still have internet connectivity, so you want it, but 30-40 second
interruptions are not good when trying to get non-critical systems to
shut down.  In practice, however, some of them will probably be hung on
some network process and won't shut down until the network comes
back.
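
The failover loop suggested above (a stack of switches with the top one
looped back to the bottom one) can be checked with a small connectivity
sketch.  It models only the topology, not STP itself: switches are
numbered 0..n-1, chained in order, and closed_loop adds the extra
top-to-bottom link that STP would normally keep blocked.  All names are
made up for illustration.

```python
def reachable(nodes, links, start):
    """Set of nodes reachable from `start` over undirected links."""
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for a, b in links:
            nxt = b if a == cur else a if b == cur else None
            if nxt is not None and nxt in nodes and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def survivors_connected(n, closed_loop, dead):
    """True if every switch except `dead` can still reach the others."""
    nodes = set(range(n)) - {dead}
    links = [(i, i + 1) for i in range(n - 1)]
    if closed_loop:
        links.append((n - 1, 0))   # the redundant link STP keeps in reserve
    # Links attached to the dead switch are gone with it.
    links = [(a, b) for a, b in links if dead not in (a, b)]
    return reachable(nodes, links, min(nodes)) == nodes
```

For a stack of five, losing the middle switch splits a plain chain in
two (survivors_connected(5, False, 2) is False), while the looped stack
stays connected (survivors_connected(5, True, 2) is True), at the cost
of the reconvergence delay while STP opens the spare link.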

-- 
George N. White III <aa056 at chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia


