[nSLUG] linux home or workplace automation and Universal Powerline Bus

Daniel Morrison draker at gmail.com
Sun Sep 21 15:20:25 ADT 2008


2008/9/21 George N. White III <gnwiii at gmail.com>:
> On Sat, Sep 20, 2008 at 9:15 PM, Daniel Morrison <draker at gmail.com> wrote:

> One of the problems with setting ports to full-100 was that equipment
> moves around a lot and the windows drivers for the most common PC
> interface (Intel) didn't work with fixed settings until last January's update.

Vendor bugs... <sigh>
All the more reason to use auto-neg!

> This is what is supposed to be done, but sometimes the PC still ends up
> set for half-duplex, and sometimes the switches get set for autonegotiation.
> I suspect the fixed settings may be lost after updates or maybe just reboots
> of the switches.

If the settings are changing after they have been 'set', the only thing I
can assume is that they have not been set properly.  On Cisco switches
it's important to 'write' after making a change, so that it's not lost on
reboot.  (Or maybe another sysadmin's configuration practices are
overwriting your changes??)

>> I think the best option is to set auto-negotiation on everywhere, unless
>> there is a specific bug with some vendor's equipment, in which case set
>> the best rate at both ends, and document it!
>
> I think that is true if you have lots of systems (Win XP) that don't handle
> fixed settings, but in principle fixed settings everywhere should
> be more reliable.

Using fixed settings everywhere would certainly be more reliable than
auto-negotiation everywhere -- except that it is very difficult for staff
to keep track of and configure every port correctly, _especially_ when
equipment moves around.  In this situation, unless there is strict
adherence to policy and documentation, it very quickly changes from 'more
reliable' to 'shot in the dark'.

> We tried that, but in practice we keep finding
> duplex mis-matches, so Apple may be right that if we used fixed
> it has to be half-duplex.

??? If the equipment can do full, there's no reason not to use it.  If
Apple's default after failed auto-negotiation is half, then they're asking
you to set to half on the switch so that new equipment will work "out of
the box".  But unless their 'forced mode' settings are faulty, why not use
full duplex?  Again, if everything is just set to auto-negotiate it may
work out better.
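
To make the mismatch case concrete, here is a rough Python sketch of which
duplex each end of a link settles on.  It assumes the usual behaviour where
a port left on auto-negotiation falls back to half duplex when the other end
is forced (and so never negotiates), that both ends support full duplex, and
that speed already matches.  The function names are made up for illustration.

```python
# Each port setting is "auto", or a forced duplex: "full" or "half".

def resolved_duplex(local: str, peer: str) -> str:
    """Duplex the local port ends up using, given both ends' settings."""
    if local != "auto":
        return local      # a forced port uses exactly what it was told
    if peer == "auto":
        return "full"     # both ends negotiate; assumed to agree on full
    return "half"         # peer is forced and silent: fall back to half


def is_mismatch(a: str, b: str) -> bool:
    """True if the two ends of the link settle on different duplex."""
    return resolved_duplex(a, b) != resolved_duplex(b, a)


if __name__ == "__main__":
    for a, b in [("auto", "auto"), ("full", "full"),
                 ("auto", "full"), ("auto", "half")]:
        print(a, b, "MISMATCH" if is_mismatch(a, b) else "ok")
```

Under those assumptions the only safe pairs are auto/auto, matching forced
settings, or auto against forced half.  Auto against forced full is the
mismatch.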

> Within the past month I have caught two different
> linux distros and Mac OSX (all using Intel interfaces) connected at
> half-duplex despite having been configured for full.

Don't know what to tell you.  I don't trust any distribution's 'automatic'
method for configuring an interface... all this non-standard
/etc/sysconfig/network/ garbage and weird 'ethtool' programs that do who
knows what.  If there's any doubt, get mii-diag and configure it manually
early in the boot (and maybe again late in the boot, if you can't disable
the distro's own broken configuration).  mii-diag is at
   ftp://ftp.scyld.com/pub/diag/
Compile commands are at the end of mii-diag.c.  You'll also want libmii.c
for full functionality.

> With SGI we build a custom kernel with fixed-100 set, and never find
> them using half, so it looks like using ethtool is not enough --
> something that happens on the switch must be causing the interface to
> reset to half duplex.

Sorry that I'm coming on a bit strong here, it just seems to me that weird
things are happening to you which should not happen.  If I understand what
you just wrote above: SGI is forced to 100-full in the kernel, and always
works.  Linux boxes are forced with ethtool, and get reset somehow to
half-duplex.

This does not suggest to me that the problem is on the switch.  The
problem is on the Linux box!

Do you use dhcp?  Could dhcpcd be resetting the interface?  Maybe add an
ethtool or mii-diag command to force the interface in the dhcpcd if-up
script.
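
Something along these lines could be called from whatever hook your DHCP
client runs when the interface comes up.  It is only a sketch: the interface
name "eth0" and the helper names are placeholders, where the hook lives
depends on your DHCP client and distro, and the ethtool call needs root.

```python
import subprocess


def force_link_command(iface: str, speed: int = 100, duplex: str = "full") -> list:
    """Build the ethtool command that forces speed/duplex with autoneg off."""
    if duplex not in ("full", "half"):
        raise ValueError("duplex must be 'full' or 'half'")
    return ["ethtool", "-s", iface,
            "speed", str(speed), "duplex", duplex, "autoneg", "off"]


def force_link(iface: str, speed: int = 100, duplex: str = "full") -> None:
    """Run the command; raises CalledProcessError if ethtool fails."""
    subprocess.run(force_link_command(iface, speed, duplex), check=True)


if __name__ == "__main__":
    # Print rather than execute, so this is safe to try on any machine.
    print(" ".join(force_link_command("eth0")))
```

Keeping the command builder separate from the part that runs it means you
can log exactly what was attempted each time the hook fires.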

Unlike on a Cisco switch, where the config is saved with 'write', AFAIK
there is no way to permanently set the mode on Linux (although compiling
out any other modes from the kernel driver as you've done on SGI is a neat
trick!) (also, some NICs have a configurable 'default' that can be written
to NVRAM.  Don't know which ones though...).  Anyway, you're dependent on
a run-time configuration program to set the mode for you.  Any power loss
to the interface could result in it reverting to its non-forced default
-- which is often auto-negotiate.  But if the switch is forced, it's a
crap shoot again.

Do machines ever hibernate or sleep?  That might power cycle the
interface, but fail to re-run ethtool when it wakes up.
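
One way to catch this is to check the link state after resume (or from cron)
and complain when it has drifted.  A minimal sketch, assuming your kernel
exposes /sys/class/net/<iface>/speed and /sys/class/net/<iface>/duplex; the
function names and the "eth0" default are only for illustration.

```python
from pathlib import Path


def read_link_state(iface: str, sysfs_root: str = "/sys/class/net"):
    """Return (speed_mbps, duplex) for iface, or None if it can't be read."""
    base = Path(sysfs_root) / iface
    try:
        speed = int((base / "speed").read_text().strip())
        duplex = (base / "duplex").read_text().strip()
    except (OSError, ValueError):
        # Missing interface, link down, or a driver that doesn't report it.
        return None
    return speed, duplex


def link_ok(iface: str, want_speed: int = 100, want_duplex: str = "full",
            sysfs_root: str = "/sys/class/net") -> bool:
    """True only if the interface currently reports the wanted settings."""
    return read_link_state(iface, sysfs_root) == (want_speed, want_duplex)


if __name__ == "__main__":
    print("eth0:", read_link_state("eth0"),
          "OK" if link_ok("eth0") else "NOT as configured")
```

If link_ok() comes back False after a wake-up, that points at the box rather
than the switch, and it is the moment to re-run the forcing command.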

> Maybe if the switch configuration is updated the port goes
> to auto-neg temporarily and then back to the fixed setting (when
> we check, it is always fixed).

Seems unlikely... guess it would depend on the switch manufacturer though.
Even if it did... this wouldn't affect the other end if it's properly
forced!

> What would this do for other machines on that switch -- could it explain

If the loop is entirely within one switch, it (crosses fingers) shouldn't
cause any trouble for other users.  If the loop involves two or more
switches, then all their uplink connections are part of the loop, so
traffic between switches may be momentarily suspended.  But there shouldn't
be any 'ethernet disconnected' messages to individual ports, I don't
think.

> "ethernet disconnected" messages?  Does anybody know what causes
> those (when the cable has not been touched, at least on the user end).

- duplex mismatch
- speed mismatch
- bad/flakey NIC
- bad/flakey switch port
- bad/flakey patch panel or cable/BIX job/mis-wiring/overlong cable run
- network driver bugs
- network storm
- MAC address conflict (?)

(just my initial ideas).

I wanted to add 'heavy network load' to that list, but it's really an
exacerbating factor.  Many of the above list may not be noticed until a
large amount of data is pushed through the pipe.

>> Or, if for some reason (power glitch?) some intermediate switch or
>> transceiver flickers out for a moment, it may take the devices on either
>> side 30-40 seconds to re-enable the link.
>
> I think that is what we get, plus the load from all those PC's booting at once.

Definitely I would be very concerned about your '800 desktops rebooting at
once' issue.  If there's any way you could arrange staggered boots, either
by staggering the power, or maybe... each system does a 'sleep
<myroomnumber>/100' early in the boot sequence!
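
Spelled out, the room-number idea looks something like this.  It is only a
sketch: it assumes each machine somehow knows its room number as an integer
(how it finds that out is up to you), and the divisor of 100 is just the
figure from the joke above, not a tuned value.

```python
import time


def boot_delay_seconds(room_number: int, divisor: int = 100) -> float:
    """Delay before network-heavy boot steps, spread out by room number."""
    if room_number < 0 or divisor <= 0:
        raise ValueError("room_number must be >= 0 and divisor > 0")
    return room_number / divisor


def staggered_start(room_number: int, divisor: int = 100) -> None:
    """Sleep for this machine's share of the stagger."""
    time.sleep(boot_delay_seconds(room_number, divisor))


if __name__ == "__main__":
    for room in (101, 312, 450):
        print("room", room, "waits", boot_delay_seconds(room), "seconds")
```

With a divisor of 100, room 312 waits just over three seconds.  A smaller
divisor spreads the machines out further if that turns out not to be enough.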

> Thanks.  It certainly gives me some ideas for questions to ask, but in the
> end it will take a lot to make me comfortable with relying on switched
> ethernet for time-critical control functions during power "events".
> STP should increase the chances that the systems that run on generator
> will still have internet connectivity, so you want it, but 30-40 second
> interruptions are not good when trying to get non-critical systems to
> shut down.  In practice, however, some of them will probably be hung on
> some network process and won't shut down until the network comes
> back.

Although I appreciate your concerns, it's still confusing.  The
reliability of switched ethernet may be debatable, but the aspect of
"power events" should have ZERO effect on that debate.  If it does, your
UPS/generator/PDU systems have got issues.  STP should not need to do
anything during power events, unless part of your network equipment is not
on a UPS -- in which case: that's your trouble.

Just my (strongly voiced I realize!) two cents... it may very well be that
I'm uninformed on something important, so feel free to jump in guys! (Who
am I kidding, of course you will).

-D.


