[nSLUG] ethernet and the modern linux
budman85 at eastlink.ca
Fri Sep 26 09:38:35 ADT 2008
George N. White III wrote:
> On Fri, Sep 26, 2008 at 12:42 AM, Mike Spencer <mspencer at tallships.ca> wrote:
> 1. autonegotiation considered harmful
> a) it doesn't always work on system boot (e.g., autoneg
> failed...using half duplex)
> b) even if it worked at boot, systems sometimes do autoneg while
> running, and things break (tcp connections)
> In a stable environment where you want systems to run 7/24, it is
> probably better to use fixed configurations.
This is usually the culprit in most setups I've worked with.
Every network card can be different; Intel cards are really flaky at autoneg.
3COM cards were very solid and never had an issue.
Sometimes it might be the switch you're connecting to.
If the switch is configurable, you may be able to turn autoneg off and
force 100 from the switch config.
Otherwise, many card modules have their own option names to disable
autoneg or force 100/HD.
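
As a rough sketch of the ethtool route, the snippet below builds the `ethtool -s` command line that turns autoneg off and forces 100/HD. The interface name eth0 and the helper names are assumptions for illustration only; actually applying the setting needs root and a driver that supports it.

```python
import subprocess


def ethtool_force_cmd(iface, speed=100, duplex="half"):
    """Build the argv for forcing a fixed speed/duplex with autoneg off."""
    if duplex not in ("half", "full"):
        raise ValueError("duplex must be 'half' or 'full'")
    return ["ethtool", "-s", iface,
            "speed", str(speed), "duplex", duplex, "autoneg", "off"]


def apply_forced_settings(iface, speed=100, duplex="half"):
    """Run the command; needs root and ethtool installed."""
    subprocess.run(ethtool_force_cmd(iface, speed, duplex), check=True)


if __name__ == "__main__":
    # Only print the command here; call apply_forced_settings() to run it.
    print(" ".join(ethtool_force_cmd("eth0")))
```

Settings applied this way live only in the running driver, so they need to be reapplied whenever the interface is reinitialised.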
> 2. udev considered harmful
> In the good old days with unix, you could force the settings by
> building a custom kernel.
> With linux, network cards are generally supported using modules.
> Until recently, you could pass options to control the settings when
> the module was loaded. You can also use ethtool to make changes after
> the module is loaded, but these changes may be lost after a network
> "event". I'm not sure what causes the "events". On SGI IRIX I see
> random "ethernet cable is disconnected" type messages. They occur at
> different times for different machines connected to the same switch,
> so may not be associated with spanning tree reconfiguration.
> Modern linux systems use udev and network manager to dynamically bring
> interfaces up and down. This makes sense for a laptop that moves
> around, but not for systems that never move.
> If the permanent connections really were permanent,
> NetworkManager+udev wouldn't matter, but if the system thinks the
> cable is unplugged, it seems to restart the interface. If you rely
> on some boot-time script to turn off autoneg, then this may result in
> the interface going back to the default configuration with autoneg
> that fails.
I'll dig into it more, but I'm sure that udev is only used for devices
like eth0 or eth1. A quick grep in /etc/udev/rules.d shows only ethX
entries.
Check the driver options if you have the kernel sources available.
Otherwise, search for the driver you are using to see the available options.
Apply these settings in modprobe.conf on newer kernels, and
modules.conf on older ones.
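
For illustration, a line in modprobe.conf has the form `options <module> <param>=<value> ...`. The sketch below only formats such a line. The module name e1000 and the Speed/Duplex parameter names and values are assumptions here, not something confirmed for the cards in question; check them against your own driver's documentation (or `modinfo -p <module>`) before using them.

```python
def modprobe_options_line(module, **params):
    """Format an 'options' line for modprobe.conf / modules.conf."""
    if not params:
        raise ValueError("need at least one parameter")
    settings = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {settings}"


if __name__ == "__main__":
    # Assumed e1000-style parameters: Speed in Mbit/s, Duplex 1 = half.
    print(modprobe_options_line("e1000", Speed=100, Duplex=1))
```

The module has to be reloaded (or the box rebooted) before a changed options line takes effect.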
The Intel cards required extensive testing, but I was able to pass the
driver the correct value to make it 100/HD with no negotiation. I applied
the same changes to other boxes and that seemed to fix their issues.
However, I think switch saturation may be a culprit as well. We still
experienced drops from time to time, but these were network disconnects,
usually when the switch was heavily loaded.
Updating to a newer Intel driver version seemed to fix some things.
Changing to a 3COM card, however, eliminated the autoneg drops,
so I really think it's an Intel driver/card issue.
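
To see whether a box has quietly gone back to autoneg after an interface restart, one option is to parse the output of `ethtool <iface>`. This is a minimal sketch, assuming the usual `Speed:`, `Duplex:` and `Auto-negotiation:` lines in that output. The sample text is made up for the example, not captured from a real machine, and the function names are hypothetical.

```python
import subprocess


def parse_link_settings(ethtool_output):
    """Pick out speed, duplex and autoneg from `ethtool <iface>` output."""
    wanted = {"Speed": "speed", "Duplex": "duplex",
              "Auto-negotiation": "autoneg"}
    settings = {}
    for line in ethtool_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in wanted:
            settings[wanted[key]] = value.strip()
    return settings


def still_forced(settings, speed="100Mb/s", duplex="Half"):
    """True if the link is still at the forced values with autoneg off."""
    return (settings.get("speed") == speed
            and settings.get("duplex") == duplex
            and settings.get("autoneg") == "off")


def read_link_settings(iface):
    """Query a live interface; needs ethtool installed (and usually root)."""
    out = subprocess.run(["ethtool", iface], check=True,
                         capture_output=True, text=True).stdout
    return parse_link_settings(out)


# Made-up sample in the general shape of ethtool's report.
SAMPLE = """Settings for eth0:
\tSpeed: 100Mb/s
\tDuplex: Half
\tAuto-negotiation: off
\tLink detected: yes
"""

if __name__ == "__main__":
    print(still_forced(parse_link_settings(SAMPLE)))
```

Run from cron or a monitoring script, a check like this could flag machines where the forced settings were lost.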
> 3. full duplex considered harmful
> One view is that a network where systems think the cable is
> disconnected is broken and should be fixed. My view is that shit
> happens, and we should configure systems to be robust. Apple,
> who have lots of experience with a tightly controlled list of network
> interfaces, recommend setting switches to force half-duplex, which is
> what you get when autoneg fails. Full duplex sounds nice,
> and does make a difference with systems that have traffic patterns
> where send and receive volumes are both high, such as a multiuser
> unix box doing file and mail serving, running X11 apps, etc.
> These days, such boxes have multiple gigabit interfaces. I recall
> some discussions where it was claimed that full duplex was a fraud
> since the hardware (this was several years ago) couldn't actually
> handle the traffic volumes for 100Mbit full duplex, and that gigabit
> was best configured for half-duplex.
I remember reading one of the spec sheets a while back that explained why
to use this setting. Full duplex will cause false disconnects, because
the hardware and the network need to be in pristine condition to make it
work. And even then, it's not really what you expect; it's more of a
marketing thing. The throughput does not double as expected; it's only a
very small increase.
One of the main issues with FD is that the hardware must always be
synched, and there was a note on distance as well. They recommend short
runs. I'm thinking more like a heartbeat network: on a very small network
with hardly any outside traffic, FD would probably work and not have to
resynch itself.
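
A back-of-the-envelope way to see why throughput does not simply double: in an idealised model (no collisions and no protocol overhead, which is an assumption and not a measurement), half duplex shares one link capacity between both directions, while full duplex gives each direction its own. The function names and traffic figures below are made up for illustration.

```python
def half_duplex_carried(send, recv, capacity=100.0):
    """Both directions share one channel of `capacity` Mbit/s."""
    return min(send + recv, capacity)


def full_duplex_carried(send, recv, capacity=100.0):
    """Each direction gets its own channel of `capacity` Mbit/s."""
    return min(send, capacity) + min(recv, capacity)


def full_duplex_gain(send, recv, capacity=100.0):
    """Ratio of traffic carried with FD versus HD for the same demand."""
    hd = half_duplex_carried(send, recv, capacity)
    if hd == 0:
        return 1.0  # no traffic offered, so nothing to gain
    return full_duplex_carried(send, recv, capacity) / hd


if __name__ == "__main__":
    # Mostly one-way traffic (e.g. a download plus ACKs).
    print(full_duplex_gain(send=5, recv=95))
    # Heavy traffic in both directions at once.
    print(full_duplex_gain(send=100, recv=100))
```

In this model the ratio is 1.0 for mostly one-way traffic and only reaches 2.0 when both directions are saturated at the same time.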