[nSLUG] linux home or workplace automation and Universal Powerline Bus

George N. White III gnwiii at gmail.com
Sat Sep 20 18:17:56 ADT 2008


On Sat, Sep 20, 2008 at 4:48 PM, D G Teed <donald.teed at gmail.com> wrote:
> On Sat, Sep 20, 2008 at 2:52 PM, George N. White III <gnwiii at gmail.com> wrote:
>
>> What I want to do:
>>
>> 1.  detect loss of line power (e.g., UPS goes to battery)
>
> Does your UPS work with snmp queries?  If yours
> works with a standard UPS MIB, you can monitor
> it with nagios or make your own script to initiate
> actions when the UPS shows a certain number of
> minutes of battery power have elapsed.  I can
> send you the query we use if your UPS is
> working with the standard UPS MIB format.
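A minimal sketch of such a script in Python. It assumes the UPS implements the standard UPS-MIB (RFC 1628), whose `upsSecondsOnBattery` object reads 0 while the UPS is on line power, and that Net-SNMP's `snmpget` is installed. The host name, the `public` community, SNMP v2c and the 5-minute threshold are placeholders, not the query mentioned above.

```python
import subprocess

# upsSecondsOnBattery from the standard UPS-MIB (RFC 1628): the time since the
# UPS switched to battery, or 0 while it is on line power.
UPS_SECONDS_ON_BATTERY = ".1.3.6.1.2.1.33.1.2.2.0"

# Hypothetical policy: act once the UPS has been on battery for 5 minutes.
SHUTDOWN_AFTER_SECONDS = 5 * 60


def read_seconds_on_battery(host: str, community: str = "public") -> int:
    """Query one UPS with Net-SNMP's snmpget (assumed to be installed).

    -Oqv prints only the value and -OU drops any units suffix.
    """
    result = subprocess.run(
        ["snmpget", "-v", "2c", "-c", community, "-OqvU",
         host, UPS_SECONDS_ON_BATTERY],
        capture_output=True, text=True, check=True, timeout=10,
    )
    # Raises (IndexError/ValueError) on empty or non-numeric replies.
    return int(result.stdout.split()[0])


def should_shut_down(seconds_on_battery: int,
                     threshold: int = SHUTDOWN_AFTER_SECONDS) -> bool:
    """True once the UPS has been on battery for at least `threshold` seconds."""
    return seconds_on_battery >= threshold
```

A cron job could evaluate `should_shut_down(read_seconds_on_battery("ups1.example.org"))` every minute (the host name is a placeholder) and start the shutdown script when it returns True. A failed or non-numeric reply raises an exception rather than returning 0, so a silent UPS is not mistaken for one on line power.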

We have a bunch of Ferrup UPS's that use
serial ports, unfortunately, with a different
protocol for each model.  I've been hacking
the nut drivers to tweak them for each model.

> If you have a APC UPS, there is apcupsd.  It has
> multiple platform client support.  Even if you don't
> have an APC UPS, you can buy a cheap one,
> plug nothing into it, and merely use it to monitor
> the power situation from one admin server.
> Client apcupsd instances can be set up to shut down
> at their own trigger points (typically, a number of
> minutes on battery power, but the script
> could be changed to work based on UPS temperature).
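apcupsd's own configuration covers the minutes-on-battery trigger; a custom script that also watches temperature could start from the report printed by `apcaccess status`. A sketch in Python, assuming the `STATUS`, `TONBATT` and `ITEMP` fields of that report (their unit text differs between apcupsd versions, so only the leading number is parsed). The 300 s and 40 °C thresholds are hypothetical.

```python
import subprocess


def read_apcaccess() -> str:
    """Run `apcaccess status` (part of apcupsd) and return its report."""
    return subprocess.run(["apcaccess", "status"], capture_output=True,
                          text=True, check=True, timeout=10).stdout


def parse_apcaccess(report: str) -> dict:
    """Turn 'KEY      : value' lines into a dict; values may contain ':'."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_number(value: str) -> float:
    """'95 Seconds' -> 95.0, '31.5 C Internal' -> 31.5 (unit text varies)."""
    return float(value.split()[0])


def should_shut_down(fields: dict,
                     max_seconds_on_battery: float = 300,
                     max_internal_temp_c: float = 40.0) -> bool:
    """Minutes-on-battery trigger, plus an optional temperature trigger."""
    on_battery = "ONBATT" in fields.get("STATUS", "")
    seconds = leading_number(fields.get("TONBATT", "0"))
    if on_battery and seconds >= max_seconds_on_battery:
        return True
    # ITEMP may be missing, depending on the UPS model.
    if "ITEMP" in fields:
        return leading_number(fields["ITEMP"]) >= max_internal_temp_c
    return False
```

Usage: `should_shut_down(parse_apcaccess(read_apcaccess()))`, run from cron on each client.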

We do have APC units, also with serial connections.
I did try apcupsd, using the network to control the
other systems, but it didn't work -- systems were
shutting down due to network outages while the
UPS still had power -- maybe the APC protocol
sends a power out event, then the power comes
back but the network drops the notification so the
client shuts down after 5 mins.
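If that guess is right, the underlying problem is that a stale "on battery" message followed by network silence is treated as a confirmed outage. One possible policy, sketched here as an assumption rather than as anything apcupsd itself does: poll repeatedly and act only when several consecutive polls positively report battery power, counting an unreachable UPS server as "unknown". The `Reading` names and the poll count are hypothetical.

```python
from enum import Enum
from typing import Iterable


class Reading(Enum):
    ONLINE = "online"
    ON_BATTERY = "on battery"
    UNREACHABLE = "unreachable"  # says nothing about the power, only the network


def confirmed_power_loss(readings: Iterable[Reading], needed: int) -> bool:
    """True only if the most recent `needed` polls all reported ON_BATTERY.

    An UNREACHABLE or ONLINE poll resets the count, so a network outage by
    itself can never satisfy the shutdown condition, and neither can a stale
    "on battery" message followed by silence.
    """
    if needed < 1:
        raise ValueError("needed must be at least 1")
    run = 0
    for reading in readings:
        run = run + 1 if reading is Reading.ON_BATTERY else 0
    return run >= needed
```

With one poll every 30 s, `needed=10` requires five minutes of confirmed battery operation. The trade-off is that a client which loses the network during a real outage will not shut down on the server's signal, so it still needs some local way to notice the outage.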

>> 2.  after a short period (1-2 mins), initiate shutdown of the
>>     lower priority systems -- a) they don't get generator
>>     power so will have to be shut down, and b) to reduce
>>     the heat going into the room
>> 3.  if the generator comes on, start the A/C after the
>>     specified delay; otherwise, initiate shutdown of the
>>     remaining systems so they can do a clean shutdown
>>     before overtemp is triggered or the UPS dies.
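The three steps can be written as a small decision function that a loop or cron job calls repeatedly. This is a sketch under stated assumptions: the 90 s, 180 s and 60 s timings are hypothetical, and how the outage and the generator are detected (UPS status or a powerline sensor, for example) is left open.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical timings; the plan only says "1-2 mins" for step 2.
LOW_PRIORITY_DELAY_S = 90   # step 2: systems without generator power
GENERATOR_WAIT_S = 180      # step 3: give up on the generator after this
AC_START_DELAY_S = 60       # step 3: generator running this long -> start A/C


@dataclass
class Situation:
    outage_s: int                      # seconds since utility power was lost; 0 if present
    generator_s: Optional[int] = None  # seconds the generator has run; None if not started


def due_actions(s: Situation) -> List[str]:
    """Every step of the plan that should have happened by now.

    Meant to be called repeatedly; the caller remembers which actions it has
    already carried out and runs only the new ones.
    """
    if s.outage_s == 0:
        return []
    due = ["log: line power lost"]                        # step 1
    if s.outage_s >= LOW_PRIORITY_DELAY_S:
        due.append("shut down: low-priority systems")     # step 2
    if s.generator_s is not None:
        if s.generator_s >= AC_START_DELAY_S:
            due.append("start: A/C")                      # step 3, generator on
    elif s.outage_s >= GENERATOR_WAIT_S:
        due.append("shut down: remaining systems")        # step 3, no generator
    return due
```

Returning every action that is due, rather than only the next one, keeps the function stateless: a caller that restarts mid-outage recomputes the same list and skips what it has already done.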
>
> The internal temperature of the UPS can be
> obtained from apcupsd.  In my experience, it is representative
> of the server room temperature in small UPS units.  In
> large UPS units, the temperature will soar higher
> than the room temperature when it runs on batteries.
> This is another advantage of the small APC UPS for
> monitoring.

Yes, I have been considering using newer APC units
with USB because they would be supporting newer
hardware that lacks serial ports.

> You can monitor temperature with cron jobs and
> use it to trigger shutdowns of systems which
> should stay up as long as possible.  Alternatively, or in addition,
> trigger nagios alerts to your pager in the event
> things are melting.
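A cron-driven temperature check that also feeds Nagios can follow the Nagios plugin convention: one status line on stdout, optional performance data after `|`, and exit status 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN. A sketch in Python; the 35 °C and 40 °C thresholds are assumptions, and `read_temperature` is a hypothetical callable (for example, one that parses `ITEMP` from `apcaccess status`).

```python
import sys
from typing import Callable, Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_temperature(celsius: Optional[float], warn: float = 35.0,
                      crit: float = 40.0) -> Tuple[int, str]:
    """Map a temperature to (exit code, status line with perfdata)."""
    if celsius is None:
        return UNKNOWN, "TEMP UNKNOWN - no reading"
    if celsius >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif celsius >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    perfdata = f"temp={celsius:.1f};{warn:g};{crit:g}"
    return code, f"TEMP {label} - {celsius:.1f} C | {perfdata}"


def run_check(read_temperature: Callable[[], float]) -> None:
    """Plugin entry point: print one status line and exit with its code."""
    try:
        celsius: Optional[float] = read_temperature()
    except Exception:  # any failure to read the sensor is reported as UNKNOWN
        celsius = None
    code, line = check_temperature(celsius)
    print(line)
    sys.exit(code)
```

The same exit code can drive a local shutdown from cron, so the pager alert and the automatic action share one threshold.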
>
>> Each of these events needs to be logged locally and
>> somewhere that is accessible from outside the
>> building (there have been a number of cases where
>> the building was closed and I had to wait to get back
>> in to access the damage -- for various reasons, we
>> have to evacuate the building when the power is off
>> for any extended period).   Many sites do this sort of
>> control using ethernet wiring, but using the power
>> line has some advantages since you need power
>> for anything to work, while ethernet tends to degrade
>> when there are power problems:
>
> Nagios and cacti both work well with snmp based
> monitoring and/or alerts.  Otherwise, regular file logging
> or emails from your cron scripts can help.
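For the requirement to log each event both locally and somewhere reachable from outside the building, Python's standard `logging` module can write to a local file and to a remote syslog server at the same time. A minimal sketch; the logger name `power-events` and any remote host passed in are hypothetical, and the remote server is assumed to accept syslog over UDP port 514.

```python
import logging
import logging.handlers
from typing import Optional


def make_event_logger(local_path: str,
                      remote_host: Optional[str] = None,
                      remote_port: int = 514) -> logging.Logger:
    """Log power events to a local file and, if given, a remote syslog host.

    The local file is always set up first, so events are recorded even when
    the remote host cannot be resolved or reached.
    """
    logger = logging.getLogger("power-events")
    logger.setLevel(logging.INFO)

    local = logging.FileHandler(local_path)
    local.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(local)

    if remote_host:
        try:
            # UDP syslog (the default): sending does not wait for the far end.
            remote = logging.handlers.SysLogHandler(address=(remote_host, remote_port))
        except OSError as err:  # e.g. a DNS lookup failure during an outage
            logger.warning("remote syslog %s unavailable: %s", remote_host, err)
        else:
            remote.setFormatter(logging.Formatter("power-events: %(message)s"))
            logger.addHandler(remote)
    return logger
```

For example, `make_event_logger("/var/log/power-events.log", "loghost.example.org").info("UPS on battery")`, where both names are placeholders.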
>
> If you notice network problems when power is lost,
> I'd think there is something not on UPS power.  Another
> possibility is that some devices or systems obtain
> IP addresses by DHCP, and if for some reason that
> fails, it can trigger a series of service or routing
> failures.  Hard wiring all IPs - perhaps in devices
> you tend not to think about - can reduce
> reliance on DHCP.

We have switched ethernet.   I've been arguing with the
network managers in Ottawa that we should move
to fixed IPs to reduce problems with network outages,
but they are pushing back and want to get rid of fixed IPs.

> For example, we initially had false nagios
> alerts related to our UPS power being out
> merely because the DHCP was down for an
> hour and then the UPS ethernet was offline.
> I hard wired the IP into the UPS and never
> had a false alarm afterward.

Yes.  In my case, I think the DHCP server is reliable,
but can't handle situations where 800 PCs are rebooting
at the same time, and the DNS server also gets
hit hard.

>> 1.  there have been UPS failures in the network
>> closets.   We have had power outages that
>> killed UPS's  -- you can get really big spikes
>> when high voltage wires touch the lower
>> voltage lines.
>
> I don't know what can help protect you from this.
> Talk to an electrician?

We have talked to electricians and even engineers, but
when you run 7/24 you really push the limits of conventional
technologies.   It would cost $80,000 to go to carrier-grade
protection for our small machine room.

> It sounds like you are interested in the UPB path
> because it won't involve ethernet to monitor
> the systems and control the shutdown.  But
> ethernet problems are not really a "given"
> when the power goes out.  This issue is
> likely solvable with analysis.  If you are not
> using cacti or nagios already, it would
> be an idea to set them up to send alerts
> and/or graphically log the availability of hosts,
> router ports, etc.  Then the next time you
> lose power it might be possible to trace
> the thing that goes down when the lights
> go out.
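Once Nagios or cacti record when each host or router port became unreachable, tracing what goes down when the lights go out amounts to ordering those times relative to the power event. A sketch in Python, assuming the down-times have already been extracted from the monitoring logs into a dict; that extraction step and any host names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def failure_order(power_lost: datetime,
                  went_down: Dict[str, datetime],
                  window: timedelta = timedelta(minutes=10)
                  ) -> List[Tuple[str, float]]:
    """Hosts that went down within `window` after the power loss, earliest first.

    Each entry is (host, seconds after the power loss). Failures before the
    event or long after it are ignored as unrelated.
    """
    related = [
        (host, (when - power_lost).total_seconds())
        for host, when in went_down.items()
        if power_lost <= when <= power_lost + window
    ]
    return sorted(related, key=lambda item: item[1])
```

The first entries in the result are the candidates for "something not on UPS power" or a DHCP dependency.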

We have unexplained network outages way too
often.  The system logs have 1-2 "ethernet cable
unplugged" events a week, sometimes followed
by reconnecting with half-duplex, so things bog
down until I run "sudo ethtool -s eth0 autoneg off
duplex full speed 100".   We have been setting
the ports on the switch to full-100, but Apple
wants us to change that to half-100 because
that is the default when autonegotiation
fails.   The Apple iMac with one port hasn't
skipped a beat since it was plugged in last June,
but the Mac Pro with 2 interfaces has been
a horror.   We are only using one port -- it
appears that the port gets blacklisted when
a problem occurs (to force traffic onto the other
port) but once that happens you have to erase
the configuration and enter it again before it
will be used.   Again, Apple expects us to use
both ports in "bonded" mode, but the upstream
connections can't handle more than 100 Mbit/s,
so the only reason for using both ports is to
increase reliability.  I think the sales guy told
the bosses they were buying a reliable network,
so the idea that the configuration should be
changed to improve reliability is hard to sell,
particularly as I think the problems have been
due to UPS failures or to many hundreds of Windows
PCs rebooting all at once.
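The half-duplex fallback can at least be detected automatically, for example from cron, by reading the `Speed` and `Duplex` lines that `ethtool <interface>` prints. A sketch in Python that only reports the problem and leaves changing the link settings to a person; the expected 100Mb/s full-duplex values match the switch settings described above, and the interface name is an assumption.

```python
import subprocess
from typing import Dict, Optional


def read_ethtool(interface: str = "eth0") -> str:
    """Return the report printed by `ethtool <interface>`."""
    return subprocess.run(["ethtool", interface], capture_output=True,
                          text=True, check=True, timeout=10).stdout


def link_settings(report: str) -> Dict[str, str]:
    """Collect 'Key: value' lines such as 'Speed: 100Mb/s' and 'Duplex: Half'."""
    settings = {}
    for line in report.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            settings[key] = value.strip()
    return settings


def duplex_problem(settings: Dict[str, str], expected_speed: str = "100Mb/s",
                   expected_duplex: str = "Full") -> Optional[str]:
    """Describe a speed/duplex mismatch, or return None if the link looks right."""
    speed, duplex = settings.get("Speed"), settings.get("Duplex")
    if speed == expected_speed and duplex == expected_duplex:
        return None
    return f"link is {speed} {duplex}, expected {expected_speed} {expected_duplex}"
```

Usage: `duplex_problem(link_settings(read_ethtool("eth0")))`, mailing or logging the message whenever it is not None.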

I did think about setting up a machine room net just
for monitoring and control, but the power wires are
already there and have to be there, while extra
network cables and jacks will always be a source
of confusion (figuring out how to configure the 2nd
port on a new system, making sure the control net
remains free of data traffic when it would be so easy
to get around a bottleneck between two systems
by sending the traffic over the control net, etc.).


-- 
George N. White III <aa056 at chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia


