<?xml version="1.0" encoding="ISO-8859-1"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ref="http://purl.org/rss/1.0/modules/reference/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/">
	<channel rdf:about="http://www.wsanders.net/rss.rdf">
		<title>WSANDERS.NET</title>
		<link>http://www.wsanders.net/index.php</link>
		<description><![CDATA[]]></description>
		<items>
			<rdf:Seq>
				<rdf:li resource="http://www.wsanders.net/index.php?entry=entry100905-091242" />
				<rdf:li resource="http://www.wsanders.net/index.php?entry=entry100827-093540" />
				<rdf:li resource="http://www.wsanders.net/index.php?entry=entry100707-101614" />
				<rdf:li resource="http://www.wsanders.net/index.php?entry=entry100529-151704" />
				<rdf:li resource="http://www.wsanders.net/index.php?entry=entry100517-090359" />
				<rdf:li resource="http://www.wsanders.net/index.php?entry=entry100423-131202" />
				<rdf:li resource="http://www.wsanders.net/index.php?entry=entry100402-140958" />
				<rdf:li resource="http://www.wsanders.net/index.php?entry=entry100202-150341" />
				<rdf:li resource="http://www.wsanders.net/index.php?entry=entry100118-104412" />
				<rdf:li resource="http://www.wsanders.net/index.php?entry=entry091115-125143" />
			</rdf:Seq>
		</items>
	</channel>
	<item rdf:about="http://www.wsanders.net/index.php?entry=entry100905-091242">
		<title>Amanda and Me: A Shotgun Wedding</title>
		<link>http://www.wsanders.net/index.php?entry=entry100905-091242</link>
		<description><![CDATA[I am going to have to get really involved with Amanda. We bought some Zmanda licenses for our Sun X4540, which has a smallish Ultrium IV tape library hung off it, and have made some feeble attempts to back up the 20 TB of data on the Sun to tape.<br /><br />It&#039;s been a big mess. The Zmanda GUI is nothing special, but it&#039;s cheap and gets you started. If you have a kiddie SAN with only a few TB it will probably work out of the box. But Amanda does not scale up. Once you start spanning tapes and trying to track the state of 10+ million files, it requires significant amounts of tuning to work, especially with regard to MySQL and getting it fast enough to support Ultrium IV without shoeshining, and you need to know the Amanda shell commands well.<br /><br />It&#039;s still better than dealing with the incompetent bureaucrats at Symantec Netbackup support. Netbackup, our previous backup software, was $50K down a black hole. With Amanda, the support is only as good as we users can make it. I haven&#039;t seen any good tips online about how to make Amanda scale up, so as I figure things out, I&#039;ll post some tips here. ]]></description>
	</item>
	<item rdf:about="http://www.wsanders.net/index.php?entry=entry100827-093540">
		<title>Down with Sun, Up with Dell</title>
		<link>http://www.wsanders.net/index.php?entry=entry100827-093540</link>
		<description><![CDATA[The Dell R510 is now promoted to Best Box Ever, and the Sun X4540 is demoted to also-ran. Sun is now subsumed into a huge corporate entity that will not return your phone call unless you are one of the Fortune 500, and Dell finally has a PowerEdge with easy out-of-band management and an onboard RAID that is actually faster than the disks you can slap on it.<br /><br /><a href="javascript:openpopup('images/mrtgmailyear.png',500,135,false);"><img src="images/mrtgmailyear.png" width="200" height="54" border="0" alt="" id="img_float_right" /></a>I wish I had saved some MRTG graphs, but the R510 has now replaced a decrepit 2-CPU, SATA-based generic shitbox as the sole MTA and MUA for close to 10,000 users. Disk IO is 10X faster (the 2x12-core CPU probably 100X). The old box used to spike to load averages well above 100 whenever the Monday morning newsletter got sent out to all 10,000 recipients, or when some hapless user forwarded their entire inbox to Hotmail. <br /><br />No more. I have yet to see the load average spike above 3. Flat-line. Everyone gets their email a few seconds after it&#039;s sent. Best Box Ever.<br /><br />By the way, we upgraded the MTA/MUA software, CommuniGate Pro, at the same time. If you have the bucks, buy it. It isn&#039;t a nightmare to install and configure, like Sendmail or Postfix; support is excellent; it has a web browser interface for users too inept to install Thunderbird; and, unlike Exchange, it is standards compliant and doesn&#039;t need a $5000 war chest of tools for backup and administration. ]]></description>
	</item>
	<item rdf:about="http://www.wsanders.net/index.php?entry=entry100707-101614">
		<title>Second Best Box Ever: Dell R510</title>
		<link>http://www.wsanders.net/index.php?entry=entry100707-101614</link>
		<description><![CDATA[Still doomed to have all-local storage on my hosts, I desperately needed something new to host email services for 5000 people. It&#039;s a worst-case scenario - 6 million tiny files. We try to spread it out over as many filesystems as possible. The old box has an oldish OS, three XFS filesystems, and is at 100% iowait a lot of the time.<br /><br />I selected a Dell R510 since Sun is basically out of business (all the sales people seem to have been sacked, and Oracle doesn&#039;t seem to have realized yet that Sun made computers). I selected Xeon 5650 processors to take advantage of a 1.3 GHz bus, and got the box fully loaded with 14 disks. The disks have been configured as 7 RAID1 devices.<br /><br />The proof is in the numbers: here is some iostat output while testing the (ext3) filesystems by rsyncing one filesystem with 2 million files to 2 other filesystems:<br /><pre>
avg-cpu:  %user   %nice    %sys %iowait   %idle
           3.42    0.00    2.64    6.63   87.32

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb1       153.07 137.53 1915.09  7.70 180466.57 1161.82 90233.28   580.91    94.46     1.76    0.92   0.36  69.68
sdc1         0.00 11401.15  3.05 253.77   23.99 93244.58    11.99 46622.29   363.16    60.19  234.41   1.69  43.51
sdd1         0.00 10175.01  1.10 299.80    8.80 83828.49     4.40 41914.24   278.62    46.66  153.69   1.57  47.26
</pre><br /><br />The formatting is a mess, but basically I&#039;m getting 1800+ read IOPS and 500+ write IOPS through the R510&#039;s H700 RAID controller, and it&#039;s still loafing. In addition, in each generation of PowerEdges the out-of-band (iDRAC) management and server monitoring tools have gotten a little better, until they are finally easy to set up. Not too bad.]]></description>
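		<!--
As a rough cross-check of the read and write rates quoted in the post above, the sketch below sums the r/s and w/s columns of the three iostat rows. The field positions are taken from the header line shown in the post; the names IOSTAT_ROWS and sum_iops are made up for this illustration and are not part of iostat.

```python
# Rows copied verbatim from the iostat output in the post above.
IOSTAT_ROWS = """\
sdb1       153.07 137.53 1915.09  7.70 180466.57 1161.82 90233.28   580.91    94.46     1.76    0.92   0.36  69.68
sdc1         0.00 11401.15  3.05 253.77   23.99 93244.58    11.99 46622.29   363.16    60.19  234.41   1.69  43.51
sdd1         0.00 10175.01  1.10 299.80    8.80 83828.49     4.40 41914.24   278.62    46.66  153.69   1.57  47.26
"""


def sum_iops(text):
    """Return (total reads per second, total writes per second)."""
    reads = 0.0
    writes = 0.0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 14:
            continue  # skip blank or malformed lines
        # Header order: Device rrqm/s wrqm/s r/s w/s rsec/s ...
        reads += float(fields[3])
        writes += float(fields[4])
    return reads, writes


reads, writes = sum_iops(IOSTAT_ROWS)
print(f"reads/s:  {reads:.2f}")
print(f"writes/s: {writes:.2f}")
```

Nearly all of the reads come from sdb1 (the rsync source) and nearly all of the writes go to sdc1 and sdd1 (the two destinations), which matches the test described in the post.
		-->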
	</item>
	<item rdf:about="http://www.wsanders.net/index.php?entry=entry100529-151704">
		<title>Annoying Bug of the Week: Nagios is going insane!</title>
		<link>http://www.wsanders.net/index.php?entry=entry100529-151704</link>
		<description><![CDATA[<img src="images/caligari.jpg" width="113" height="101" border="0" alt="" id="img_float_right" />Is anyone else having this problem with Nagios 3.2.1 (the current version)? It seems to go insane from time to time, and when I look, ownerships are messed up on nagios.cmd and config files, and nagios.cmd is occasionally transmogrified from a named pipe to a plain file (with the wrong ownership as well). <br /><br />This causes Nagios to basically go insane: plugins don&#039;t report back, my active checks all break, and everyone gets paged for no reason.<br /><br />I&#039;ve seen some comments that imply SELinux might somehow be responsible, and only on Red Hat / CentOS. I&#039;m running SELinux in &quot;permissive&quot; mode but I might as well get rid of it altogether. I&#039;ll report back if anything (doesn&#039;t) happen.]]></description>
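		<!--
One way to catch the symptom described in the post above (the command file turning from a named pipe into a plain file, with changed ownership) is to stat the file periodically. This is only a minimal sketch: the helper name describe_command_file is hypothetical, and the demo runs against a throwaway temporary directory because the post does not give the real location of nagios.cmd. It assumes a POSIX system, since os.mkfifo is not available on Windows.

```python
import os
import stat
import tempfile


def describe_command_file(path):
    """Return (is_fifo, uid, gid) for the file at path."""
    info = os.stat(path)
    return stat.S_ISFIFO(info.st_mode), info.st_uid, info.st_gid


# Demo only: build one named pipe and one plain file, then inspect both.
with tempfile.TemporaryDirectory() as tmp:
    pipe_path = os.path.join(tmp, "nagios.cmd")
    os.mkfifo(pipe_path)
    fifo_result = describe_command_file(pipe_path)

    plain_path = os.path.join(tmp, "plain.cmd")
    with open(plain_path, "w") as handle:
        handle.write("")
    plain_result = describe_command_file(plain_path)

print("named pipe detected:", fifo_result[0])
print("plain file detected as pipe:", plain_result[0])
```

On a real host the same function could be pointed at the command file configured for Nagios, comparing the returned uid and gid against the expected account.
		-->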
	</item>
	<item rdf:about="http://www.wsanders.net/index.php?entry=entry100517-090359">
		<title>Who needs all that fancy web stuff? Harness the Mighty Power of Whiptail!</title>
		<link>http://www.wsanders.net/index.php?entry=entry100517-090359</link>
		<description><![CDATA[<img src="images/bleah.gif" width="190" height="174" border="0" alt="" id="img_float_left" /> Until I started this job, I didn&#039;t know about Whiptail. No link to project page here - it doesn&#039;t seem to exist as an Open Source project anywhere, but it comes with most Linux distros. <br /><br />This app uses curses to pop up dialogs, forms, and lists in a terminal. How come I didn&#039;t know about it until recently? I wouldn&#039;t have had to learn all that fancy web stuff.]]></description>
	</item>
	<item rdf:about="http://www.wsanders.net/index.php?entry=entry100423-131202">
		<title>Compiling mpt-status for CentOS on a Sun x4100</title>
		<link>http://www.wsanders.net/index.php?entry=entry100423-131202</link>
		<description><![CDATA[I needed a new server, and management found me an X4100 at a garage sale. Not a bad server, but it&#039;s been EOLed by Sun, and Sun either never shipped mpt tools with the box, dropped support, or they got tossed in the dumpster when Oracle moved in.<br /><br />Anyway, you will probably want to monitor your LSI MPT RAID if you find one, so here&#039;s how to do it if your distro does not come with the &quot;mpt-status&quot; command:<br /><br />- Obtain mpt-status from <a href="http://freshmeat.net/projects/mptstatus/" target="_blank" >http://freshmeat.net/projects/mptstatus/</a><br /><br />- Obtain the X4100 resource CD from Sun. You may have to pay for this. Hopefully you got one with your box. I have an ISO file called X4100_X4200_ResourceCD_4. <br /><br />- Install the mpt driver from the RPMs on the CD: mptlinux-4.00.05.00-1-rhel5.x86_64.rpm<br /><br />- Activate the mptctl driver (your distro should have come with mptbase and mptsas): &quot;/etc/rc3.d/S99fusion.mptctl start&quot;. Set up an rc3.d link to start this driver on boot!<br /><br />- You should see mptctl, mptsas, mptscsih (maybe), and mptbase in the output of lsmod at this point. If not, keep hunting for drivers.<br /><br />- Also on the Sun CDROM is mptlinux-4.00.05.00-src.tar.gz. Create the directory /tmp/mptlinux-4.00.05.00-src and extract this source into it. 
<br /><br />- Extract the mpt-status source into /tmp/mpt-status-1.2.0.<br /><br />- Edit the Makefile with:<pre><br />KERNEL_PATH     := /usr/src/kernels/2.6.18-164.15.1.el5-x86_64/include<br />CFLAGS          := -Iincl -Wall -W -O2 \<br />                        -I${KERNEL_PATH} \<br />                        -I/tmp/mptlinux-4.00.05.00-src/message/fusion</pre><br /><br />- Run make, and it works!<pre><br /># ./mpt-status -i 2<br />ioc0 vol_id 2 type IM, 2 phy, 67 GB, state OPTIMAL, flags ENABLED<br />ioc0 phy 1 scsi_id 4 SEAGATE  ST973401LSUN72G  0556, 68 GB, state ONLINE, flags NONE<br />ioc0 phy 0 scsi_id 3 SEAGATE  ST973401LSUN72G  0556, 68 GB, state ONLINE, flags NONE<br /></pre><br /><br />]]></description>
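		<!--
Once mpt-status builds, its output can be fed to a monitoring check. The sketch below parses the sample output from the post above and reports any line whose state is not one of the two values seen there. Treating OPTIMAL and ONLINE as the only healthy states is an assumption drawn from that sample, not from the mpt-status documentation, and the names SAMPLE, HEALTHY_STATES and unhealthy_lines are hypothetical.

```python
import re

# Sample output copied from the post above.
SAMPLE = """\
ioc0 vol_id 2 type IM, 2 phy, 67 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 4 SEAGATE  ST973401LSUN72G  0556, 68 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 3 SEAGATE  ST973401LSUN72G  0556, 68 GB, state ONLINE, flags NONE
"""

# Assumption: only the states seen in the sample count as healthy.
HEALTHY_STATES = {"OPTIMAL", "ONLINE"}


def unhealthy_lines(output):
    """Return the lines whose reported state is not in HEALTHY_STATES."""
    bad = []
    for line in output.splitlines():
        match = re.search(r"\bstate (\w+)", line)
        if match and match.group(1) not in HEALTHY_STATES:
            bad.append(line)
    return bad


problems = unhealthy_lines(SAMPLE)
print("OK" if not problems else "\n".join(problems))
```

A wrapper script could run the real command, pass its output to unhealthy_lines, and exit non-zero when the returned list is not empty.
		-->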
	</item>
	<item rdf:about="http://www.wsanders.net/index.php?entry=entry100402-140958">
		<title>Back to Backups</title>
		<link>http://www.wsanders.net/index.php?entry=entry100402-140958</link>
		<description><![CDATA[<img src="images/bucket.jpg" width="96" height="128" border="0" alt="" id="img_float_right" /> <br /><br />Once again I am redoing our failed Symantec-Veritas Netbackup installation. There are a few things I&#039;d rather be doing instead, like anything else, but we&#039;re going with Zmanda this time, so it should be a less painful job. ]]></description>
	</item>
	<item rdf:about="http://www.wsanders.net/index.php?entry=entry100202-150341">
		<title>Onward and Upward</title>
		<link>http://www.wsanders.net/index.php?entry=entry100202-150341</link>
		<description><![CDATA[The Juniper project is back on the rails. The DHCP problem seems to be under control, through a combination of reducing DHCP lease times (to hours instead of days), and disabling ICMP blocking in Windows Firewall. It wasn&#039;t my decision to block ICMP. Sometimes you can be too paranoid for your own good. For example, ICMP is used to negotiate MTU sizes between disparate networks. I can tell you a story about a major website that blocked all ICMP and wasn&#039;t able to communicate with anyone running smaller-than-normal MTUs. Which is a lot of people. ]]></description>
	</item>
	<item rdf:about="http://www.wsanders.net/index.php?entry=entry100118-104412">
		<title>Happy New Year</title>
		<link>http://www.wsanders.net/index.php?entry=entry100118-104412</link>
		<description><![CDATA[Not much news from the field. We&#039;ve stopped rolling out Junipers for a while because of massive FAIL in the JunOS DHCP server. Actually, it serves us right, trying to use the switches as DHCP servers. Serves us double-right, for this now-seems-silly idea of assigning one routable subnet to each switch port, a la service provider. Our end users do have a propensity to hang strings of cheap-ass STP-incapable wall-wart-powered hubs off their drops and then &quot;store&quot; patch cables by plugging both ends into one of the hubs, but modern switches have broadcast controls that will effectively allow only the deserving to have their service hosed in this manner. (When I started working here, it was different. Campus-wide outages from looped ports occurred nearly every other day. But my predecessors had disabled spanning tree everywhere and never enabled broadcast controls for some reason I can&#039;t fathom.)<br /><br />Anyway, back to DHCP. JunOS just could not handle it. It turned out to be a mix of our fault and theirs. First, in some buildings but not all, the PCs have Windows Firewall blocking ICMP. This always encourages DHCP fail since hosts (clients and server) can&#039;t ping each other to see if an address is claimed. Second, JunOS was making a horrible mess of the leases database. Third, we made it worse by specifying week-long lease times. Fourth, the JunOS dhcpd would just dump core from time to time.<br /><br />Well, after setting lease times short, disabling Windows Firewall, and upgrading to the latest JunOS, we are about ready to start more rollouts. Cross our fingers.]]></description>
	</item>
	<item rdf:about="http://www.wsanders.net/index.php?entry=entry091115-125143">
		<title>UPDATE: Sun X4540: Best. Box. Ever? Maybe.</title>
		<link>http://www.wsanders.net/index.php?entry=entry091115-125143</link>
		<description><![CDATA[<img src="images/x4540.jpeg" width="200" height="191" border="0" alt="" id="img_float_right" /> The X4540 was brought to a standstill a few weeks ago by one dead SATA disk. The box didn&#039;t hang, but any ZFS IO did. Didn&#039;t lose any data, and it might be buggy hardware and drivers, but still, Sun support had no explanation. That should not happen.<br /><br />Eventually, we&#039;re going to give Symantec Netbackup the finger and move to Amanda, which will enable us to upgrade to OpenSolaris. I posted on Slashdot about this and got a reply from &quot;greg1104&quot;:<br /><br />&quot;People need to understand that SATA disks and chipsets are fundamentally weak at error reporting and recovery. There&#039;s only so much you can do about that at the driver or OS level if a problem drives the chipset crazy. You really need hardware optimized for that purpose, like a mature and battle-tested RAID controller.&quot;<br /><br />I agree 100%. For now, ZFS is worth the risk. The box is a virtual tape library, so 100% uptime is not a requirement. I&#039;m not going to start shorting the stock of midrange storage companies just yet.]]></description>
	</item>
</rdf:RDF>
