Sunday 21 December 2008

Mytharchive: cannot burn a dvd

I've been playing with MythTV again. Configuring it is still a major pain in the arse - there seems to be something wrong with the channel scanner that throws you off track. I think.

Anyway, after recording a few programs I wanted to burn a DVD. My first attempt failed because Mytharchive tries writing to /dev/dvd by default. I solved this problem by linking /dev/dvd to my DVD device.

Tried again and Mytharchive refused to enter the burn DVD menu, showing the error below:

Mytharchive: cannot burn a dvd, the last run failed to create a dvd

WTF? A bit harsh on the error flagging, I'd say. Searching around I found a few people with the same problem, but none of the helpful replies said how to clear the error. Here's what you do.

1. Quit mythfrontend
2. Open a command prompt and:

mysql -u root
use mythconverg;
update settings set data='' where value='MythArchiveLastRunStatus';
update settings set data='' where value='MythArchiveLastRunType';

3. Go back to mytharchive and try again. This time it will let you proceed to the DVD burn menus.
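If you hit this more than once, the two UPDATE statements can live in a small script. A sketch only, assuming a passwordless root login to the local MySQL server, as in the commands above:

```shell
#!/bin/sh
# Clear MythArchive's "last run" flags so the burn menu works again.
# Assumes a passwordless MySQL root login, as in the commands above.
SQL="update settings set data='' where value='MythArchiveLastRunStatus';
update settings set data='' where value='MythArchiveLastRunType';"

reset_mytharchive() {
    printf '%s\n' "$SQL" | mysql -u root mythconverg
}

# Uncomment to run it for real (quit mythfrontend first):
# reset_mytharchive
```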

Thursday 11 December 2008

How to avoid overlapping rsync instances

Running rsync as a cron job is probably as popular as sliced bread. For small intervals, however, you risk having a second instance start before the first one has finished, beginning a downward spiral of bandwidth and processes.

To avoid multiple instances getting in each other's way there's a simple solution using flock. Example without lock (may cause overlapping instances):
*/5 * * * * /usr/bin/rsync --delete -a source_server:/source/path/ /dst/path/
Example using flock:
*/5 * * * * flock -xn /tmp/example.lock -c '/usr/bin/rsync --delete -a source_server:/source/path/ /dst/path/'
In the second example, if an rsync instance runs for more than 5 minutes, flock will fail to acquire the lock and thus not start a second rsync process.

This can be used for pretty much any shell problem where locking helps. Have a read on the flock man page for other examples.
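The same trick works inside a shell script, which is handy when the job is more than one command. A sketch using flock's file-descriptor form (the lock file name is arbitrary):

```shell
#!/bin/sh
# Run the body at most once at a time: take an exclusive, non-blocking
# lock on fd 9; if another instance already holds it, exit immediately.
LOCKFILE=/tmp/example.lock

(
    flock -xn 9 || exit 1
    # the real work goes here, e.g.:
    # /usr/bin/rsync --delete -a source_server:/source/path/ /dst/path/
    echo "lock acquired"
) 9>"$LOCKFILE"
```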

Wednesday 25 June 2008

Thursday 5 June 2008

IIS asp.net v2.0 fake 403 and 404 errors

After installing an ASP.NET v2.0 application I tried to browse to it, but IIS wouldn't show the page and spat out this 403 error instead:
The website declined to show this webpage
Now, there are a few causes for this problem. The most common is forgetting to add an index page to the site, or forgetting to add the index page to the "Enable default content page" list under IIS.

Mr. Ian Tinsley found a more sinister cause with the following symptom: if you browse to an existing .aspx file, IIS returns a 404 error even though the file exists. A good way to check if this is your case is to place a static file (.htm, .gif, etc.) in your site and try browsing to it; if the static file displays, you probably have a fake 403/404 scenario.

To make sure, go to Web Service Extensions under IIS and check if ASP.NET v2.x is listed:




The tricky bit is: even if ASP.NET v2.x is not listed there, the v2.x extension still shows on the application properties. So if it is missing from Web Service Extensions, you need to run aspnet_regiis.exe:



Running it with -ir registers v2 without updating existing script maps, so your ASP.NET v1.1 applications keep running while v2 gets installed:

C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\aspnet_regiis.exe -ir

Saturday 3 May 2008

Sun x4450

With a range of options for the Intel Tigerton Quad-core Xeon processors, the Sun x4450 fit the recommendations for our new postgres database servers. Solaris 10 was chosen by default as the operating system - why complicate the setup by mixing vendors, right?

Well, long story made short: I've ditched Solaris 10 x86_64 in favour of Red Hat Enterprise 5.1. Why? For one thing the Solaris 10 installation was frustratingly complicated.

Performing an installation via serial console requires you to redirect console output from the BMC to the System in the BIOS settings. Then the manual states you should choose the grub menu's option "ttb". Of course the manual meant "ttyb". But it is actually a typo AND an error: you must choose ttya, or all you'll see after boot is "Sun Solaris 5.10" and nothing else.



Once the serial console installation is finished and the system reboots you have another obstacle. The boot process stops at this error message:
Bad PBR sig

Reboot and Select proper Boot device
or
Insert Boot Media in selected Boot device and press a key
Googling around (Sun's KB was useless) I found someone with the same problem reporting that a GUI installation didn't show that error. Very well, I configured the firewall to allow Sun's network KVM:

8890 TCP Remote Console
9000 TCP Remote Console
9001 TCP Remote Console
9002 TCP Remote Console
9003 TCP Remote Console
69 UDP TFTP (for firmware upgrades)
161 UDP SNMP (for monitoring)

As the anonymous poster reported, this did solve the problem. Now to install postgres for x86_64. Hmm. Doesn't run - complains about missing libraries. Probably just need to run crle to set up the location of the 64-bit libraries. Where are they, where are they... WTF? There are no x86_64 libraries installed in my Solaris 10 for x86_64. I checked around with `file' and yeah: all the bloody system libraries are 32-bit.

I confess I didn't try too hard after this. A day later and we had postgres running on RHEL 5.1. And yes, with a 64bit binary and libraries.

To Sun's credit, the hardware looks a beauty, even though you have to assemble the whole thing yourself: memory cards, SAS card, Fibre Channel card, hard disks and whatever extras you bought. What a pain in the arse. Compared to the Dell servers with pre-installed RHEL 5.1 we recently bought, the Sun experience is pathetic: the Dells were up on the same day versus a week for the Sun.

Friday 18 April 2008

Nagios checks for LSI RAID with MegaCli

I am pretty sure there is an SNMP object in Dell's DRAC 5/PERC to report the status of your RAID volumes and physical disks. I'm not too fond of using SNMP with Nagios, so I wrote a check script that uses the MegaCli Linux i386 binary (from the LSI web site) to report the status of the RAID on our Dell/Red Hat Linux servers.

Here's what you need:
  • perl interpreter in /usr/bin/perl
  • nagios' utils.pm in /usr/local/nagios/libexec/
  • perl module Time::HiRes (cpan; install Time::HiRes)
  • sudo in /usr/bin/
  • MegaCli in /opt/MegaRAID/MegaCli/MegaCli64
You'll also need the check_dellperc script below in /usr/local/nagios/libexec. Change the paths as you see fit.


#!/usr/bin/perl -wT
#
# CHECK DELL/MegaRAID DISK ARRAYS ON LINUX
# $Id: check_dellperc 142 2008-03-17 22:25:46Z thiago $
#

BEGIN {
    $ENV{'PATH'} = '/usr/bin';
    $ENV{'ENV'} = '';
    $ENV{'BASH_ENV'} = '';
    $ENV{'IFS'} = ' ' if ( defined($ENV{'IFS'}) );
}

use strict;
use lib "/usr/local/nagios/libexec";
use utils qw($TIMEOUT %ERRORS &print_revision &support &usage);

use Getopt::Long;

use Time::HiRes qw ( tv_interval gettimeofday );

use vars qw($opt_h $help $opt_V $version);
use vars qw($PROGNAME $SUDO $MEGACLI);

$PROGNAME = "check_dellperc";
$SUDO = "/usr/bin/sudo";
$MEGACLI = "/opt/MegaRAID/MegaCli/MegaCli64";

my $t_start = [gettimeofday];

Getopt::Long::Configure('bundling');
GetOptions
    ("V" => \$opt_V, "version" => \$opt_V,
     "h" => \$opt_h, "help" => \$opt_h,
    );

if ( $opt_V ) {
    print_revision($PROGNAME, '$Id: check_dellperc 142 2008-03-17 22:25:46Z thiago $');
    exit $ERRORS{'OK'};
} elsif ( $opt_h ) {
    print_help();
    exit $ERRORS{'OK'};
}

my $TIMEOUT = $utils::TIMEOUT;
my $start_time = time();
# TODO: add timeout option
#if ( $opt_t && $opt_t =~ /^([0-9]+)$/ ) {
#    $TIMEOUT = $1;
#}


# Check state of Logical Devices
my $status = "PERC OK";
my $perfdata = "";
my $errors = $ERRORS{'OK'};
my $vd = "";
my $vds = "";

open(MROUT, "$SUDO $MEGACLI -LDInfo -Lall -aALL -NoLog|");
if (!<MROUT>) {
    print("Can't run $MEGACLI\n");
    exit $ERRORS{'UNKNOWN'};
}

while (<MROUT>) {
    my $line = $_;
    chomp($line);

    if ($line =~ /^Virtual Disk: (\d+)/) {
        $vd = $1;
        next;
    }

    if ($vd =~ /^[0-9]+$/) {
        if ($line =~ /^State: (\w+)/) {
            $vds = $1; #TODO: verbose print("State for VD #$vd is $vds\n");
            $perfdata = $perfdata." VD$vd=$vds";
            if ($vds !~ /^Optimal$/) {
                $errors = $ERRORS{'CRITICAL'};
                $status = "RAID ERROR";
                #TODO: verbose print("Error found: $status. Skipping remaining Virtual Drive tests.\n");
                last;
            } else {
                $vd = ""; $vds = "";
            }
        }
    }
}
close(MROUT);

# Check state of Physical Drives
my $count_type;
my $pd = "";
my $pds = "";

open(MROUT, "$SUDO $MEGACLI -PDList -aALL -NoLog|");
if (!<MROUT>) {
    print("Can't run $MEGACLI\n");
    exit $ERRORS{'UNKNOWN'};
}

while (<MROUT>) {
    my $line = $_;
    chomp($line);

    if ($line =~ /^Device Id: (\d+)/) {
        $pd = $1;
        next;
    }

    if ($pd =~ /^[0-9]+$/) {
        if ($line =~ /^(Media Error|Other Error|Predictive Failure) Count: (\w+)/) {
            $count_type = $1;
            $pds = $2; #TODO: verbose print("$count_type count for device id #$pd is $pds\n");
            $perfdata = $perfdata." PD$pd=$count_type;$pds";
            if ($pds != 0) {
                if ($errors == $ERRORS{'OK'}) {
                    $status = "DISK ERROR";
                    $errors = $ERRORS{'WARNING'};
                }
            }
        }
    }
}
close(MROUT);

# Got here OK
#
my $t_end = [gettimeofday];
print "$status| time=" . (tv_interval $t_start, $t_end) . "$perfdata\n";
exit $errors;


sub print_usage
{
    print "Usage: $PROGNAME\n";
}


sub print_help
{
    print_revision($PROGNAME, '$Revision: 142 $ ');
    print "Copyright (C) 2007 Westfield Ltd\n\n";
    print "Check Dell/MegaRaid Disk Array plugin for Nagios\n\n";

    print_usage();
    print <<USAGE
-V, --version
   Print program version information
-h, --help
   This help screen


Example:
   $PROGNAME

USAGE
;
}

After installing the script above and changing the paths to match your system, edit your sudoers file (sudo /usr/sbin/visudo) and comment the following line:

# Defaults requiretty

If you are doing NRPE checks, the line above will prevent the script from running sudo because there is no TTY associated with it. There is probably a way around it that doesn't involve disabling this security feature - if you find out please tell me.

While in the sudoers file, also add the following two lines:

nagios ALL=(ALL) NOPASSWD: /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL -NoLog
nagios ALL=(ALL) NOPASSWD: /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL -NoLog

If you run NRPE with a user different than "nagios", change the lines above to match it.

That is it, basically. Before adding it to your NRPE checks, give it a try:

$ sudo -u nagios /usr/local/nagios/libexec/check_dellperc
PERC OK| time=0.189185 VD0=Optimal VD1=Optimal PD0=Media Error;0 PD0=Other Error;0 PD0=Predictive Failure;0 PD1=Media Error;0 PD1=Other Error;0 PD1=Predictive Failure;0 PD2=Media Error;0 PD2=Other Error;0 PD2=Predictive Failure;0 PD3=Media Error;0 PD3=Other Error;0 PD3=Predictive Failure;0 PD4=Media Error;0 PD4=Other Error;0 PD4=Predictive Failure;0 PD5=Media Error;0 PD5=Other Error;0 PD5=Predictive Failure;0

It should return a status of zero (unless, of course, your RAID is b0rken):
$ echo $?
0
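To wire it into NRPE, a command definition along these lines goes into nrpe.cfg on the monitored host (the command name and paths here are assumptions; adjust to your setup):

```
# nrpe.cfg on the monitored host
command[check_dellperc]=/usr/local/nagios/libexec/check_dellperc
```

The Nagios server then calls it through check_nrpe as usual.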

Monday 14 April 2008

It is not a toy

From the book "Things that work but you wouldn't have bet on before actually trying".


$ mplayer your_movie.iso

Thursday 31 January 2008

Teamsite 6.7.1 SP1 won't start

After installing Teamsite 6.7.1 SP1 on a Solaris 10 machine I tried starting the service to check out the new version. Sadly, the service wouldn't start, dumping a core and printing this message:

[crit] file vhost.c, line 190, assertion "rv == APR_SUCCESS" failed
Abort - core dumped
/app/teamsite/iw-home/iw-webd/bin/iw.webd start: iwwebd could not be started

In my case, adding "dns" to the "hosts:" line on /etc/nsswitch.conf solved the problem:

hosts: files dns

A little bird tells me that you can also edit /iw-webd/conf/iwwebd.conf.template and change "_default_" on the VirtualHost entry to "*":

<VirtualHost _default_:__IWWEBD_HTTPS_PORT__>

to

<VirtualHost *:__IWWEBD_HTTPS_PORT__>

I didn't try it, but that's also the recommendation from an Interwoven KB article (support account needed).
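If you want to script that edit, a sed substitution would do it. The sketch below works on a scratch copy so it's safe to run as-is; on the real template, take a backup first:

```shell
#!/bin/sh
# Demonstrate the _default_ -> * edit on a scratch copy of the template
# (the real file lives under your iw-home).
TMPL=/tmp/iwwebd.conf.template.demo
echo '<VirtualHost _default_:__IWWEBD_HTTPS_PORT__>' > "$TMPL"

# The actual substitution; use sed -i (or redirect to a new file)
# when applying it for real.
sed 's/_default_/*/' "$TMPL"
```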

TCP wrappers: refused connect from ...

I've used inetd + tcp wrappers + netcat a number of times for migration of TCP-based services to a new server. It goes something like this:
  1. Get the service running on the new box
  2. Point the DNS entry (or IP address of the server on clients) to the new server
  3. Stop the service on the old box
  4. Enable the redirection using inetd
For step 4 with HTTP redirection, an entry like the one below in your /etc/inetd.conf is usually enough:

http stream tcp nowait nobody /usr/bin/tcpd /usr/bin/netcat new-server 80

You then leave the old server running until no more clients connect to it. I do that by inspecting the syslog entries and looking for the netcat redirections. Last time, however, I was seeing these:

Jan 30 14:20:04 old-box netcat[16769]: [ID 947420 mail.warning] refused connect from 189.201.77.65

And sure enough, I started to get complaints that some clients were no longer able to connect to the service. I had left /etc/hosts.allow empty on purpose since there was no need to restrict the service to specific hosts.

After some digging through the TCP wrappers README, I suspected that the version of tcpd on this SunOS 5.8 (Solaris 8) box had been compiled with -DPARANOID. If defined, PARANOID causes tcpd to reject hosts whose IP address doesn't resolve to a name (using reverse DNS).

I downloaded the tcp_wrappers source, recompiled without -DPARANOID and installed the newly compiled binary. The refused connection entries were gone from the log and the clients confirmed they were able to reach the server once again.
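Before rebuilding tcpd you can confirm the diagnosis by checking whether a refused address actually reverse-resolves. A sketch using getent, with the address taken from the log entry above:

```shell
#!/bin/sh
# tcpd built with -DPARANOID refuses clients whose address has no PTR
# record; getent hosts does the same reverse lookup the wrappers would.
reverse_resolves() {
    getent hosts "$1" > /dev/null
}

ADDR=189.201.77.65
if reverse_resolves "$ADDR"; then
    echo "$ADDR reverse-resolves - PARANOID is probably not your problem"
else
    echo "$ADDR has no PTR record - a PARANOID tcpd would refuse it"
fi
```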

Wednesday 9 January 2008

MSMQ error:0xc00e0027

After installing Windows 2003 Service Pack 2 on a production box I started getting error 0xc00e0027 when the application tried to access the queues. A number of related articles gave different solutions, but what worked for me was reinstalling Message Queuing from the Control Panel.

I had to re-create my queues too, since they were wiped out when I removed the component.

Tuesday 8 January 2008

VMWare Workstation on Debian AMD64

I've just upgraded to the x86_64 Debian architecture (AMD64). When installing VMware Workstation for x86_64 it complained about some missing libraries, but the installation finished nevertheless.

However the application would not open any VMs, spitting this error:
/usr/lib/vmware/bin/vmware-vmx: error while loading shared libraries: libX11.so.6: cannot open shared object file: No such file or directory
Turns out some of the VMware binaries need 32-bit libraries, even in the 64-bit version. This post on VMware's knowledge base gives a solution for Red Hat distros. On Debian the solution is analogous: you just need to install the package ia32-libs.

You will need to re-install VMWare so that it regenerates the vmmon kernel module. The vmware-any-any patch is not needed.

Monday 7 January 2008

Broadcom 4311 on Linux

The Dell Inspiron 1501 I bought after seeing the ad on TV nearly a year ago is a good bang for your buck, except when it comes to the wireless card shipped with this notebook.

Thanks to the efforts of the ndiswrapper and bcm43xx developers I managed to use the Wi-Fi card, until I started using the AMD64 binaries for Debian Etch.

If you are buying a notebook and plan to use Linux in it, stay away from the Broadcom wireless cards or any other cards that require loading a firmware at run-time.

I got tired of struggling and I'm now waiting for my Intel 2915ABG mini-PCI card to arrive in the mail. Hardware is too cheap nowadays to justify wasting my time getting a vendor's card to work.