Monday, 26 November 2007

Checking your round-robin DNS with nagios

Nagios comes with a plugin, check_dns, that allows you to perform DNS-based checks. It is really useful to check that your DNS server is responding and, with option switch "-a", that it is providing the expected IP address to specified queries.
$ ./check_dns -H example.com.au -a 1.2.3.4
DNS OK: 0.157 seconds response time. example.com.au returns 1.2.3.4|time=0.157327s;;;0.000000
If your host name has more than one IP address associated with it - no problem -, just add it to the command line. For example:
./check_dns -H example.com.au -a 1.2.3.4,9.8.7.6
DNS OK: 0.157 seconds response time. example.com.au returns 1.2.3.4,9.8.7.6|time=0.157327s;;;0.000000
However, if your host name is using a round-robin DNS configuration you can't predict the response reliably. Try google.com.au, for instance.
$ dig google.com.au
;; ANSWER SECTION:
google.com.au. 230 IN A 72.14.235.104
google.com.au. 230 IN A 72.14.207.104
google.com.au. 230 IN A 72.14.203.104

Then check_dns will only work 1/3 of the time:
./check_dns -H google.com.au -a 72.14.235.104,72.14.207.104,72.14.203.104
DNS OK: 0.035 seconds response time. google.com.au returns 72.14.235.104,72.14.207.104,72.14.203.104|time=0.035388s;;;0.000000
The other 2/3 you will see:
$ ./check_dns -H google.com.au -a 72.14.235.104,72.14.207.104,72.14.203.104
DNS CRITICAL - expected '72.14.235.104,72.14.207.104,72.14.203.104' but got '72.14.203.104,72.14.235.104,72.14.207.104'
I thought that really sucked because it stopped me from using this very nice feature of check_dns. So I patched check_dns.c in Nagios Plugins 1.4.10 to include the command line option "-o". When you specify this option, check_dns will sort the DNS response so you can still use -a.
$ ./check_dns -H google.com.au -o -a 72.14.203.104,72.14.207.104,72.14.235.104
DNS OK: 0.112 seconds response time. google.com.au returns 72.14.203.104,72.14.207.104,72.14.235.104|time=0.111538s;;;0.000000
I've sent the patch to the Nagios developers list - hopefully it will get incorporated into future releases. If not, you can download the patch here and the patched source here.

Wednesday, 21 November 2007

"Bad user" on Solaris 10 crontab

An account used for an application could not run its cron jobs. In /var/cron/log all I could see was:

! bad user (wondapp) Tue Nov 13 03:23:00 2007

I checked /etc/cron.allow (which didn't exist) and the user's shell in /etc/passwd but the problem turned out to be in /etc/shadow. The user was listed as:

wondapp:*LK*:::::::

This was because a password was never set for it. I just edited it to read:

wondapp:NP:::::::

Which still doesn't make a valid password but doesn't lock-out the account either. Cron jobs for wondapp work now.

Monday, 12 November 2007

Nagios check_http SSL check

Nagios plugin check_http has a -C option that allows you to be warned when the SSL certificate is about to expire. It always annoyed me, however, that the expiry date was printed in crazy-ass US date format (which doesn't make any sense).

I've altered the source code so that it prints the expiry date in human readable format.

The changes were made to file plugins_sslutils.c. You can download a patch for version 1.4.10 here. Or just download the changed file; it might work for other versions too.