Deploying Watchdog with the ipfail Plugin for Heartbeat


The kernel uses a watchdog to deal with a hung system. The watchdog is simply a kernel module that checks a timer to determine whether the system is still alive, and it can reboot the system if it decides the system is hung. This makes the watchdog quite useful for detecting and recovering from a server hang.
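To make the timer idea concrete, here is a minimal Python model of the check a watchdog performs: every sign of life from the monitored process resets a countdown, and the watchdog fires only when the countdown runs out. This is an illustration under simplified assumptions, not the kernel's implementation; the class name WatchdogTimer and the timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WatchdogTimer:
    """Toy model of a watchdog: it fires if not kicked within `timeout` seconds."""
    timeout: float          # seconds allowed between kicks
    last_kick: float = 0.0  # time of the most recent kick

    def kick(self, now: float) -> None:
        # Any sign of life from the monitored process resets the countdown.
        self.last_kick = now

    def expired(self, now: float) -> bool:
        # The system is considered hung once the countdown has run out.
        return now - self.last_kick > self.timeout


# A healthy daemon kicks every 10 s against a 30 s timeout, then hangs after t=40.
wd = WatchdogTimer(timeout=30.0)
for t in range(0, 41, 10):
    wd.kick(float(t))
print(wd.expired(60.0))  # False: only 20 s since the last kick
print(wd.expired(71.0))  # True: 31 s without a kick, so the watchdog would fire
```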

To activate the watchdog, add the watchdog directive to /etc/ha.d/ha.cf, as in this example:

respawn clusteruser /usr/lib/heartbeat/ipfail
ping 172.16.1.254     172.16.1.253
#ping_group pingtarget 172.16.1.254 172.16.1.253
watchdog /dev/watchdog
auto_failback off

When you enable the watchdog option in your /etc/ha.d/ha.cf file, Heartbeat writes to the /dev/watchdog device at an interval equal to the deadtime timer. If Heartbeat fails to update the watchdog device, the watchdog will initiate a kernel panic once the watchdog timeout period has expired.
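The "write to /dev/watchdog" step follows the generic Linux watchdog device interface: opening the device arms the timer, any write resets it, and on drivers that support "magic close", writing the character V just before closing disarms it. The Python sketch below shows that pattern; it is not Heartbeat's code, and the helper name feed_watchdog is hypothetical. Pointing it at a real /dev/watchdog would arm the hardware or softdog timer, so the example dry-runs against an ordinary temporary file.

```python
import os
import tempfile
import time
from typing import Callable


def feed_watchdog(device: str, interval: float, pings: int,
                  magic_close: bool = True,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Hypothetical helper: ping a watchdog device `pings` times, `interval` seconds apart.

    Opening a Linux watchdog device arms it, and any write resets its countdown.
    Writing 'V' just before closing asks drivers that support "magic close"
    to disarm the watchdog instead of letting it fire.
    """
    # buffering=0: every write goes straight to the device, no Python buffering.
    with open(device, "wb", buffering=0) as wd:
        for _ in range(pings):
            wd.write(b"\0")  # the ping: any data resets the countdown
            sleep(interval)  # must stay comfortably below the watchdog timeout
        if magic_close:
            wd.write(b"V")   # clean shutdown: request a disarm before closing


# Dry run against an ordinary temporary file, so no real watchdog is armed.
with tempfile.TemporaryDirectory() as tmp:
    fake_device = os.path.join(tmp, "fake-watchdog")
    feed_watchdog(fake_device, interval=10.0, pings=3, sleep=lambda s: None)
    with open(fake_device, "rb") as f:
        print(f.read())  # b'\x00\x00\x00V'
```

If the process hangs inside the loop, the writes stop and the countdown expires, which is exactly the situation the watchdog is meant to catch.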

Configure the kernel to reboot on a kernel panic

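To force the kernel to reboot instead of just hanging when a kernel panic occurs, you have to modify the boot arguments passed to the kernel. With GRUB, this is done in /etc/grub.conf by adding panic=60 to the kernel line; the value is the number of seconds the kernel waits after a panic before rebooting: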

default=0
timeout=0
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title Fedora (2.6.29.4-167.fc11.i686.PAE)
root (hd0,0)
kernel /boot/vmlinuz-2.6xxxxx.i686.PAE ro root=LABEL=/ panic=60
initrd /boot/initrd-2.6.xxxxx.i686.PAE.img
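If you manage several hosts, the grub.conf change can be scripted. The sketch below assumes the GRUB Legacy layout shown above (one `kernel` line per boot entry); it replaces an existing panic=N argument or appends panic=60 when none is present. The function name set_panic_timeout is hypothetical, and the sample keeps the elided kernel version exactly as written above.

```python
import re

# Matches an existing panic=N argument (N may be negative) as a whole word.
PANIC_ARG = re.compile(r"(?<!\S)panic=-?\d+(?!\S)")
KERNEL_LINE = re.compile(r"\s*kernel\s")


def set_panic_timeout(grub_conf: str, seconds: int = 60) -> str:
    """Hypothetical helper: ensure every GRUB Legacy `kernel` line carries panic=<seconds>."""
    out = []
    for line in grub_conf.splitlines():
        if KERNEL_LINE.match(line):
            # Replace an existing panic=N; otherwise append the argument.
            line, replaced = PANIC_ARG.subn(f"panic={seconds}", line)
            if not replaced:
                line = f"{line.rstrip()} panic={seconds}"
        out.append(line)
    return "\n".join(out) + ("\n" if grub_conf.endswith("\n") else "")


SAMPLE = """\
title Fedora (2.6.29.4-167.fc11.i686.PAE)
root (hd0,0)
kernel /boot/vmlinuz-2.6xxxxx.i686.PAE ro root=LABEL=/
initrd /boot/initrd-2.6.xxxxx.i686.PAE.img
"""
print(set_panic_timeout(SAMPLE).splitlines()[2])
# kernel /boot/vmlinuz-2.6xxxxx.i686.PAE ro root=LABEL=/ panic=60
```

Because an existing panic= value is replaced rather than duplicated, running the function twice leaves the file with a single panic argument.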

Alternatively, if you use LILO, add the following line to /etc/lilo.conf:

append="panic=60"

Remember to run lilo afterwards so the new configuration is written to the boot loader:

# lilo -v
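After rebooting, you can check that the setting took effect. The kernel exposes the arguments it was booted with in /proc/cmdline and the active panic timeout, in seconds, in /proc/sys/kernel/panic (0 means it never reboots on its own). The Python sketch below reads both; the helper names cmdline_panic and runtime_panic are hypothetical.

```python
from pathlib import Path
from typing import Optional


def cmdline_panic(cmdline_path: str = "/proc/cmdline") -> Optional[int]:
    """Hypothetical helper: the panic=N boot argument, or None if it was not passed."""
    value = None
    for arg in Path(cmdline_path).read_text().split():
        if arg.startswith("panic="):
            value = int(arg.split("=", 1)[1])  # if repeated, keep the last occurrence
    return value


def runtime_panic(sysctl_path: str = "/proc/sys/kernel/panic") -> int:
    """Hypothetical helper: the active panic timeout in seconds (0 disables the reboot)."""
    return int(Path(sysctl_path).read_text().strip())


if Path("/proc/sys/kernel/panic").exists():
    print("panic= on the boot command line:", cmdline_panic())
    print("current panic timeout (s):", runtime_panic())
```

With the GRUB or LILO change in place, both values should report 60.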