Installing and configuring Ganglia on CentOS 5.4


Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. Ganglia helps you spot trends that point to problems such as under-capacity hardware, runaway processes and so on. Ganglia requires very little CPU, memory and network resources to run. According to the official Ganglia website, it can scale easily to 2000 nodes.

For this blog entry, I’m assuming you are building an HPC cluster with a Head Node and several Compute Nodes.

Ganglia has 2 daemons, gmetad and gmond. It also requires other prerequisites such as PHP, RRDtool and Apache. First things first:

  1. gmond – Ganglia monitoring daemon. Gmond's job is to gather performance metrics and keep track of the status of the other gmond daemons running in the cluster. If one gmond daemon fails because its node has failed, all the remaining gmond daemons know about it. It is required on every node.
  2. gmetad – gmetad only needs to run on the cluster head node. Its job is to poll the gmond daemons for performance metric information every 15 seconds, consolidate it and store it in the RRDtool round-robin database (a round-robin database never fills up, as the newest data overwrites the oldest data). The stored information is then displayed through the Apache web server.
  3. The Ganglia Web package requires PHP on the cluster head node to display the information through Apache.
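
To make the round-robin idea in item 2 concrete, here is a toy sketch in plain shell. It only illustrates a fixed number of slots being reused in a circle; it is not how RRDtool stores data on disk, and the slot count and function names are made up for the example.

```shell
#!/bin/sh
# Toy round-robin buffer: a fixed number of slots, reused in a circle.
SLOTS=4      # hypothetical capacity
count=0      # total number of samples written so far

rr_store() {
    # Write the sample into slot (count mod SLOTS), overwriting the old value.
    eval "slot_$((count % SLOTS))=\$1"
    count=$((count + 1))
}

rr_dump() {
    # Print the samples still held, oldest first.
    n=$count
    if [ "$n" -gt "$SLOTS" ]; then n=$SLOTS; fi
    i=$((count - n))
    out=""
    while [ "$i" -lt "$count" ]; do
        eval "v=\$slot_$((i % SLOTS))"
        out="$out${out:+ }$v"
        i=$((i + 1))
    done
    echo "$out"
}

# Store six samples into four slots; the two oldest get overwritten.
for sample in 10 20 30 40 50 60; do
    rr_store "$sample"
done
rr_dump    # prints: 30 40 50 60
```

The store never grows beyond four values, which is the property that keeps a round-robin database at a constant size.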

Part I: To install Ganglia on CentOS 5.4 on the Cluster Head Node, do the following:

  1. Make sure you have the EPEL (Extra Packages for Enterprise Linux) repository enabled, as the Ganglia packages used here come from it. For more information, see LinuxToolkit (Red Hat Enterprise Linux / CentOS Linux Enable EPEL (Extra Packages for Enterprise Linux) Repository)
  2. # yum install rrdtool ganglia ganglia-gmetad ganglia-gmond ganglia-web httpd php
  3. However, at the time of writing, you might get the following error: “Error: Missing Dependency: rrdtool = 1.2.27-3.el5 is needed by package rrdtool-perl”. To resolve the issue, you may want to look at LinuxToolkit (Error: Missing Dependency: librrd.so.2()(64bit) is needed by package ganglia-gmetad (epel)).
  4. By default, Ganglia uses UDP multicast to pass information. I prefer to use unicast UDP as it gives me better control.
  5. Assume 192.168.1.5 is our head node and the port number is 8649. Edit /etc/gmond.conf as follows:
    cluster {
      name = "My Cluster"
      owner = "kittycool"
      latlong = "unspecified"
      url = "unspecified"
    }
    udp_send_channel {
      host = 192.168.1.5
      port = 8649
      ttl = 1
    }
    udp_recv_channel {
      port = 8649
    }
  6. Configure the run-level startup and start the service for gmond
    chkconfig --levels 235 gmond on
    service gmond start
  7. Configure /etc/gmetad.conf to define the data source
    data_source "my cluster" 192.168.1.5:8649
  8. Configure the run-level startup and start the service for gmetad
    chkconfig --levels 235 gmetad on
    service gmetad start
  9. Configure the run-level startup and start the service for httpd
    chkconfig --levels 235 httpd on
    service httpd start
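
Once the three services are up, one quick way to check that gmond is collecting data is to read the XML it serves over TCP and list the hosts it knows about. The sketch below assumes that nc is installed and that the default tcp_accept_channel on port 8649 is still present in /etc/gmond.conf; the function name list_hosts is made up for this example.

```shell
#!/bin/sh
# List the host names found in gmond's XML dump (read from standard input).
list_hosts() {
    # Each reporting node appears as a <HOST NAME="..." ...> element on its own line.
    sed -n 's/.*<HOST NAME="\([^"]*\)".*/\1/p'
}

# Usage on the head node (assumes nc is available and gmond listens on TCP 8649):
#   nc 192.168.1.5 8649 | list_hosts
```

If the head node shows up in the output, gmond is running and answering. Compute nodes should appear in the same list after Part II is done.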

Part II: To install on the Compute Nodes, I’m assuming the Compute Nodes are on a private network and do not have access to the internet. Only the Head Node has access to the internet. I’m also assuming there is no routing from the compute nodes via the head node for internet access.

  1. Read the following blog entry to get the RPM files: Using yum to download the rpm package that have already been installed.
  2. Copy the RPMs to all the compute nodes
  3. Install the package on each compute node
    yum install ganglia-gmond
  4. Configure the service startup on each compute node
    chkconfig --levels 235 gmond on
  5. Copy the gmond configuration file /etc/gmond.conf from the head node to each compute node, then start gmond on the node (service gmond start) so that it picks up the configuration.
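
With more than a handful of compute nodes, the steps above get repetitive. Here is a minimal sketch that prints the commands for each node so you can review them before running anything. The node names, the RPM directory and root access over ssh/scp are assumptions for illustration. It also uses rpm -ivh on the copied files in place of yum install, since the nodes cannot reach a repository.

```shell
#!/bin/sh
# Print (not run) the commands needed to set up gmond on each compute node.
NODES="node01 node02"            # hypothetical compute node host names
RPM_DIR="/root/ganglia-rpms"     # hypothetical directory holding the downloaded RPMs

plan_node() {
    node=$1
    echo "scp $RPM_DIR/*.rpm root@$node:/tmp/"
    # rpm -ivh installs from local files, so no repository access is needed
    echo "ssh root@$node 'rpm -ivh /tmp/*.rpm'"
    echo "scp /etc/gmond.conf root@$node:/etc/gmond.conf"
    echo "ssh root@$node 'chkconfig --levels 235 gmond on && service gmond start'"
}

for n in $NODES; do
    plan_node "$n"
done
```

Save it on the head node under any name you like and run it to see the plan. When you are happy with the output, pipe it into sh to execute the commands.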

Part III: If you wish to create custom metrics that are not included in the standard Ganglia distribution, you can write your own performance monitoring scripts and use gmetric to report the values to the gmond running on the compute nodes. Sample gmetric scripts can be found at:

  1. Gmetric Script Repository. For example, you can use the gmetric script for NFS, “Linux NFS client GETATTR, READ and WRITE calls”.
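
To give an idea of what such a script looks like, here is a minimal sketch that reports the number of login sessions on a node. It is not taken from the repository above. The metric name users_logged_in, the units label and the DRY_RUN switch are made up for this example, while --name, --value, --type and --units are the gmetric options it relies on.

```shell
#!/bin/sh
# Report the number of login sessions on this node as a custom Ganglia metric.

count_users() {
    # who prints one line per login session
    who | wc -l | tr -d ' '
}

report_metric() {
    # $1 = metric name, $2 = value, $3 = gmetric type, $4 = units
    if [ "${DRY_RUN:-0}" = "1" ] || ! command -v gmetric >/dev/null 2>&1; then
        # Dry run, or gmetric is not installed: show what would have been sent
        echo "gmetric --name $1 --value $2 --type $3 --units $4"
    else
        gmetric --name "$1" --value "$2" --type "$3" --units "$4"
    fi
}

report_metric users_logged_in "$(count_users)" uint16 sessions
```

Run it from cron on each compute node (every minute, for example) and the new metric should appear alongside the standard ones for that host.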

Part IV: Using the gstat command line tool

  1. You can use gstat to list information about the cluster nodes. Some useful commands are:
  2. # gstat -h
    (To show help for all commands)
  3. # gstat -a
    (List all nodes)
  4. # gstat -l
    (Print ONLY the host list)