Sunday, October 25, 2009

Nagios plugin

Not long ago I was asked whether (or rather how) it is possible to write an NRPE plugin that checks the resource utilization of a single application. I use Nagios on a daily basis, but I had not needed to write a plugin yet. When I went through the existing plugins, most (if not all) of them checked resources at the server level. That makes sense: you are not very interested in what exactly your server is doing as long as your website or database is available and responds quickly. This is especially true if you have 100, 500, 1000, ... servers. Anyway, I found the question interesting, even if it was more theoretical than practical. After some research I found Jason Faulkner's plugin, which looked like a good base, modified it a bit, and created this script:

#!/bin/bash 
#
# Nagios plugin to monitor a process. Can easily be modified to do 
# pretty much whatever you want.
#
# Licensed under LGPL version 2
# Copyright 2006 Broadwick Corporation
# By: Jason Faulkner jasonf@broadwick.com
#
# Modified to measure CPU usage of chosen process.
#
# USAGE: cpu.sh process_name warning_level critical_level
#
# Licensed under LGPL version 2
# Copyright 2009 Wawrzyniec Niewodniczański
# Modification by: Wawrzyniec Niewodniczański wawrzek@gmail.com

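# All three arguments are required; exit code 3 means UNKNOWN to Nagios.
if [[ $# -ne 3 ]]
then
        echo "USAGE: ${0} process_name warning_level critical_level"
        exit 3
fi

process_name=$1
WARLVL=$2
CRITLVL=$3

OKMSG="STATUS OK: ${process_name} running"
CRITMSG="STATUS CRITICAL: ${process_name} using more than ${CRITLVL} % of CPU"
WARNMSG="STATUS WARNING: ${process_name} using more than ${WARLVL} % of CPU"
UNKMSG="STATUS UNKNOWN: ${process_name}, check if process is running"

# List matching processes, excluding this script and the grep commands themselves.
PROCESS=$(ps axu | grep -v "${0}" | grep -v grep | grep "${process_name}")
# Sum the %CPU column (field 3). The quotes keep one process per line;
# without them awk sees a single line and counts only the first process.
CPU=$(echo "${PROCESS}" | awk '{cpu+=$3} END {printf "%d", cpu}')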

# Exit codes follow the Nagios plugin convention:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
if [[ -n $PROCESS ]]
then
        if (( CPU < WARLVL ))
        then
                echo "$OKMSG"
                exit 0
        elif (( CPU < CRITLVL ))
        then
                echo "$WARNMSG"
                exit 1
        else
                echo "$CRITMSG"
                exit 2
        fi
else
        echo "$UNKMSG"
        exit 3
fi

I would say it is nothing exciting. There are two important lines. The first one searches for the process name in the output of the ps command and excludes the lines containing the script name and grep itself. The second one uses awk to add up the CPU usage values from the list created by the first line. BTW, if you would prefer to check memory usage rather than processor usage, change {cpu+=$3} to {cpu+=$4} in the awk command (or even to {mem+=$4}, as long as you also print mem instead of cpu in the END block). I also wrote the Nagios command definition, which I believe should work. "Believe", not "know", as I haven't tried it yet ;)

# 'check_cpu' command definition
define command{
        command_name    check_cpu
        command_line    /usr/lib/nagios/plugins/check_cpu $ARG1$ $ARG2$ $ARG3$
}
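
Since the command definition is untested, it may help to check the core logic of the script offline first. The sketch below is only an illustration, not part of the plugin: it runs the same awk summation and threshold comparison on canned data. The sample ps lines, the thresholds (10 and 20) and the helper names sum_column and classify are all made up for this example.

```shell
#!/bin/bash
# Dry run of the plugin logic on canned data; nothing here talks to Nagios.

# Hypothetical `ps axu` lines (all values invented for illustration).
sample='www-data  1201  2.5  1.5  12345  2345 ?  S  10:00  0:01 apache2
www-data  1202 10.5  3.0  12345  2345 ?  S  10:00  0:05 apache2
www-data  1203  0.5  0.5  12345  2345 ?  S  10:00  0:00 apache2'

# sum_column N: add up field N of every input line and print the integer part.
sum_column() {
    awk -v col="$1" '{ total += $col } END { printf "%d", total }'
}

# classify VALUE WARN CRIT: print the state name and return the exit code
# Nagios expects for it (0 = OK, 1 = WARNING, 2 = CRITICAL).
classify() {
    if (( $1 < $2 )); then
        echo "OK"; return 0
    elif (( $1 < $3 )); then
        echo "WARNING"; return 1
    else
        echo "CRITICAL"; return 2
    fi
}

# Quoting "$sample" keeps one process per line, so awk sums every row.
cpu=$(echo "$sample" | sum_column 3)   # %CPU is field 3 of ps axu
mem=$(echo "$sample" | sum_column 4)   # %MEM is field 4 of ps axu

echo "cpu=${cpu}% -> $(classify "$cpu" 10 20)"
echo "mem=${mem}% -> $(classify "$mem" 10 20)"
```

With the sample above the three CPU values add up to 13.5, which awk prints as 13, so the first line reports WARNING; the memory values add up to 5, which is OK. Once this behaves as expected, the real plugin can be run by hand with a process name and two thresholds, followed by echo $? to see the exit code Nagios would receive.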
 
Useful links
  1. http://www.nagios.org/documentation
  2. http://debianclusters.cs.uni.edu/index.php/Creating_Your_Own_Nagios_Plugin
  3. http://www.ibm.com/developerworks/aix/library/au-nagios/index.html
  4. http://lena.franken.de/nagios/own_service.html