Wednesday, June 8, 2011

check_printq - AIX QUEUE MONITOR NRPE PLUGIN FOR NAGIOS

Some of our AIX printer queues occasionally run into problems on their
corresponding physical devices (most commonly an empty paper supply),
causing jobs to build up. I couldn't find an existing plugin that showed
the status the way I wanted, so I wrote a simple script called
"check_printq".
It's a crude one, but it works. You can find the script below.

tested on:
AIX 5.3/6.1 with Version 2.5.x
Nagios® Core™ 3.2.3
Icinga 1.3.1
NRPE Version: 2.12

WHAT THE SCRIPT DOES
It monitors specific print queues on the AIX machine. Since our NRPE
setup was compiled without command argument processing, each monitored
queue needs its own command definition in the NRPE configuration.
The script can also be called with arguments passed from the Nagios
server if your NRPE daemon supports it.


DEFINING IT IN NRPE (AIX SIDE):
edit:
/opt/nagios/etc/nrpe_check_commands.cfg

add:
command[check_PTISAP11]=/opt/nagios/libexec/check_printq  -Q PTISAP11 -w 3 -c 10
command[check_PTISAP05]=/opt/nagios/libexec/check_printq  -Q PTISAP05 -w 4 -c 5
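
If your NRPE daemon was built with command argument processing, a single generic definition could stand in for the per-queue entries. This is only a sketch of that alternative, not something tested in the setup described here: the command name check_printq is an assumption, and dont_blame_nrpe=1 has security implications you should review first.

```
# nrpe configuration (AIX side); only honoured when NRPE was built
# with command argument support
dont_blame_nrpe=1
command[check_printq]=/opt/nagios/libexec/check_printq -Q $ARG1$ -w $ARG2$ -c $ARG3$
```

The Nagios server would then pass the queue name and thresholds with check_nrpe's -a option, for example: check_nrpe -H UX0017 -c check_printq -a PTISAP11 3 10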

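reload the nrpe daemon by sending it a HUP signal ($PID is the process ID of the running nrpe):

kill -1 $PID
or stop it and start it again with:
/opt/nagios/sbin/nrpe -n -c /opt/nagios/etc/nrpe.cfg -d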
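
One way to find the PID for the reload is to filter the process list. The helper below is a hypothetical sketch (nrpe_pids is not part of NRPE); it reads "PID COMMAND" pairs on stdin so it can be tried without a running daemon, and it assumes your ps supports the POSIX -o option.

```shell
# Hypothetical helper: print the PID of every process whose command
# name is "nrpe" (bare name or full path). Reads "PID COMMAND" lines
# on stdin.
nrpe_pids() {
    awk '$2 == "nrpe" || $2 ~ /\/nrpe$/ { print $1 }'
}

# Typical use (assumption: ps accepts -o pid= -o comm=):
#   PID=`ps -e -o pid= -o comm= | nrpe_pids`
#   [ -n "$PID" ] && kill -1 $PID
```

If more than one PID comes back, check which one is the parent daemon before signalling it.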

NAGIOS SERVER SIDE DEFINITION
define service{
        hostgroup_name          AIX-TEST
        service_description     PTISAP11
        check_command           check_nrpe!check_PTISAP11
        use                     template-service-test
        }
define service{
        hostgroup_name          AIX-TEST
        service_description     PTISAP05
        check_command           check_nrpe!check_PTISAP05
        use                     template-service-test
        }

save and run a syntax check:
sudo nagios -v ../nagios.cfg

and then reload Nagios.

SAMPLE RESULTS
qchk -WA
Queue                Dev            Status       Job Files              User         PP   %  Blks  Cp Rnk
-------------------- -------------- --------- ------ ------------------ ---------- ---- --- ----- --- ---
PTISAP11             hp@PTISAP11    DOWN
                                    QUEUED      1193 STDIN.426216       root                    1   1   1
PTISAP08             hp@PTISAP08    READY
PTISAP0801           hp@PTISAP08    READY
PTISAP05             hp@PTISAP05    READY
PTISAP0501           hp@PTISAP05    READY
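
The plugin needs two facts from this listing: the Status column of the queue line and the number of QUEUED job lines under it. The sketch below pulls both out of "qchk -WA"-style text in one pass. It is an illustration only, not how the script itself does it; queue_summary is a hypothetical name, and the column layout is assumed to match the sample above.

```shell
# Hypothetical helper: summarize one queue from "qchk -WA"-style text
# on stdin. Prints "<status> <queued-job-count>".
queue_summary() {
    awk -v q="$1" '
        # A line starting in column 1 is a queue line (or a header line)
        /^[^ \t]/ { inq = ($1 == q); if (inq) status = $3; next }
        # Indented lines belong to the queue line above them
        inq && $1 == "QUEUED" { n++ }
        END {
            if (status == "") status = "NOTFOUND"
            print status, n + 0
        }
    '
}

# Example with two queues from the sample listing
queue_summary PTISAP11 <<'EOF'
PTISAP11             hp@PTISAP11    DOWN
                                    QUEUED      1193 STDIN.426216       root                    1   1   1
PTISAP08             hp@PTISAP08    READY
EOF
# prints: DOWN 1
```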

THE SCRIPT
#! /bin/sh
#####
#  check_printq - plugin to check AIX printer queues
#  will show queue status and queue job number
#  v1.2 - updated some queue checking codes
####
ECHO="/usr/bin/echo"
QCHK="/usr/bin/qchk"
GREP="/usr/bin/egrep"
PROGNAME=`/usr/bin/basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION=`echo '$Revision: 1.2 $' | sed -e 's/[^0-9.]//g'`

. $PROGPATH/utils.sh

ShowUsage() {
    echo "Usage: $PROGNAME -Q QueueName -w # -c #"
    echo "Usage: $PROGNAME --help"
    echo "Usage: $PROGNAME --version"
}

ShowHelp() {
    print_revision $PROGNAME $REVISION
    echo ""
    ShowUsage
    echo ""
    echo "AIX Qdaemon monitor for Nagios"
    echo ""
    support
}

# Make sure the correct number of command line
# arguments have been supplied

if [ $# -lt 1 ]; then
    ShowUsage
    exit $STATE_UNKNOWN
fi
# Grab the command line arguments
exitstatus=$STATE_WARNING #default
while test -n "$1"; do
    case "$1" in
        --help)
            ShowHelp
            exit $STATE_OK
            ;;
        -h)
            ShowHelp
            exit $STATE_OK
            ;;
        --version)
            print_revision $PROGNAME $REVISION
            exit $STATE_OK
            ;;
        -V)
            print_revision $PROGNAME $REVISION
            exit $STATE_OK
            ;;
        -Q)
            qNAME=$2
            shift
            ;;
        -w)
            qWarnCount=$2
            shift
            ;;
        -c)
            qCritCount=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            ShowUsage
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done
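# all three options are required
if [ -z "${qNAME}" -o -z "${qWarnCount}" -o -z "${qCritCount}" ]
then
        ShowUsage
        exit $STATE_UNKNOWN
fi
# check if valid printer queue
${QCHK} -WP${qNAME} > /dev/null 2>&1
qCheck=$?
# invalid queue will spit code 64
if [ ${qCheck} -ne 0 ]
then
        # if not valid, say so and exit
        $ECHO "UNKNOWN: $qNAME is not a valid queue (qchk exit code ${qCheck})"
        exit $STATE_UNKNOWN
fi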

# get jobcount and compare with thresholds:
# CRITICAL at or above -c, WARNING above -w, otherwise OK
qJobCount=`${QCHK} -P${qNAME} | ${GREP} -c QUEUED`
nSTATUS=$STATE_OK
if [ ${qJobCount} -ge ${qCritCount} ]
then
        nSTATUS=$STATE_CRITICAL
elif [ ${qJobCount} -gt ${qWarnCount} ]
then
        nSTATUS=$STATE_WARNING
fi

# check status if queue is DOWN, READY or RUNNING
qSTAT=`${QCHK} -WP${qNAME} | ${GREP} ${qNAME} | awk '{print $3}'`
case "${qSTAT}" in
READY|RUNNING)
    $ECHO "$qNAME is ${qSTAT} with [ ${qJobCount} ] queued Jobs"
    exit $nSTATUS
    ;;
DOWN)
    # a DOWN queue is always critical, whatever the job count
    $ECHO "$qNAME is ${qSTAT} with [ ${qJobCount} ] queued Jobs"
    exit $STATE_CRITICAL
    ;;
*)
    # any other state = unknown
    $ECHO "$qNAME has [ ${qJobCount} ] queued Jobs with Unknown STATUS: ${qSTAT}"
    exit $STATE_UNKNOWN
    ;;
esac
## EOF
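
The threshold rule in the script is easier to see in isolation. With -w lower than -c, the job count is OK up to and including the warning value, WARNING above it, and CRITICAL once it reaches the critical value. The function below is a hypothetical stand-alone sketch of that rule (classify_count is not part of the plugin) and does not call qchk.

```shell
# Hypothetical helper mirroring the plugin's threshold rule.
# Usage: classify_count <jobcount> <warn> <crit>
# Prints the Nagios state code: 0 = OK, 1 = WARNING, 2 = CRITICAL.
classify_count() {
    count=$1
    warn=$2
    crit=$3
    if [ "$count" -ge "$crit" ]; then
        echo 2
    elif [ "$count" -gt "$warn" ]; then
        echo 1
    else
        echo 0
    fi
}

# With the PTISAP11 thresholds (-w 3 -c 10):
classify_count 3 3 10     # prints 0
classify_count 4 3 10     # prints 1
classify_count 10 3 10    # prints 2
```

Keep in mind that a DOWN queue is reported as CRITICAL regardless of this count.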


CHECKING THE SCRIPT
Checking from the AIX side (local to the script):
run it using:
sh -x /opt/nagios/libexec/check_printq -Q PTISAP11 -w 2 -c 10
This runs the script with shell tracing enabled and shows you
exactly what is happening.


From the Nagios server side:
[user@NAGIOS objects]$ /usr/lib64/nagios/plugins/check_nrpe -H UX0017 -n -p 5666 -c check_PTISAP11; echo $?
PTISAP11 is DOWN with [ 0 ] queued Jobs
2

[user@NAGIOS objects]$ /usr/lib64/nagios/plugins/check_nrpe -H UX0017 -n -p 5666 -c check_PTISAP05; echo $?
PTISAP05 is READY with [ 0 ] queued Jobs
0

This verifies that the plugin returns the exit codes Nagios expects, as defined by the Nagios plugin return codes.
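
For reference, the numbers printed by "echo $?" map to the standard Nagios plugin states, which utils.sh exposes as STATE_OK, STATE_WARNING, STATE_CRITICAL and STATE_UNKNOWN. The small function below is a hypothetical convenience for reading those codes while testing; state_name is not part of Nagios or of the plugin.

```shell
# Hypothetical helper: translate a plugin exit code into its Nagios state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "UNEXPECTED($1)" ;;
    esac
}

state_name 2    # prints CRITICAL, as in the PTISAP11 check
state_name 0    # prints OK, as in the PTISAP05 check
```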


Further reading:
http://nagiosplug.sourceforge.net/developer-guidelines.html#PLUGOUTPUT

2 comments:

  1. Good work! Thank you.
  2. this just made it to Nagios Exchange:

    http://exchange.nagios.org/directory/Plugins/Printing/check_printq/details