linux - Monitoring gearman in nagios

Tags: linux nagios gearman

I'm trying to monitor gearman with nagios, and for that I'm using the script check_gearman.sh.

The gearman server runs on localhost.

When I run

./check_gearman.sh -H localhost -p 4730 -t 1000

the result is:

CRITICAL: gearman: gearman_client_run_tasks : gearman_wait(GEARMAN_TIMEOUT) timeout reached, 1 servers were poll(), no servers were available, pipe:false -> libgearman/universal.cc:331: pid(613)

Can anyone help me fix this?

Here is the script:

#!/bin/bash
#
# gearman check for nagios
# written by Georg Thoma
# Last modified: 07-04-2014
#
# Description:
#
#
#

PROGNAME=`/usr/bin/basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION="0.04"
export TIMEFORMAT="%R"

. $PROGPATH/utils.sh

# Defaults
hostname=localhost
port=4730
timeout=50

# search for gearmanstuff
GEARMAN_BIN=`which gearman 2>&1 | grep -v "no gearman in"`
if [ "x$GEARMAN_BIN" == "x" ] ; then # result of check is empty
   echo "gearman executable not found in path"
   exit $STATE_UNKNOWN
fi
GEARADMIN_BIN=`which gearadmin 2>&1 | grep -v "no gearadmin in"`
if [ "x$GEARADMIN_BIN" == "x" ] ; then # result of check is empty
   echo "gearadmin executable not found in path"
   exit $STATE_UNKNOWN
fi


print_usage() {
    echo "Usage: $PROGNAME [-H hostname -p port -t timeout]"
    echo "Usage: $PROGNAME --help"
    echo "Usage: $PROGNAME --version"
}

print_help() {
    print_revision $PROGNAME $REVISION
    echo ""
    print_usage
    echo ""
    echo "gearman check plugin for nagios"
    echo ""
    support
}

# Make sure the correct number of command line
# arguments have been supplied

if [ $# -lt 1 ]; then
    print_usage
    exit $STATE_UNKNOWN
fi

# Grab the command line arguments

exitstatus=$STATE_WARNING #default
while test -n "$1"; do
    case "$1" in
        --help)
            print_help
            exit $STATE_OK
            ;;
        -h)
            print_help
            exit $STATE_OK
            ;;
        --version)
            print_revision $PROGNAME $REVISION
            exit $STATE_OK
            ;;
        -V)
            print_revision $PROGNAME $REVISION
            exit $STATE_OK
            ;;
        -H)
            hostname=$2
            shift
            ;;
        --hostname)
            hostname=$2
            shift
            ;;
        -t)
            timeout=$2
            shift
            ;;
        --timeout)
            timeout=$2
            shift
            ;;
        -p)
            port=$2
            shift
            ;;
        --port)
            port=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done

# check if server is running and replies to version query
VERSION_RESULT=`$GEARADMIN_BIN -h $hostname -p $port --server-version 2>&1 `
if [ "x$VERSION_RESULT" == "x" ] ; then # result of check is empty
      echo "CRITICAL: Server is not running / responding"
      exitstatus=$STATE_CRITICAL
      exit $exitstatus
fi

# drop function echo to remove functions without workers
DROP_RESULT=`$GEARADMIN_BIN -h $hostname -p $port --drop-function echo_for_nagios 2>&1 `

# check for worker echo_for_nagios and start a new one if needed
CHECKWORKER_RESULT=`$GEARADMIN_BIN -h $hostname -p $port --status | grep echo_for_nagios`
if [ "x$CHECKWORKER_RESULT" == "x" ] ; then # result of check is empty
   nohup $GEARMAN_BIN -h $hostname -p $port -w -f echo_for_nagios -- echo echo >/dev/null 2>&1 &
fi

# check the time to get the status from gearmanserver
CHECKWORKER_TIME=$( { time $GEARADMIN_BIN -h $hostname -p $port --status ; } 2>&1 |tail -1 )

# check if worker returns "echo"
CHECK_RESULT=`cat /dev/null | $GEARMAN_BIN -h $hostname -p $port -t $timeout -f echo_for_nagios 2>&1`

# validate result and set message and exitstatus
if [ "$CHECK_RESULT" = "echo" ] ; then # we got echo back
      echo "OK: got an echo back from gearman server version: $VERSION_RESULT, responded in $CHECKWORKER_TIME sec|time=$CHECKWORKER_TIME;;;"
      exitstatus=$STATE_OK
   else  # timeout reached, no echo
      echo "CRITICAL: $CHECK_RESULT"
      exitstatus=$STATE_CRITICAL
fi
exit $exitstatus
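The binary lookup at the top of the script pipes `which` into `grep -v "no gearman in"` to filter out the "not found" message that some `which` implementations print. Below is a minimal sketch of a more portable alternative, assuming a POSIX shell. `find_bin` is a hypothetical helper name and is not part of the original plugin:

```sh
#!/bin/sh
# Hypothetical helper: print the full path of an executable found on PATH,
# or return non-zero. Uses POSIX `command -v` instead of parsing `which`,
# whose "not found" behaviour differs between systems.
find_bin() {
    found=$(command -v "$1" 2>/dev/null) || return 1
    # command -v reports builtins and functions by bare name; require a path.
    case "$found" in
        /*) printf '%s\n' "$found" ;;
        *) return 1 ;;
    esac
}

# Possible use in check_gearman.sh (after utils.sh has set STATE_UNKNOWN):
# GEARMAN_BIN=$(find_bin gearman) || { echo "gearman executable not found in path"; exit $STATE_UNKNOWN; }
# GEARADMIN_BIN=$(find_bin gearadmin) || { echo "gearadmin executable not found in path"; exit $STATE_UNKNOWN; }
```

`command -v` is part of POSIX, so the check no longer depends on which `which` happens to be installed.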

Thanks in advance.

Best answer

If you download the mod_gearman package, you will find that it includes a better, more feature-rich check_gearman plugin for Nagios.

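As for your current plugin, the error message ("no servers were available") shows that the check script cannot connect to the gearman daemon.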
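This diagnosis rests on the phrase `no servers were available` in the client output quoted in the question. The following is a rough sketch that assumes you capture that output the way the script does in `CHECK_RESULT`. A hypothetical helper, `diagnose_gearman_error`, maps the output to a hint. The phrase matching is an assumption based only on the message shown here, not on a documented libgearman contract:

```sh
#!/bin/sh
# Hypothetical helper (not part of check_gearman.sh): turn the text that the
# gearman client printed, i.e. what the plugin stores in CHECK_RESULT, into a hint.
diagnose_gearman_error() {
    case "$1" in
        echo)
            # The echo_for_nagios worker answered: the plugin reports OK.
            printf '%s\n' "ok: worker answered"
            ;;
        *"no servers were available"*)
            # The client could not use any job server: check host, port,
            # firewall, listen address, and whether gearmand is running.
            printf '%s\n' "connection: gearmand not reachable"
            ;;
        *)
            printf '%s\n' "other: $1"
            ;;
    esac
}

# Example with the output quoted in the question (shortened):
diagnose_gearman_error "gearman: gearman_client_run_tasks : gearman_wait(GEARMAN_TIMEOUT) timeout reached, 1 servers were poll(), no servers were available, pipe:false"
```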

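You should verify that something is listening on port 4730 on localhost and that no local firewall is blocking the connection. You may have installed gearmand on a different port, or it may be listening only on a network interface rather than on localhost. Or it may not be running at all, or it may be running on a different server from the one that runs the check...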
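To test the first point independently of gearman's own tools, here is a minimal sketch of a TCP probe. It assumes bash (for its `/dev/tcp/HOST/PORT` redirection) and coreutils `timeout` are installed. `port_open` is a hypothetical name. If both addresses are refused, gearmand is most likely not listening there: it is not running, uses another port, or is bound to another interface. If the probe succeeds while the plugin still fails, look beyond basic connectivity:

```sh
#!/bin/sh
# Hypothetical helper: succeed if HOST:PORT accepts a TCP connection.
# Assumes bash (for its /dev/tcp redirection) and coreutils `timeout`.
port_open() {
    # $1 = host, $2 = port, $3 = seconds before giving up (default 2).
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT; if the
    # connect fails, the exec fails and bash exits non-zero.
    timeout "${3:-2}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Try the name used by the check and the IPv4 loopback address.
for h in localhost 127.0.0.1; do
    if port_open "$h" 4730; then
        echo "$h:4730 accepts connections"
    else
        echo "$h:4730 refused, unreachable, or timed out"
    fi
done
```

`ss -ltn` (from iproute2) lists the addresses and ports that local processes are listening on. This shows whether gearmand is bound to 127.0.0.1, to another interface, or to a different port.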

Original question on Stack Overflow: https://stackoverflow.com/questions/27148126/
