Monitoring - Nagios, Shinken, Thruk, ...

Thruk is down : No backend available

If the Thruk GUI displays :
No Backend available
None of the configured Backends could be reached, please have a look at the logfile for detailed information.

Check the following points :

  1. Is Livestatus listening ? Run : netstat -laputen | grep 50000 (the port was 6557 in previous versions)
    • yes : continue to the next step
    • no : restart shinken-broker : /etc/init.d/shinken-broker restart
      shinken-broker will fail to start if mongodb is down (check this !).
  2. Restart the broker in debug mode and inspect its logs :
    1. /etc/init.d/shinken-broker -d restart
    2. tail -f /usr/local/shinken/var/broker-debug.log | grep -i error
      OR : tail -f /var/log/shinken/broker-debug.log | grep -i error
  3. Restart Shinken :
    /etc/init.d/shinken restart
  4. Is mongodb listening ? Then :
    • yes : continue to the next step
    • no : start mongodb
      If this fails, remove its lock file :
      • rm /home/shinken/data/db/mongod.lock
      • OR : rm $(grep dbpath /etc/mongodb.conf | egrep -v "^#" | cut -d '=' -f 2)mongod.lock
  5. Look for crash evidence in mongodb logs :
    tail -n 100 /var/log/mongodb/mongodb.log | grep -A4 Unclean
    This may show that mongodb crashed (unclean shutdown) and has to be repaired.
  6. Repair mongodb. As this rewrites the data files, run it as the user that runs the mongod daemon, so that the rewritten files keep the right ownership and access rights.
    mongod --dbpath /home/shinken/data/db/ --repair
    Or, as root :
    dbPath='/home/shinken/data/db'; mongod --dbpath $dbPath --repair && chown -R mongodb:nogroup $dbPath/*
  7. Then start mongodb
    /etc/init.d/mongodb start
  8. Then restart Shinken (it should work now).
  9. Make sure NPCD is running
    (to be continued, see NPCD.txt)
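
The two "is it listening ?" checks above can be scripted. This is a minimal sketch, not part of Shinken or Thruk : is_listening is a hypothetical helper that reads a netstat -lnt style listing on stdin and succeeds only if a TCP socket is in LISTEN state on the given port. Port 50000 comes from step 1 ; 27017 is only the stock mongod default and is an assumption here, so check your own mongodb configuration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Shinken or Thruk).
# Reads a `netstat -lnt` style listing on stdin and succeeds if a TCP
# socket is in LISTEN state on the port given as first argument.
is_listening() {
    # In netstat output, column 4 is the local address (0.0.0.0:50000 or
    # :::50000) and column 6 is the state. Anchoring the port at the end
    # of column 4 avoids false hits on foreign addresses or longer ports.
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Example (commented out so that sourcing this file has no side effect) :
#   netstat -lnt | is_listening 50000 && echo 'Livestatus is listening'
#   netstat -lnt | is_listening 27017 || echo 'mongodb is NOT listening'  # 27017 : assumed default port
```

Unlike a plain grep 50000, this does not match a port number that merely appears in a PID, an inode column or a foreign address. It expects the netstat column layout ; the output of ss uses different columns and would need another field test.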

No graph in PNP

  1. Is NPCD running ?
    • ps -elf | grep [n]pcd
    • cat /var/run/npcd.pid
  2. View logs : tail /usr/local/pnp4nagios/var/npcd.log
  3. If NPCD is not responding :
    1. Kill it : kill -15 $(cat /var/run/npcd.pid)
    2. Remove PID file : rm /var/run/npcd.pid
    3. Then restart it as a daemon : /usr/local/pnp4nagios/bin/npcd -d -f /usr/local/pnp4nagios/etc/npcd.cfg
    For the impatient :
    • Kill NPCD : pidFile='/var/run/npcd.pid'; [ -e $pidFile ] && { echo -n 'Stopping NPCD ... '; kill -15 $(cat $pidFile); rm $pidFile && echo 'OK' || echo 'KO'; } || echo 'NPCD is not running.'
    • Restart NPCD : kill -15 $(cat /var/run/npcd.pid) && /usr/local/pnp4nagios/bin/npcd -d -f /usr/local/pnp4nagios/etc/npcd.cfg
  4. Is NPCD actually "eating" data ?
    tail -f /usr/local/pnp4nagios/var/npcd.log
    OK if this displays something such as Found xx files in /usr/local/pnp4nagios/var/perfdata/
  5. Does Shinken feed perfdata ?
    find /usr/local/pnp4nagios/var/perfdata -mmin -120
  6. Does the broker log anything about PNP ?
    grep pnp /usr/local/shinken/var/broker*
  7. Verify PNP config :
    1. wget http://verify.pnp4nagios.org/verify_pnp_config (source : http://docs.pnp4nagios.org/fr/pnp-0.6/verify_pnp_config)
    2. perl verify_pnp_config -m npcdmod -c /usr/local/shinken/etc/nagios.cfg -p /usr/local/pnp4nagios/etc/
  8. ...
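
Steps 1 and 5 above can be wrapped into two small helpers. This is a minimal sketch under assumptions : pid_state and recent_files are hypothetical names, not part of PNP4Nagios. pid_state tells a live daemon from a stale PID file (the case where step 3 applies), and recent_files counts the files modified in the last N minutes, like the find command of step 5.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with PNP4Nagios).

# Prints the state of the daemon described by a PID file :
#   running : the PID file exists and its process can be signalled
#   stale   : the PID file exists but is empty or points to no live process
#   stopped : there is no PID file
# Run it as root (or as the daemon's user) : kill -0 also fails on a
# process owned by someone else, which would be reported as "stale".
pid_state() {
    pid_file="$1"
    [ -e "$pid_file" ] || { echo 'stopped'; return 0; }
    pid=$(cat "$pid_file")
    # kill -0 sends no signal, it only tests whether the process exists
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo 'running'
    else
        echo 'stale'
    fi
}

# Prints how many regular files under directory $1 were modified during
# the last $2 minutes (-mmin is supported by GNU and BSD find).
recent_files() {
    find "$1" -type f -mmin "-$2" | wc -l | tr -d ' '
}

# Example (commented out so that sourcing this file has no side effect) :
#   pid_state /var/run/npcd.pid
#   recent_files /usr/local/pnp4nagios/var/perfdata 120
```

A "stale" answer means the PID file can be removed before restarting NPCD. A count of 0 from recent_files has two possible readings : either Shinken wrote no perfdata during that period, or NPCD has already processed everything, so read it together with npcd.log.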

UPDATE 2013/04/24 (?) : service npcd start

Shinken file tree

/usr/local/shinken          default install directory
  etc                       main config directory
    shinken-specific.cfg    Shinken config file
    nagios.cfg              Shinken settings that are similar to Nagios's
  libexec                   plugins directory
  var                       varying stuff
    *log                    logs
    *pid                    PIDs

Debug ePN

  1. Create a directory for all the debugging work
  2. In this directory, copy :
    • the plugin to debug (you can symlink it)
    • new_mini_epn (can be found with : find /your/nagios/install/path/ -name 'new_mini_epn' )
    • p1.pl (can be found with : find /your/nagios/install/path/ -name 'p1.pl' )
  3. Edit p1.pl to set the path to the logfiles
  4. Then run : ./new_mini_epn
  5. Paste the full plugin command line when prompted
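
Steps 1 and 2 can be automated. This is a minimal sketch under assumptions : prepare_epn_debug is a hypothetical helper, and it takes the first new_mini_epn and p1.pl that find reports under the install path. Editing p1.pl (step 3) is left to you.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nagios).
# Usage : prepare_epn_debug NAGIOS_ROOT PLUGIN WORK_DIR
#   NAGIOS_ROOT : Nagios install path, searched for new_mini_epn and p1.pl
#   PLUGIN      : absolute path of the plugin to debug (it is symlinked)
#   WORK_DIR    : debugging directory, created if missing ; keep it
#                 outside NAGIOS_ROOT so find does not pick up the copies
prepare_epn_debug() {
    nagios_root="$1"; plugin="$2"; work_dir="$3"
    mkdir -p "$work_dir" || return 1
    # Symlink rather than copy, so edits go to the real plugin
    ln -sf "$plugin" "$work_dir/" || return 1
    for name in new_mini_epn p1.pl; do
        # Keep the first match only ; check by hand if there are several
        found=$(find "$nagios_root" -name "$name" -type f | head -n 1)
        if [ -z "$found" ]; then
            echo "$name not found under $nagios_root" >&2
            return 1
        fi
        cp "$found" "$work_dir/" || return 1
    done
}

# Example (commented out so that sourcing this file has no side effect) :
#   prepare_epn_debug /your/nagios/install/path /path/to/plugin.pl ~/epn-debug
```

Afterwards, edit p1.pl in the work directory, run ./new_mini_epn from there, and paste the plugin command line when prompted.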