UESPWiki:Zabbix

The UESP uses Zabbix to monitor the status of its servers. The following notes describe how Zabbix was set up and how the required monitoring services are provided.

Setup

  • Zabbix 1.8.1 server and agent are set up on content3.
  • Remaining servers run the Zabbix 1.8 agent.
  • Remote commands are enabled on all agents for custom monitoring scripts.
  • The Zabbix web front end is accessible at monitor.uesp.net.

Issues

  • content3 initially ran MySQL v4.0, which caused some issues while installing and running Zabbix. MySQL was upgraded to v5.0 to try to solve those issues, but the upgrade did not help (as noted below).
  • The Zabbix server log has many "MySQL server has gone away" entries (every 100 seconds or so). The associated SQL queries appear to function correctly when run manually.


Database Size

Assuming the following numbers:

  • Item refresh rate of 60 secs
  • Item storage for 30 days
  • Trend refresh rate of 3600 secs (3 items min/max/average)
  • Trend storage for 365 days
  • Event storage for 365 days
  • 1000 items total (for all servers)
  • 100 events per day
  • 50 bytes per item
  • 128 bytes per trend
  • 130 bytes per event
  • Total Item Size = 2.2 GB = 30*(1000/60)*24*3600*50
  • Total Trend Size = 1.1 GB = 365*(1000/3600)*24*3600*128
  • Total Event Size = 5 MB = 365*100*130
  • Total Size = 3.3 GB
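
The arithmetic above can be reproduced with a short script. This is only a sketch: the function and parameter names are made up for illustration, and it takes the trend interval to be the 3600 seconds listed in the assumptions.

```php
<?php
// Rough Zabbix database size estimate from the assumptions listed above.
// Function and parameter names are illustrative; they are not part of Zabbix.
function estimateZabbixDbBytes(
    int $items, int $itemRefreshSecs, int $historyDays, int $bytesPerValue,
    int $trendIntervalSecs, int $trendDays, int $bytesPerTrend,
    int $eventsPerDay, int $eventDays, int $bytesPerEvent
): array {
    $secsPerDay = 24 * 3600;

    // days * (values stored per second) * seconds per day * bytes per value
    $history = $historyDays * ($items / $itemRefreshSecs) * $secsPerDay * $bytesPerValue;
    $trends  = $trendDays * ($items / $trendIntervalSecs) * $secsPerDay * $bytesPerTrend;

    // Events are already given per day, so no per-second conversion is needed.
    $events  = $eventDays * $eventsPerDay * $bytesPerEvent;

    return [
        'history' => $history,
        'trends'  => $trends,
        'events'  => $events,
        'total'   => $history + $trends + $events,
    ];
}

$est = estimateZabbixDbBytes(1000, 60, 30, 50, 3600, 365, 128, 100, 365, 130);
foreach ($est as $name => $bytes) {
    printf("%-8s %.2f GB\n", $name, $bytes / 1e9);
}
```

Changing any single argument (for example the item count or the history retention) shows quickly which assumption dominates the total.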

Custom Monitoring

MySQL


For MySQL monitoring we will take the zabbix_sender approach and create a client-side script that parses status from the MySQL database server and sends it back to the Zabbix server. The send_mysql_stats script runs on the MySQL server and uses the mysql-zabbix.stats file to determine which stats to send back to the Zabbix server.

The send_mysql_stats script does the following:

  • Connects to the MySQL server using a limited-access user account. The account needs SELECT access to the mysql.user table, plus the REPLICATION SLAVE and REPLICATION CLIENT privileges if replication status is to be monitored.
  • Outputs the various MySQL stats and parses them into key-value pairs.
  • Loads the mysql-zabbix.stats file (one key per line).
  • Sends every key found in this file back to the Zabbix server.
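
The steps above can be sketched roughly as follows. This is not the actual send_mysql_stats script: the function names, the db1.example.com host name and the sample data are hypothetical, and the stats are assumed to arrive as tab-separated text (for example from mysql -B -N -e "SHOW GLOBAL STATUS"). The zabbix_sender path and server name are borrowed from the memcached script further down this page.

```php
<?php
// Hypothetical sketch of the send_mysql_stats steps; not the actual UESP script.

// Parse tab-separated "name<TAB>value" lines into a name => value array.
// Assumed input: the output of mysql -B -N -e "SHOW GLOBAL STATUS".
function parseMysqlStatus(string $output): array
{
    $stats = [];
    foreach (preg_split('/\R/', $output) as $line) {
        $parts = explode("\t", $line, 2);
        if (count($parts) === 2 && $parts[0] !== '') {
            $stats[$parts[0]] = trim($parts[1]);
        }
    }
    return $stats;
}

// Extract the wanted keys from the contents of mysql-zabbix.stats (one key per line).
function parseWantedKeys(string $fileContents): array
{
    $keys = array_map('trim', preg_split('/\R/', $fileContents));
    return array_values(array_filter($keys, function ($k) { return $k !== ''; }));
}

// Build one zabbix_sender command per wanted key that MySQL actually reported.
function buildSenderCommands(array $stats, array $wanted, string $sender, string $server, string $host): array
{
    $commands = [];
    foreach ($wanted as $key) {
        if (!array_key_exists($key, $stats)) {
            continue; // listed in the file but not present in the MySQL output
        }
        $commands[] = sprintf(
            '%s -z %s -s %s -k %s -o %s',
            escapeshellcmd($sender),
            escapeshellarg($server),
            escapeshellarg($host),
            escapeshellarg("mysql.$key"),
            escapeshellarg($stats[$key])
        );
    }
    return $commands;
}

// Canned data so the sketch runs without a MySQL server.
$status = "Threads_connected\t12\nQuestions\t4810233\nUptime\t86400\n";
$wanted = parseWantedKeys("Questions\nUptime\n\nNot_a_stat\n");
$commands = buildSenderCommands(
    parseMysqlStatus($status),
    $wanted,
    '/virtual/zabbix-1.8/bin/zabbix_sender',
    'content3.uesp.net',
    'db1.example.com' // hypothetical host name as registered in Zabbix
);

foreach ($commands as $cmd) {
    echo $cmd, "\n"; // a real script would pass each command to exec()
}
```

Keeping the parsing and command building in separate functions makes it possible to check the key list without a database connection or a Zabbix server.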

To use this script create a Zabbix template Template_mysql along with an application MySQL. Add Zabbix trapper type items of the form mysql.itemname for all stats being sent by the script that you wish to monitor.

The send_mysql_stats.php script also has to run regularly on the MySQL server, either from cron or through a Zabbix remote command item like:

  system.run[/etc/zabbix/commands/send_mysql_stats]

where send_mysql_stats is a simple wrapper script that calls the PHP version. Ensure the scripts are executable and owned by the Zabbix user (chmod a+x and chown zabbix:zabbix) and that they return a value (typically 0 to indicate success).

Apache


TODO


MediaWiki


TODO

  • Number of recent edits (last hour/day/week)
  • Number of new users
  • Number of reverts/deletes/blocks/uploads
  • Number of articles (total/good)
  • Job queue length
  • Page view counters

Lighttpd


TODO

Squid


TODO

SmartCtl


TODO

Memcached

Status

To get the status of the memcached server simply use the item key:

  proc.num[memcached]

and create an associated trigger for this item.

Stats

To get more stats from the memcached server some sort of custom setup is needed. There are a variety of ways to do this:

  1. Create a script to capture the output of the stats command run from a telnet session into the memcached server. Parse this output and output the one required key. Run this script remotely from Zabbix like system.run[get_memcached_stat cmd_get]. This has the minor downside of needing to get all the memcached stats for each individual statistic (i.e., for 10 stats you'd need to call/parse everything 10 times).
  2. Use the UserParameter command in the zabbix_agentd.conf file to configure custom parameters using the same script as above.
  3. Use the zabbix_sender method to send custom memcached item keys from the memcached server to the Zabbix server. This has the benefit of only needing to get the memcached stats once.
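
For comparison, the first approach could look roughly like the sketch below. The function names are hypothetical, and a canned reply is used so that the example runs without a memcached server. It relies on the memcached text protocol, in which the stats command returns lines of the form "STAT name value" followed by END.

```php
<?php
// Hypothetical get_memcached_stat helper for approach 1; names are illustrative.

// Ask memcached for its stats over the plain-text protocol ("stats" command).
// Returns the raw reply, or an empty string if the connection fails.
function fetchRawStats(string $host, int $port, float $timeout = 2.0): string
{
    $sock = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($sock === false) {
        return '';
    }
    fwrite($sock, "stats\r\n");
    $raw = '';
    while (($line = fgets($sock)) !== false) {
        $raw .= $line;
        if (rtrim($line) === 'END') {
            break; // memcached terminates the stats listing with END
        }
    }
    fclose($sock);
    return $raw;
}

// Turn "STAT <name> <value>" lines into a name => value array.
function parseMemcachedStats(string $raw): array
{
    $stats = [];
    foreach (preg_split('/\R/', $raw) as $line) {
        if (preg_match('/^STAT (\S+) (.*)$/', $line, $m)) {
            $stats[$m[1]] = $m[2];
        }
    }
    return $stats;
}

// Canned reply; a live script would call fetchRawStats("content1.uesp.net", 11000).
$sample = "STAT cmd_get 231611197\r\nSTAT get_hits 219810276\r\nEND\r\n";
$stats = parseMemcachedStats($sample);

// Print only the one requested statistic, as Zabbix expects a single value.
$wanted = 'cmd_get';
echo isset($stats[$wanted]) ? $stats[$wanted] : '', "\n";
```

The full reply is fetched and parsed on every call even though only one value is printed, which is the overhead the first approach is noted for.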

I've settled on using the zabbix_sender method. Create the following PHP script send_memcached_stats.php on the memcached server (requires php-pecl-memcache):

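  <?php
  // Sends every memcached statistic, plus a computed hit rate, to the Zabbix server.
  $ZABBIXSENDER = "/virtual/zabbix-1.8/bin/zabbix_sender";
  $ZABBIXSERVER = "content3.uesp.net";
  $ZABBIXPORT = 10051;
  $MEMCACHESERVER = "content1.uesp.net";
  $MEMCACHEPORT = 11000;
  $MEMCACHEKEY = "memcache";
  
  $m = new Memcache;
  $s = false;
  if (@$m->connect($MEMCACHESERVER, $MEMCACHEPORT)) $s = $m->getStats();
  
  // Report zero keys when memcached is unreachable so a trigger on the count can fire.
  if (!is_array($s)) { echo "0"; exit(1); }
  
  $count = 0;
  $gets = 0;
  $hits = 0;
  $hitrate = 0;
  
  // Common part of every zabbix_sender call.
  $sender = "$ZABBIXSENDER -z $ZABBIXSERVER -p $ZABBIXPORT -s $MEMCACHESERVER";
  
  foreach ($s as $key => $value)
  {
          // Quote the key and value so unexpected characters cannot break the shell command.
          exec("$sender -k " . escapeshellarg("$MEMCACHEKEY.$key") . " -o " . escapeshellarg($value));
  
          if ($key == "cmd_get") $gets = $value;
          if ($key == "get_hits") $hits = $value;
          $count += 1;
  }
  
  // Hit rate as a percentage of all get commands.
  if ($gets > 0) $hitrate = 100*$hits/$gets;
  exec("$sender -k " . escapeshellarg("$MEMCACHEKEY.hitrate") . " -o " . escapeshellarg((string)$hitrate));
  
  // The number of keys sent (excluding hitrate) is the script's return value.
  echo "$count";
  ?>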

When run, this script grabs the stats from memcached as key/value pairs and sends each pair to the Zabbix server. It returns the total number of keys sent (not counting the calculated hitrate).

On the Zabbix server create an item with a type of Zabbix Trapper and use one of the key names from the above script, for example memcache.limit_maxbytes. If unsure of the exact key names, temporarily modify the script to echo the key/value pairs (just be sure to remove the echo before running it through Zabbix). The output looks like:

  pid = 2097
  uptime = 11424243
  time = 1266987710
  version = 1.2.2
  pointer_size = 32
  rusage_user = 2796.225909
  rusage_system = 9788.653898
  curr_items = 2071817
  total_items = 12386515
  bytes = 553573366
  curr_connections = 1
  total_connections = 92977928
  connection_structures = 142
  cmd_get = 231611197
  cmd_set = 12516523
  get_hits = 219810276
  get_misses = 11800921
  evictions = 0
  bytes_read = 29606309542
  bytes_written = 5632962999473
  limit_maxbytes = 805306368
  threads = 1
  hitrate = 94.904857298415
  22

The script still has to run periodically to actually send the values to the Zabbix server. Either a cron entry on the memcached server or another Zabbix item will work. Since the script returns the number of keys sent, a new item with the key system.run[php -f /etc/zabbix/commands/send_memcached_stats.php] run every 60 seconds will work. This also has the benefit of being able to use the returned number of keys in a trigger (if it doesn't return 22 keys then something is probably wrong or has changed).

Note that this system has the benefit of being able to easily define the computed hitrate item in the above script. With other methods I was unable to create a calculated item within Zabbix for computing this value.

UESP Content


Items can be created to track the loading time of a Wiki page. For example:

  • web.page.perf[localhost,/wiki/Main_Page]
  • web.page.perf[localhost,/wiki/Special:RecentChanges]
  • web.page.perf[www.uesp.net,/wiki/Main_Page]

Just ensure the items use the Numeric (float) data type. Associated triggers can likewise be created to fire when a page takes longer than a certain time to load. For the localhost method to work the host must be set up to respond correctly on the localhost address. On servers hosting multiple web sites an explicit host name, as in the last example above, is required.

Backup Status


TODO

System

BTMP Log


The /var/log/btmp file logs all failed login attempts and can be used to determine whether someone is trying to brute force a password. The UESP servers should have a very low rate of failed logins (typically several per day at most), so if the btmp file grows quickly it probably indicates a brute force attempt.

The file size of the log can be monitored with a key of vfs.file.size[/var/log/btmp]. A trigger can be set up to warn of a potential brute force attempt with the expression {Template_Linux:vfs.file.size[/var/log/btmp].delta(60)}>10000, which fires if the file size changes by more than 10,000 bytes within 60 seconds. To catch possible tampering, a trigger like {Template_Linux:vfs.file.size[/var/log/btmp].change(0)}<0 fires whenever the btmp log file's size decreases; change() is used here because delta() is the maximum minus the minimum value in the period and so is never negative.
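
The logic behind the two checks can be illustrated outside Zabbix with a small sketch. The function names and sample sizes are made up. The first check compares the spread between the largest and smallest file size seen in a window against a limit, and the second compares the newest sample with the one before it.

```php
<?php
// Illustrative only: mirrors the two btmp checks outside Zabbix.

// True when the spread of file sizes (max minus min) across the
// sampled window exceeds $limit bytes.
function grewTooFast(array $sizes, int $limit): bool
{
    return count($sizes) > 0 && (max($sizes) - min($sizes)) > $limit;
}

// True when the newest sample is smaller than the one before it,
// which suggests the log was truncated or replaced.
function shrank(array $sizes): bool
{
    $n = count($sizes);
    return $n >= 2 && $sizes[$n - 1] < $sizes[$n - 2];
}

// Made-up file sizes in bytes, oldest first.
$quiet  = [38400, 38400, 38784];   // one failed login in the window
$attack = [38784, 61824, 99456];   // rapid growth
$wiped  = [99456, 0];              // file truncated

var_dump(grewTooFast($quiet, 10000), grewTooFast($attack, 10000), shrank($wiped));
```

A normal log rotation also shrinks the file, so a shrink alert is a prompt to check what happened, not proof of tampering.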