Monday, April 14, 2014

Zabbix : Create a production network interface trigger

Following my two previous posts on how to add interface's description in Zabbix graphs [1] and triggers [2], I will finish this serie of Zabbix posts with the creation of a production interface trigger.

By default Zabbix includes the "Operational status was changed ..." trigger which is (from my opinion) a big joke :
  • The trigger disappears (status "OK") after the next ifOperStatus check (60 seconds by default)
  • The trigger is raised when an equipment is plugged in. This is a "good to know information" but I can't rise a high severity trigger each time something is plugged !
  • I can't tell if the interface was up and went down OR if the interface was down and went up.
  • If I want to have a "Something was plugged in on GEX/X/X" trigger, I would make a special trigger for that purpose.
  • The trigger doesn't include the interface's description (which is extremely irritating and makes me want to kill little kittens). Check my previous post [2] if you care about kitten's survival.
This new trigger will have the following properties :
  • Raise ONLY if the interface was up (something was plugged in) and went down (equipment stopped, interface shut or somebody removed the cable). 
  • Will disappear if the interface come back up.
  • A "high" severity and will include interface's description.

Go to "Configuration -> Templates -> Template SNMP Interfaces -> Discovery -> Trigger prototypes" and click on "Create trigger prototype".

Use the following line as trigger's name : 

 Production Interface status on {HOST.HOST}: {#SNMPVALUE}, {ITEM.VALUE2} : {ITEM.VALUE3}  

Use this as trigger's expression : 

 {Template SNMP Interfaces:ifOperStatus[{#SNMPVALUE}].avg(3600)}<2&{Template SNMP Interfaces:ifOperStatus[{#SNMPVALUE}].last(0)}=2&{Template SNMP Interfaces:ifAlias[{#SNMPVALUE}].str(this_does_not_exist)}=0  

This expression means, raise if interface was up "avg(3600)}<2" AND went down "last(0)}=2". The 3600 value specify how long the trigger will stay up; After 3600s "avg(3600)" will equals 2 and the trigger will disappear.
The .str(this_does_not_exist)}=0 expression is used to show the interface's description and is explained in my previous post [2].

Use this as trigger's description :
 Interface status went up to down !!!  
 Interface : {#SNMPVALUE}, {ITEM.VALUE1} = {ITEM.VALUE3}  

Set the severity to "high" (or whatever is your concern), you can override severity for each of your interface/equipement.

Wait until the discovery rule is refreshed (default is 3600s) or temporarily set it to 60s. We can now try to disable an interface to check the results, let's do this on bccsw02 ge/0/0/3 :

The trigger is raised as expected with the hostname, interface name and description, if you configured Zabbix actions, the alert message will look like
"Production Interface status ev-bccsw02: ge-0/0/3, down (2) : EV-ORADB01 - BACK_PROD"

Let's renable the interface :

Trigger goes green as the interface went up, you should receive a message saying :
"Production Interface status ev-bccsw02: ge-0/0/3, up (1) : EV-ORADB01 - BACK_PROD"
 

Be aware that you can also use SNMP traps for that purpose.

Hope that helps !

[1] : http://sysnet-adventures.blogspot.fr/2014/02/zabbix-display-network-interface.html
[2] : http://sysnet-adventures.blogspot.fr/2014/04/zabbix-display-network-interface.html

3 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. How do you handle events like this:

    Trigger: ifOperStatus.GigabitEthernet0/9 on SWxx has changed
    Trigger status: PROBLEM
    Trigger severity: Warning
    Trigger URL:

    Item values:

    1. ifOperStatus.GigabitEthernet0/9 (SWxx:ifOperStatus.["GigabitEthernet0/9"]): down (2)
    2. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*
    3. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*

    Trigger: ifOperStatus.GigabitEthernet0/9 on SWxx has changed
    Trigger status: OK
    Trigger severity: Warning
    Trigger URL:

    Item values:

    1. ifOperStatus.GigabitEthernet0/9 (SWxx:ifOperStatus.["GigabitEthernet0/9"]): up (1)
    2. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*
    3. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*

    These follow in one minute intervals. I suspect they are caused by devices (printer or intel AMT) going to sleep.
    Do you have some experience with that?

    ReplyDelete