For a lesson to be remembered it has to be painful. This is how humans seem to work. This time the rather disastrous death of one of our vital services, which I failed to notice, seems to have cost my company some money and clients. That is really not a good move for a startup that desperately needs to scale. So the lesson is to have a proper monitoring service for all core activities and not to rely on humans ‘checking it once in a while’. There are numerous off-the-shelf packages that provide everything you need and more. After a little research about a year or so ago I chose Zabbix as the tool of choice. It may have been overkill, and Zabbix is definitely NOT the most intuitive tool to work with, hence this post.
What I need is to check the availability of certain public services (on fixed URLs) to make sure they still do what is advertised. If they don’t, I want to be notified within a reasonable period, e.g. by e-mail (immediately would be ideal, but I don’t want to poll my services every second). Zabbix can do a million and one things, including a task as trivial as this one, but quite a few steps are involved. Here is a short checklist that I had to piece together from the manuals and by trial and error.
Pre-requisites
I assume that you have already set up the following things (since I had them already):
- host to be checked
- user in a proper user group with defined e-mail where you want to receive notifications
E-mail configuration
First of all, make sure Zabbix can send e-mails. Apparently this is not part of the initial configuration, as I ran with the default settings for quite some time. Go to Administration | Media Types
and select the type you would like to use. I wanted e-mail (with a BlackBerry the other ones are rather superfluous), so I clicked on Email. By default it contains some sample information; fill in the following fields:
- SMTP server – enter the SMTP server you can use to send outbound e-mail.
- SMTP helo – I just put my Zabbix server name there, but I don’t see it coming back anywhere.
- SMTP email – enter the e-mail address of the ‘sender’; a ‘fake’ one is OK unless the SMTP server you entered above is picky about sender addresses.
Don’t forget to save the changes.
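Before blaming Zabbix for missing notifications it can help to verify that the SMTP server accepts mail with exactly these three values. Below is a minimal Python sketch of such a check; it is not part of Zabbix, and the function names, subject line and addresses are made up for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a small test e-mail; 'sender' plays the role of the 'SMTP email' field."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    # The tag in the subject is only an example, handy for mail filters.
    message["Subject"] = "[zabbix-test] outbound mail check"
    message.set_content("If you can read this, the SMTP settings work.")
    return message


def send_test_message(smtp_server, helo_name, message, timeout=15):
    """Send the message through smtp_server, announcing helo_name in HELO/EHLO."""
    # local_hostname is what smtplib sends in the HELO/EHLO command,
    # which corresponds to the 'SMTP helo' field in the Zabbix form.
    with smtplib.SMTP(smtp_server, local_hostname=helo_name, timeout=timeout) as smtp:
        smtp.send_message(message)
```

To try it, build a message with your own addresses and pass it to send_test_message together with the SMTP server and helo values you entered in the form. If that call raises an exception, Zabbix will most likely not be able to send mail with those settings either.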
If you wonder why I’ve spent so many lines above in this post and haven’t referred to the Zabbix documentation the answer is rather simple. The Zabbix documentation on version 1.8 is very concise and states exactly two words: “Email notification.”. Check it yourself if you wish.
Indicate what has to be monitored
So now let’s define what we have to monitor. I wanted to be able to open a certain URL and see whether I get a proper response (yes, I could write a few lines of Python for this, but then I wouldn’t get it all in one consistent interface with as much detail as Zabbix gives me). This is called a “Web Scenario” in Zabbix. Go to Configuration | Web
to define one.
Here is a part I don’t really like about Zabbix. It took me some time to find the “Create Scenario” button, simply because it sits at the top right of the page, ABOVE the list of scenarios. OK, it is consistent with other pages, but it is not very intuitive.
Anyway, once you have located the “Create Scenario” button you will get a form that needs to be filled in. Here the Zabbix documentation is a bit more verbose; check the “WEB Scenario” manual for some tips. One thing to mention is that a web scenario is just a ‘placeholder’ for the actual actions (called ‘Steps’); it does not do much by itself. To actually indicate which page has to be opened you need to define at least one step (click the ‘Add’ button in the form). OK, here you go: define a good name (you will have to locate it later on, so come up with a distinct one!), paste the URL of the page/service you would like to check and enter the HTTP response code (e.g. 200 for OK, 404 for Not Found, etc.) you expect to get when it is called.
We are almost there. What is missing is the coupling between the mail and the check.
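For comparison, this is roughly what a single step boils down to, written as the ‘few lines of Python’ mentioned earlier. It is only a sketch of the idea (fetch a URL, compare the status code with the expected one), not how Zabbix implements it; the function names and the default timeout are my own choices.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=15):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers are raised as exceptions but still carry a status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: there is no status code at all.
        return None


def step_passes(status, expected=200):
    """A step passes only when the received status equals the expected one."""
    return status == expected
```

Calling step_passes(fetch_status(url)) with the URL you pasted into the step gives a quick yes/no answer from the command line, without the history and graphs that Zabbix adds on top.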
Configuring triggers
Now we need to couple the new scenario to the trigger list of a server. Go to Configuration | Hosts
and click on the host you would like to couple the scenario to (well, I suppose you were checking a service provided by some server and you have that server in the list already…). Then click on the “Triggers” link (another ‘intuitive’ one). When you get the list of triggers, locate the “Create Trigger” button (yes, at the right near the top of the page, grrrr), and now we have to do some ‘magic’. Come up with a distinct name (you will need this one again later!), then press the ‘Select’ button. In the next pop-up press the ‘Select’ button again.
Here you have to select the name of the Web Scenario you created above; make sure you pick the item measuring the RESPONSE code. The first one in the list for me was measuring ‘Download speed’, which is not exactly what I am interested in right now (maybe some time later). Clicking on ‘Response code for…’ brings you back to the form. Select “Last value not N”, leave N at 0 and press the ‘Insert’ button. You should end up with something like
{your.server.com:web.test.rspcode[Scenario Name,Step Name].last(0)}#0
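The expression reads as: take the last value of the item web.test.rspcode for the given scenario and step on the given host, and fire when it is not equal to N (the form option “Last value not N” is what produces the # operator). The Python sketch below only illustrates how the string is put together and what the condition means; the helper names are hypothetical and nothing here talks to Zabbix.

```python
def trigger_expression(host, scenario, step, n=0):
    """Compose a 'last value not N' expression in the form shown above."""
    item_key = f"web.test.rspcode[{scenario},{step}]"
    # Doubled braces produce the literal { and } that wrap host:key.function.
    return f"{{{host}:{item_key}.last(0)}}#{n}"


def condition_holds(last_value, n=0):
    """'Last value not N': true whenever the latest value differs from N."""
    return last_value != n
```

Note that with N left at 0 the condition holds for any non-zero value. If the item stores the raw HTTP status code, as the name ‘Response code for…’ suggests, a healthy 200 would satisfy it as well, so it is worth checking whether comparing against the expected code (for example #200) fits your setup better.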
Now press ‘Save’. But don’t relax yet: this will make the trigger go off when your service is not behaving properly, but you won’t get a notification yet. Let’s go to the final step.
Creating alerts
Go to Configuration | Actions
and press the ‘Create Action’ button. Here we have another set of odd forms that need to be filled in. BTW, at this point I was close to giving up and dreaming of a small script sending me happy e-mails once in a while :). Anyway, give it a name, leave “Event source” on “Triggers”, and change the message subject (e.g. put some tags there so that your e-mail program can sort/filter these messages) and body (e.g. signature, link to the Zabbix server, etc.). DON’T press ‘Save’ just yet! First press the ‘New’ button in the form below, under ‘Action conditions’. Here you indicate under which condition the e-mail you have just defined should be sent (I wonder what twisted mind came up with this ‘workflow’). I simply put in the name of the trigger (defined above). You can also filter on value (OK | ERROR), but I find it useful to receive notifications on both errors and recoveries. OK, here is the final step. At the top right there is another form called ‘Action operations’. Press ‘New’ there, select at least the user group you want to send your e-mail to and select ‘Email’ from the ‘Send only to’ list. At least I don’t have anything else to send SMSes with (for sure no Siemens MC35 anymore).
OK, now press ‘Save’ under the ‘Action’ form.
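The effect of not filtering on value (OK | ERROR) is that a message goes out on every change of state, in both directions. A small Python sketch of that behaviour follows, assuming a series of response codes from consecutive checks; the labels PROBLEM and RECOVERY and the function name are my own, not Zabbix terminology taken from the forms above.

```python
def notifications(statuses, expected=200):
    """Yield one notification per state change in a series of check results."""
    previous_ok = True  # assume the service was fine before the first check
    for status in statuses:
        ok = status == expected
        if ok != previous_ok:
            # Report both directions: going down and coming back up.
            yield "RECOVERY" if ok else "PROBLEM"
        previous_ok = ok
```

For the series 200, 500, 500, 200 this yields PROBLEM once and then RECOVERY once: repeated failures do not produce repeated messages, and the recovery tells you when you can stop worrying.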
You’re done! The next time your server doesn’t give the proper response you will get an e-mail about the event, and off you go fixing the problem, hopefully before hordes of clients start cursing you.
Well, I also wonder whether writing a script would have been easier (for sure it would take fewer words than I have spent describing the steps above), but I guess that is always the case with heavy tooling used for rather simple purposes.
Good luck!
Hi! I don’t know your name!
Are you monitoring web services this way? How do you send the request?
Thanks!
Hi,
I am monitoring the availability of the external interface by checking the reply on a specific URL. If you know which URL your web services use, you can add a rule that checks that specific URL periodically.
—
Best regards,
Oleksii