That's a very misleading feeling. The proper operation of a server depends on many moving parts: Internet connectivity, a stable power supply, proper cooling, enough network bandwidth, free disk space, running services, available CPU power, I/O bandwidth, memory, and so on. That's just the tip of the iceberg, but I think the point is clear: there is a lot that can go wrong with a server.
Eventually some of those subsystems will break down for one reason or another. When one of them fails, it usually brings down others, creating digital mayhem that can be quite hard to untangle. Businesses relying on the servers being up and running tend not to look too favorably on the inevitability of the situation. Instead of accepting the incident philosophically and being grateful for the great uptime so far, business owners go for questions like "What happened?!?!!", "What's causing this???" and "WHEN WILL IT BE BACK UP????!!!". Sad, I know.
Smart people, who would rather not be caught unprepared by those questions, have come up with the idea of monitoring, so that:
- problems are caught in their early stages, before they cause real damage (e.g. slowly increasing disk space usage);
- when some malfunction does occur, they can cast a quick glance over the various monitoring gauges and quickly determine its root cause;
- they can follow trends in the server metrics, so they can both get insight into past issues and predict future behavior.
These are all extremely valuable benefits, and it's widely accepted that server monitoring is second in importance only to backups. Yet there are more servers out there without proper monitoring than you would expect. The main reasons not to set up monitoring are all part of our human nature, and can be summed up as "what a hassle to install and configure...", "the server is doing its job anyway..." and my favorite, "I'll do it...eventually".
I have some news for the Linux server administrators: you no longer have an excuse. We've come up with a web-based monitoring system for your servers that is easy to set up, rich in functionality and completely free (at least for the time being). Go on and see a demo of it, if you don't believe me. If you decide to subscribe, it will take less than a minute. Adding a machine to be monitored basically boils down to downloading a Bash script and setting it up as a cron job (you'll get step-by-step instructions after you log in and add a new server record on the web). And if you want to integrate S2Mon into a custom workflow or interface of yours, there is API access to everything (in fact, the entire S2Mon website is one big API client).
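To give a rough feel for what a cron-driven collector of this kind does on each run, here is a minimal sketch in Python. It is not the S2Mon agent (which is a Bash script) and it does not talk to the S2Mon API; the function name `collect_snapshot` and the JSON field names are made up for illustration. It only reads the load averages and the disk usage of one mount point using the standard library.

```python
import json
import os
import shutil
import time


def collect_snapshot(path="/"):
    """Gather a few basic metrics for one mount point.

    The structure and field names are illustrative assumptions,
    not the format used by any real monitoring service.
    """
    load1, load5, load15 = os.getloadavg()  # Unix-only call
    disk = shutil.disk_usage(path)
    return {
        "timestamp": int(time.time()),
        "load": {"1m": load1, "5m": load5, "15m": load15},
        "disk": {
            "path": path,
            "total_bytes": disk.total,
            "used_bytes": disk.used,
            "used_percent": round(100.0 * disk.used / disk.total, 1),
        },
    }


if __name__ == "__main__":
    # A cron job would run this periodically and ship the output somewhere.
    print(json.dumps(collect_snapshot(), indent=2))
```

Running something like this from cron every few minutes and storing the results is the basic loop behind any chart of historical server metrics.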
Once you hook up your server to the system, you will unlock a plethora of detailed stats, presented in interactive charts like this one:
What we see above is a pretty picture of the load falling on the Apache web server. Apparently we've had the same pattern repeating over the last week. That's visual proof that the web server workload varies a lot throughout the day (nothing unexpected, but we can now actually measure it!).
OK, I now want to see how my disk partitions are faring, and when I should plan for adding disk space:
Both partitions are steadily growing, but if the current rate holds, there should be enough space for the next 5-6 months.
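The "if the rate holds" estimate is just a straight-line extrapolation, and it can be sketched in a few lines of Python. This is an illustration of the arithmetic, not something S2Mon exposes; the function name `days_until_full` and the sample numbers are hypothetical. It fits a least-squares line through (day, used space) samples and divides the remaining free space by the daily growth.

```python
def days_until_full(samples, capacity):
    """Estimate the days left before a partition fills up.

    samples  -- list of (day, used) pairs, e.g. one reading per week
    capacity -- partition size, in the same unit as `used`

    Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one point in time")

    # Least-squares slope: growth in `used` per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None

    # Extrapolate from the most recent reading.
    _, last_used = samples[-1]
    return (capacity - last_used) / slope


# Made-up readings: usage in GB on days 0, 7, 14 and 21 of a 100 GB partition.
readings = [(0, 50), (7, 57), (14, 64), (21, 71)]
print(days_until_full(readings, 100))  # 1 GB/day of growth, 29 GB free
```

Real disk usage is rarely this linear, so the result is best treated as a rough planning figure rather than a deadline.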
Hey, you know what, I just got some complaints from a user that a server was slow yesterday. Was there anything odd?
Yep, most definitely. The load was pretty high throughout the entire afternoon. Believe it or not, this time it was not his virus-infested Windows computer...
Your boss wants some insight on a specific network service, say IMAP? There you go:
Wonder what your precious CPU spends its time on? See here:
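For the curious, a breakdown like this can be derived on Linux from the first line of `/proc/stat`, which holds cumulative CPU time counters (user, nice, system, idle, iowait, irq, softirq, steal, ...). The sketch below is only an illustration of that calculation, not how S2Mon itself gathers the data; the function names are hypothetical. Taking two readings and comparing them gives the share of time spent in each state during the interval.

```python
# Order of the counters on the aggregate "cpu" line of /proc/stat.
# The trailing guest fields are skipped in this sketch.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat (the aggregate 'cpu' line)."""
    with open(path) as f:
        return f.readline()


def parse_cpu_line(line):
    """Turn a 'cpu ...' line into a dict of cumulative tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("not an aggregate cpu line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of time spent in each state between two readings."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("readings are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}
```

To use it on a live machine, call `parse_cpu_line(read_cpu_line())` twice, a second or so apart, and pass both results to `cpu_percentages`.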
As you can see, S2Mon can provide you with extremely detailed stats, ready to be used anytime you need them. Of course, there is a lot more to it, and I'll cover more aspects of setting up, configuring and working with S2Mon in the next parts. As always, feedback is more than welcome!