Thursday, March 21, 2013

Monitoring - too often overlooked

Outages, unplanned downtime, critical events...

It is not a matter of IF they will happen; it's a matter of WHEN.

Outages are bad.
        Outages during business or production hours are very bad.
                 A user or client knowing about an outage before IT does... inexcusable.

You would think this is just common sense, yet it is one of the most common issues I find when starting a new engagement (and probably one reason why the position opened up).

What to do?

IT is in the business of technology and of providing information to the business units. This is where IT needs to be a customer of its own services (see my post on IT groups: the NOC is an IT group that is a customer of IT).

Some useful tools for varying budgets:

Budget Friendly:

  • MRTG - "Multi Router Traffic Grapher" (open source)
    • Provides network utilization graphs.
    • Useful to see whether WAN links are saturated, which is when users start complaining that the network is slow.
    • Simple to complex dashboards can be created to give a view of the entire network landscape
  • Zenoss Open Source Edition
    • Can monitor all the devices on the network
    • Sends an alert so you know something is down
    • Web status page
    • Installation can be difficult... read the install guide carefully and you'll be OK
  • Nagios (open source)
    • A long-established monitoring tool with many plug-ins and companion tools (such as Cacti for graphing) that extend the product
    • Sends alerts and can be set up with some very nice web status pages
    • Configuration can be challenging for some
  • Servers Alive (low cost)
    • Very easy to set up monitoring
    • The free edition can monitor up to 10 items; the Standard and Enterprise editions come with a small fee
    • Email alerting and additional plug-ins (such as POP check for monitoring email flow)
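For readers wondering what a grapher like MRTG is actually plotting: it periodically polls a router's SNMP interface octet counters (every 5 minutes by default) and turns the difference between two readings into a rate. The Python sketch below is a hypothetical illustration of that arithmetic, not MRTG's own code (MRTG is written in Perl); the function names are made up for this example, and it assumes 32-bit SNMP counters that can wrap at most once between polls.

```python
COUNTER32_MODULUS = 2**32  # a 32-bit SNMP counter rolls over to 0 after 2**32 - 1


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Octets counted between two polls, allowing for a single counter wrap."""
    if current >= previous:
        return current - previous
    # The counter wrapped: count up to the rollover, then from 0 to current.
    return current + modulus - previous


def link_utilization_pct(previous_octets: int, current_octets: int,
                         interval_s: float, link_speed_bps: int) -> float:
    """Average utilization (%) of one direction of a link over the polling interval."""
    bits = counter_delta(previous_octets, current_octets) * 8  # octets -> bits
    return 100.0 * bits / (interval_s * link_speed_bps)


if __name__ == "__main__":
    # Hypothetical T1 WAN link (1,544,000 bit/s) polled every 300 seconds.
    pct = link_utilization_pct(1_000_000, 53_110_000, 300, 1_544_000)
    print(f"{pct:.1f}%")  # -> 90.0%
```

A WAN link that sits near 90% average utilization across a 5-minute window is exactly the kind of saturation that shows up as "the network is slow" complaints. Note that a 5-minute average also smooths out short bursts, so a link can feel congested even when the graph looks moderate.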

Enterprise class:

These systems may not be budget friendly, but you definitely get what you pay for. Hopefully the company you are with either has these in place or can fit them into the budget.

Note:
Don't be surprised if the place you are starting at owns monitoring software licenses yet hasn't found the time to deploy them.

Setting up monitoring the right way takes time, and IT's own projects are often given lower priority than projects for the business. Improving productivity and providing great business intelligence reports is valuable; however, if you're not maintaining the house, it can fall down around you. If it does fall, where will the company be then?

  • Microsoft System Center Operations Manager - aka SCOM
    • In my opinion, the Mercedes of monitoring.
    • The 2012 version can monitor Windows, UNIX, Linux, and network devices, and many software and hardware vendors provide management packs so that their products can be monitored too.
    • Talk to your Microsoft reseller: there are Windows Server bundle deals that can save you money by rolling up server licensing, endpoint protection, and the System Center suite.
    • Multiple methods to deliver alerts (email, SMS, etc.)
    • Full, robust dashboards
    • Performance tracking and trending reports
    • Direct ties into the other System Center suite modules; when all of them are fully deployed, it is a wonderful sight to see (and probably a great IT environment to work in)

Alternative options to Microsoft:

  • CorrelSense Sharepath
    • Monitors more than just a server's performance or a network link... this is end-to-end application monitoring of the user experience.
    • Bottlenecks are instantly identified, which ends the finger-pointing between teams
    • Reports on performance that falls outside normal operation, with really impressive drill-down dashboards

Of course, there are many other solutions out there, either on-premises or offered as a cloud service. The key factors in any monitoring solution are:
  1. Minimize false positives
    • If too many alerts are generated that can safely be ignored, will anyone notice the critical ones, or do they all end up in a junk mail folder?
  2. Make sure IT uses the tools wisely
    • If no one is watching the alerts, they serve no purpose
    • If you need 24x7 support, you need a NOC
  3. Make sure the staff can easily support and maintain the software
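One common way to cut down on false positives is to alert only after several consecutive failed checks, and to send one notification per outage instead of one per failed check. Nagios, for example, expresses a similar idea through its max_check_attempts setting and its soft/hard states. The Python sketch below is a hypothetical, tool-agnostic illustration of that logic; CheckState, record(), and the threshold of 3 are assumptions made for this example, not any product's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckState:
    """Alert state for one monitored item, e.g. a ping or a service port check."""
    fail_threshold: int = 3          # consecutive failures before alerting (assumed value)
    consecutive_failures: int = 0
    alerting: bool = False

    def record(self, check_ok: bool) -> Optional[str]:
        """Feed one check result; return "DOWN" or "RECOVERED" when a notification should go out."""
        if check_ok:
            self.consecutive_failures = 0
            if self.alerting:
                # Send exactly one recovery notice, then go quiet again.
                self.alerting = False
                return "RECOVERED"
            return None
        self.consecutive_failures += 1
        if not self.alerting and self.consecutive_failures >= self.fail_threshold:
            # Alert only once the failure has persisted, and only once per outage.
            self.alerting = True
            return "DOWN"
        return None


if __name__ == "__main__":
    state = CheckState(fail_threshold=3)
    # A single dropped check (index 0) is absorbed; a sustained outage is not.
    results = [False, True, False, False, False, False, True]
    print([state.record(ok) for ok in results])
    # -> [None, None, None, None, 'DOWN', None, 'RECOVERED']
```

The trade-off is detection time: with a threshold of 3 and checks every minute, a real outage is reported roughly three minutes after it starts. Tune the threshold per item, since a core switch deserves a faster alert than a test server.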


