Real monitoring, debugging, logging, metrics for real programmers for real production servers

Here are my current thoughts on server monitoring in production for software developers.

So you plan to deploy your server to production (you use Docker, right?). You have some logging and some monitoring, but are you really aware of what you can do with monitoring? What should go to logging? What should you audit? JMX? REST? A web interface? Bewildered by all these options? So are we!

Here is a brief summary of when to use each monitoring tool. Ideally, use all of them, of course.

| Mechanism | When to use |
| --- | --- |
| SNMP alerts | Critical status changes only, with no false alarms. Keep them simple. |
| Email notifications | Medium- and high-severity status changes. |
| Hadoop events | Everything (all events are audited to Hadoop). |
| JMX | Tweaking the running system (for example, changing the log level) and exposing internal system info. |
| Graphite | Main metrics. |
| Logging | Everything (plus the ability to turn on TRACE). |
| Admin web UI | Global system status and server component startup status. Build it with "support" people in mind. |
| JavaMelody | General app statistics. |
| Metrics | Gathering statistics and system status. |
| Dashing | Cross-server global system status. |
| Google Docs | Procedures and a captain's log of events. |
| REST | Expose your metrics via REST so that QA can gather them and you can build automated processes to monitor them. |
| /isalive | Expose a simple /isalive endpoint that does nothing except confirm that your server is alive. You can also use it to measure raw response time: if you have network issues, it tells you how long it takes to reach your server with no logic involved. |
| Web Flow Tracer | Lets you see the flow of what happens in your server for a specific request. A combination of Logback + MDC + filters + an in-memory appender. |
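
To make the /isalive idea concrete, here is a minimal sketch using only the HTTP server and client that ship with the JDK (Java 11+). The class name `IsAliveServer`, the `"OK"` body, and the `probeMillis` helper are assumptions made for illustration; in a real server you would register the endpoint in whatever web framework you already run.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of any library.
class IsAliveServer {

    // Starts an HTTP server whose only job is to answer /isalive.
    // Passing port 0 asks the OS for any free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/isalive", exchange -> {
            // No business logic on purpose: the handler only proves the server responds.
            byte[] body = "OK".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Calls /isalive once and returns the round-trip time in milliseconds.
    static long probeMillis(int port) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:" + port + "/isalive"))
                .build();
        long start = System.nanoTime();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return elapsedMillis;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start(0);
        int port = server.getAddress().getPort();
        System.out.println("/isalive answered in " + probeMillis(port) + " ms");
        server.stop(0);
    }
}
```

Because the handler does no work, the time reported by the probe is mostly network and HTTP stack overhead, which is exactly the baseline you want to compare your real endpoints against.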
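
The JMX row mentions changing the log level at runtime. Here is one way that could look, as a sketch: a standard MBean with a single `Level` attribute. It uses `java.util.logging` so that it runs on a bare JDK; that choice, the class names, and the `com.example.monitoring` object name are all assumptions for illustration. With Logback you would call its own logger API inside `setLevel` instead.

```java
import java.lang.management.ManagementFactory;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical example class, not part of any library.
class LogLevelJmx {

    // Standard MBeans need a public interface named <ImplementationClass>MBean.
    public interface LogLevelControlMBean {
        String getLevel();
        void setLevel(String levelName);
    }

    public static class LogLevelControl implements LogLevelControlMBean {
        private final Logger logger;

        public LogLevelControl(Logger logger) {
            this.logger = logger;
        }

        @Override
        public String getLevel() {
            Level level = logger.getLevel();
            // A null level means the logger inherits its level from its parent.
            return level == null ? "INHERITED" : level.getName();
        }

        @Override
        public void setLevel(String levelName) {
            // Level.parse accepts names such as "INFO", "FINE" or "ALL".
            logger.setLevel(Level.parse(levelName));
        }
    }

    // Registers the control bean on the platform MBean server (the one JConsole shows).
    static ObjectName register(Logger logger) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumed object name, pick one that matches your own package.
        ObjectName name = new ObjectName("com.example.monitoring:type=LogLevelControl");
        if (server.isRegistered(name)) {
            server.unregisterMBean(name);
        }
        server.registerMBean(new LogLevelControl(logger), name);
        return name;
    }

    public static void main(String[] args) throws Exception {
        Logger logger = Logger.getLogger("com.example.app");
        ObjectName name = register(logger);

        // What a JMX client such as JConsole does when you edit the attribute.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.setAttribute(name, new Attribute("Level", "FINE"));
        System.out.println("Level is now " + server.getAttribute(name, "Level"));
    }
}
```

The attribute is a plain string so that any JMX console can edit it without needing your classes on its classpath.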
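
The Web Flow Tracer is easier to picture with code. The sketch below is NOT the Logback + MDC + filter setup named in the table; it is a JDK-only stand-in that shows the same idea, so that it runs without dependencies. A `ThreadLocal` plays the role of MDC, a `java.util.logging.Handler` plays the role of the in-memory appender, and `withRequest` plays the role of the servlet filter. All names here are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical example class, a JDK-only stand-in for Logback + MDC.
class FlowTracer {

    // Stand-in for MDC: the id of the request handled by the current thread.
    private static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    // Stand-in for the in-memory appender: log messages grouped by request id.
    // A real version would cap or expire entries so memory does not grow forever.
    private static final Map<String, List<String>> TRACES = new ConcurrentHashMap<>();

    static final class InMemoryHandler extends Handler {
        @Override
        public void publish(LogRecord record) {
            String id = REQUEST_ID.get();
            if (id != null) { // ignore log lines written outside a traced request
                TRACES.computeIfAbsent(id, k -> new CopyOnWriteArrayList<>())
                      .add(record.getMessage());
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    // Stand-in for the servlet filter: tag the thread, run the handler, always clean up.
    static void withRequest(String requestId, Runnable handler) {
        REQUEST_ID.set(requestId);
        try {
            handler.run();
        } finally {
            REQUEST_ID.remove();
        }
    }

    // Returns a copy of everything logged while the given request was being handled.
    static List<String> traceOf(String requestId) {
        return new ArrayList<>(TRACES.getOrDefault(requestId, List.of()));
    }

    // Creates a logger that records into memory only (no console output).
    static Logger newTracedLogger(String name) {
        Logger logger = Logger.getLogger(name);
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.ALL);
        for (Handler existing : logger.getHandlers()) {
            if (existing instanceof InMemoryHandler) {
                return logger; // already set up, avoid duplicate lines
            }
        }
        logger.addHandler(new InMemoryHandler());
        return logger;
    }

    public static void main(String[] args) {
        Logger log = newTracedLogger("com.example.orders");
        withRequest("req-42", () -> {
            log.info("validating order");
            log.fine("loading customer");
            log.info("order accepted");
        });
        traceOf("req-42").forEach(System.out::println);
    }
}
```

In the real setup the filter would put the request id into MDC at the start of each request and clear it in a `finally` block, and the admin UI would read the trace for a given id from the in-memory appender.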
