How to build an easily supported application
Some engineers write software (dev), and other engineers operate it (ops). All too often the latter struggle to understand what is going on inside a piece of software and why it isn't behaving as expected. That's true not only of homegrown applications but of widely used open-source systems too.
In this talk I'll look at the diagnostics available in a few popular open-source systems, such as nginx, PostgreSQL, and MongoDB. I'll discuss which metrics they already provide and which ones you might want but are still missing.
In the other half of the talk I'll give a few examples of how to instrument your homegrown app so that the ops people won't grumble afterwards:
- what to measure and how to do it best: errors, timings, different internal application states
- what to use: YOUR_LANG-metrics, YOUR_LANG-statsd-client, log processing etc.
- how not to be too clever, and how to avoid overloading production while gathering all that telemetry.
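
To make the points above a bit more concrete, here is a minimal sketch in Python of counting errors and recording timings in the StatsD line format, with optional sampling to keep the telemetry cheap. The names `Telemetry`, `udp_sender`, `format_metric`, the `myapp` prefix and the `127.0.0.1:8125` address are all assumptions made for illustration; in practice you would more likely reach for an existing statsd client for your language.

```python
import random
import socket
import time
from contextlib import contextmanager


def format_metric(name, value, kind, sample_rate=1.0):
    """Build one StatsD line, e.g. 'app.errors:1|c' or 'app.db:12|ms|@0.1'."""
    line = f"{name}:{value}|{kind}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


def udp_sender(host="127.0.0.1", port=8125):
    """Return a fire-and-forget sender; host and port are assumed defaults."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(line):
        try:
            sock.sendto(line.encode("ascii"), (host, port))
        except OSError:
            pass  # telemetry must never break the application itself

    return send


class Telemetry:
    """Hypothetical helper: counters and timers with optional sampling."""

    def __init__(self, send, prefix="myapp"):
        self.send = send
        self.prefix = prefix

    def _emit(self, name, value, kind, sample_rate=1.0):
        # Sampling: drop most events on hot paths instead of sending them all.
        if sample_rate < 1.0 and random.random() >= sample_rate:
            return
        self.send(format_metric(f"{self.prefix}.{name}", value, kind, sample_rate))

    def incr(self, name, sample_rate=1.0):
        self._emit(name, 1, "c", sample_rate)

    @contextmanager
    def timed(self, name, sample_rate=1.0):
        """Record how long the block took, and count it as an error if it raised."""
        start = time.monotonic()
        try:
            yield
        except Exception:
            self.incr(f"{name}.errors")
            raise
        finally:
            elapsed_ms = int((time.monotonic() - start) * 1000)
            self._emit(f"{name}.time", elapsed_ms, "ms", sample_rate)
```

A call site would then look like `with telemetry.timed("db.query", sample_rate=0.1): run_query()`, where `telemetry = Telemetry(udp_sender())` and `run_query` stands in for your own code. Passing the sender in as a plain callable keeps the helper easy to test and easy to swap for log-based collection.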
Nikolay is a co-founder of okmeter.io, a server monitoring SaaS. Before that, he managed operations for several large websites.