February 12

What you could learn from ‘Practical Monitoring’ by Mike Julian (2017, 137 pages)

Posted by Max on February 12, 2019 in Book reviews

Modern software architecture makes finding what is going wrong more complex than ever before. In his book ‘Practical Monitoring’, Mike offers good advice on how to find and diagnose faults, failures and errors.

Monitoring anti-patterns

Tool obsession – think about the mission and goal, rather than the tool or approach
Monitoring as a job – monitoring is not one person's job; everyone needs to consciously include it in their role. Think about monitoring when you design, build and run software
Checkbox monitoring – make sure you define what ‘working’ means and monitor that
Using monitoring as a crutch
Manual configuration

Monitoring design patterns

Composable monitoring. Use multiple specialised tools and couple them loosely together to form a monitoring platform with the following components:

Data collection – record metrics as counters or gauges
Data storage – store time series data in a time series database (TSDB); other data types should be stored differently
Visualisation – show important data visually, making it easy to understand, access and filter
Analytics and reporting – create reports to make sure services and third parties are living up to their
Alerting – only alert on things you need to act on or make a decision about
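
To make the counter/gauge distinction concrete, here is a minimal sketch in Python. The `Counter` and `Gauge` classes and the metric names are hypothetical illustrations, not the API of any particular monitoring library: a counter only ever goes up, while a gauge is a point-in-time reading that can move in either direction.

```python
class Counter:
    """A value that only ever increases, e.g. total requests served."""

    def __init__(self) -> None:
        self.value = 0

    def inc(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("counters never decrease")
        self.value += amount


class Gauge:
    """A point-in-time reading that can go up or down, e.g. queue depth."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


# Hypothetical metrics for illustration.
requests_total = Counter()
queue_depth = Gauge()

for _ in range(3):
    requests_total.inc()   # one increment per request handled

queue_depth.set(12)        # reading at one moment
queue_depth.set(7)         # a later reading replaces the earlier one
```

A rough rule of thumb: if the number can go down, it is a gauge.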

Monitoring from the user perspective. Users care whether the app works, not how many nodes you are running

Start monitoring where users interact with your code
Monitor response codes (especially 5xx codes)
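
As a rough sketch of what monitoring response codes could look like, the Python function below computes the share of responses that were server errors (5xx). The function name and the sample data are made up for illustration; in practice the status codes would come from your web server or load balancer logs.

```python
from collections import Counter


def server_error_rate(status_codes):
    """Return the fraction of responses with a 5xx status code."""
    if not status_codes:
        return 0.0
    # Group codes by their class: 2 for 2xx, 4 for 4xx, 5 for 5xx, ...
    by_class = Counter(code // 100 for code in status_codes)
    return by_class[5] / len(status_codes)


# Made-up sample of recent responses.
sample = [200, 200, 404, 500, 503, 200, 301, 200]
rate = server_error_rate(sample)
```

A sustained rise in this ratio is one signal that users are being affected, whatever the underlying cause turns out to be.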

Buy not build.

Do not build unless you are Netflix, Google or Facebook; the overheads are huge
You will not have the expertise
You will not invest as much money in improvement as the companies that provide SaaS
No really, use SaaS

Continual Improvement

Realistically, you will need to re-architect your monitoring every two to three years
Keep improving little and often

Monitoring and alerts. Differentiate between FYI and action

FYI – something is working
Action – someone needs to do something
Top tips – stop using emails, write Runbooks, delete and tune alerts
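
One way to picture the FYI/action split is as a routing decision. This Python sketch is only an illustration: the `Alert` type, the `route` function and the rule that an actionable alert must link to a runbook are my assumptions, not something the book prescribes in this form.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    FYI = "fyi"        # informational, nobody has to do anything
    ACTION = "action"  # someone needs to do something


@dataclass
class Alert:
    name: str
    severity: Severity
    runbook: str = ""  # hypothetical link to the runbook for this alert


def route(alert: Alert) -> str:
    """Page a human only for actionable alerts; just log the rest."""
    if alert.severity is Severity.ACTION:
        if not alert.runbook:
            # Assumption: actionable alerts without a runbook are rejected.
            raise ValueError(f"{alert.name}: actionable alerts need a runbook")
        return "page"
    return "log"
```

For example, `route(Alert("deploy-finished", Severity.FYI))` returns `"log"`, so nobody is interrupted by it.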

Runbooks. Write Runbooks for each of your services

What is the service, what does it do?
Who is responsible for it?
What dependencies does it have?
What does the infrastructure for it look like?
What metrics and logs does it emit, and what do they mean?
What alerts are set up for it and why?
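
The questions above can double as a checklist. A small sketch, assuming a runbook is kept as a simple dictionary with one key per question (the key names and the sample draft are hypothetical):

```python
# One key per runbook question; the names are illustrative.
RUNBOOK_SECTIONS = (
    "service",
    "owner",
    "dependencies",
    "infrastructure",
    "metrics_and_logs",
    "alerts",
)


def missing_sections(runbook):
    """Return the sections that are absent or left empty."""
    return [
        section
        for section in RUNBOOK_SECTIONS
        if not str(runbook.get(section, "")).strip()
    ]


# A made-up, half-finished runbook.
draft = {
    "service": "Checkout API: takes payments for orders",
    "owner": "Payments team",
    "dependencies": "",
}
gaps = missing_sections(draft)
```

A check like this could run in CI so that a service cannot ship with an incomplete runbook.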

Monitoring the business

Find the KPIs or OKRs that will drive success
Monitor these by default

Monitor front end

Monitor page load times for actual users
Monitor JS or other framework exceptions
Keep track of page load time with your CI system

Monitor applications

Fit logging and monitoring to your apps by default
Do the basics first: request/response times and database read/write times
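
As a minimal sketch of "the basics", the Python decorator below records how long each call takes. The in-memory `timings` store, the `timed` decorator and the `read_user` function are all hypothetical; a real application would send the durations to its metrics system instead.

```python
import time
from functools import wraps

# Hypothetical in-memory store: metric name -> list of durations in seconds.
timings = {}


def timed(name):
    """Record the wall-clock duration of every call under the given name."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even if the call raises, so failures are timed too.
                timings.setdefault(name, []).append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("db.read")
def read_user(user_id):
    time.sleep(0.01)  # stand-in for a real database read
    return {"id": user_id}


user = read_user(42)
```

The same decorator can wrap request handlers, which covers request/response times with no extra code.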

Monitor your build and release pipelines

Record when a deploy started and ended, which build it was and who deployed it
See who or what keeps breaking the environment
Heartbeat – frequently check that your apps are up
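
Deploy tracking can be as simple as emitting a structured event at the start and end of each deploy. A sketch in Python, where the field names and the `deploy_event` helper are assumptions for illustration:

```python
import json
from datetime import datetime, timezone


def deploy_event(phase, build, deployer, now=None):
    """Build a JSON log line describing the start or end of a deploy."""
    if phase not in ("start", "end"):
        raise ValueError("phase must be 'start' or 'end'")
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "event": "deploy",
            "phase": phase,
            "build": build,
            "deployer": deployer,
            "timestamp": now.isoformat(),
        },
        sort_keys=True,
    )
```

Calling `deploy_event("start", "build-123", "max")` from the release pipeline, and again with `"end"`, gives a record you can line up against error graphs to see which deploys broke things.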

Microservice specifics

Distributed tracing – tag every request with a request ID
Distributed tracing is complex and hard to get right; only do this after instrumenting your apps with metrics and logs
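
Tagging requests with an ID is the simplest piece of this. The sketch below assumes the ID travels in an `X-Request-ID` HTTP header, which is a common convention rather than something taken from the book, and it treats headers as a plain case-sensitive dictionary to keep things short.

```python
import uuid

# Assumption: a commonly used header name, not mandated by any standard.
REQUEST_ID_HEADER = "X-Request-ID"


def ensure_request_id(headers):
    """Return a copy of the headers that carries a request ID.

    An ID set by an upstream service is reused so that one ID follows
    the request through every service it touches.
    """
    tagged = dict(headers)
    if not tagged.get(REQUEST_ID_HEADER):
        tagged[REQUEST_ID_HEADER] = str(uuid.uuid4())
    return tagged


def outgoing_headers(incoming):
    """Headers to attach to downstream calls made while handling a request."""
    return {REQUEST_ID_HEADER: incoming[REQUEST_ID_HEADER]}
```

Including the same ID in every log line then lets you search the logs of all services for a single request.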

Server monitoring. Automate the monitoring of all your servers/hosts

CPU (% used)
Memory (% used vs free)
Network
Disk
Load (how many processes are waiting to be served by the CPU)
SSL certificates (especially expiration)
Database servers (especially queries per second)
Load balancers
Message queues (queue length and consumption rate)
Caching (hit/miss ratio)
DNS
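
Two of the checks above are easy to sketch with Python's standard library alone: disk usage and days left on an SSL certificate. The function names are my own, and in practice an agent or SaaS tool would collect these for you; the certificate function takes the `notAfter` string as it appears in a certificate.

```python
import shutil
import ssl
from datetime import datetime, timezone


def disk_percent_used(path="/"):
    """Percentage of the disk holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def days_until_expiry(not_after, now=None):
    """Whole days until a certificate's notAfter date.

    `not_after` uses the certificate time format,
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days
```

An alert could then fire when `days_until_expiry(...)` drops below a threshold you choose, for example 30 days.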

I learnt a lot from this book.

Be explicit on what you are monitoring, and why
Create Runbooks to make it easy for anyone to help
Invest in monitoring, and continue to invest in monitoring, but start with the basics first

You can buy Practical Monitoring from Amazon UK here
