May 10

What you could learn from ‘Kafka The Definitive Guide’ (2017, 280 pages)

While the shift from monolithic architecture to micro-services is now very well known (even legacy enterprise companies have begun their transition), the less well known, but potentially more significant, shift is the transition from request-response communication to pub-sub.

Request-response, broadly simplified, works like this:

Max is thirsty and hungry and sends two requests to two different services: Tea service and Biscuit service.

Tea service

  • Request – Hi Tea service, can I have a cup of tea?
  • Response – Hi Max, yes, here is a cup of tea

Biscuit service

  • Request – Hi Biscuit service, can I have a biscuit?
  • Response – Hi Max, yes, here is a Rich Tea biscuit

Here is how it would work in a publish-subscribe (pub-sub) model:

Max is thirsty and hungry and publishes this fact through a Max status producer

  • Max status producer – I am thirsty and hungry

Tea service, which subscribes to status updates

  • OK Max, here is a cup of tea

Biscuit service, which subscribes to status updates

  • OK Max, here is a Rich Tea biscuit
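
The tea-and-biscuit exchange above can be sketched in a few lines of Python. This is a toy in-memory illustration, not Kafka: the `StatusBus` class, the `"status"` topic name and the two handler functions are all hypothetical names chosen for this example. What it tries to show is that the publisher never calls either service directly; it only announces its status, and whoever has subscribed reacts.

```python
from collections import defaultdict
from typing import Callable, List, Optional


class StatusBus:
    """Toy in-memory pub-sub bus (hypothetical, not a Kafka API)."""

    def __init__(self) -> None:
        # topic name -> list of subscribed handler functions
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str, str], Optional[str]]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, sender: str, status: str) -> List[str]:
        # Deliver the message to every subscriber and collect their reactions.
        replies = []
        for handler in self._subscribers[topic]:
            reply = handler(sender, status)
            if reply is not None:
                replies.append(reply)
        return replies


def tea_service(sender: str, status: str) -> Optional[str]:
    # Reacts only to the part of the status it cares about.
    if "thirsty" in status:
        return f"OK {sender}, here is a cup of tea"
    return None


def biscuit_service(sender: str, status: str) -> Optional[str]:
    if "hungry" in status:
        return f"OK {sender}, here is a Rich Tea biscuit"
    return None


bus = StatusBus()
bus.subscribe("status", tea_service)
bus.subscribe("status", biscuit_service)

# Max publishes one status update; both subscribers react independently.
replies = bus.publish("status", "Max", "thirsty and hungry")
for line in replies:
    print(line)
```

Adding a third service here would be one more `subscribe` call, with no change to the code that publishes Max's status.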

While the difference between the two seems very basic, the main implications are:

  • pub-sub models are easier to scale, as you can add as many ‘consumers’ (subscribers) to a service as you need
  • pub-sub allows you to build faster, as services are more loosely coupled, i.e., you can change the tea service to offer orange juice without altering the status service
  • pub-sub can be less reliable, as messages can get lost if the consuming service is down
  • pub-sub can be a nightmare to run: without good monitoring and error handling, it can be very difficult to find out what is going wrong

Apache Kafka takes pub-sub a stage further: a message is not just a notification that something happened, it can also carry the data itself. Kafka can also persist messages (i.e., store them for a configurable period of time). Persistence is one of the main benefits of Kafka, as it allows consumers to pause, stop and replay messages and reduces the chance of a message being lost.
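
To make the value of persistence concrete, here is a minimal sketch of a stored message log, assuming nothing about Kafka's real storage format or API. `MessageLog` and `TeaService` are hypothetical names for this illustration. The consuming service remembers how far it has read, so messages published while it was down are still waiting when it comes back, and resetting its position replays everything.

```python
class MessageLog:
    """Toy append-only log standing in for persisted messages (not Kafka's API)."""

    def __init__(self) -> None:
        self._messages = []

    def append(self, payload: dict) -> int:
        # Store the message and return its position in the log.
        self._messages.append(payload)
        return len(self._messages) - 1

    def read_from(self, position: int) -> list:
        # Messages stay in the log after being read, so any position can be re-read.
        return list(self._messages[position:])


class TeaService:
    """Hypothetical consumer that remembers how far it has read."""

    def __init__(self, log: MessageLog) -> None:
        self.log = log
        self.position = 0
        self.served = []

    def catch_up(self) -> None:
        for message in self.log.read_from(self.position):
            if message["status"] == "thirsty":
                self.served.append(f"cup of tea for {message['sender']}")
            self.position += 1


log = MessageLog()
tea = TeaService(log)

log.append({"sender": "Max", "status": "thirsty"})
tea.catch_up()  # handles the first message, position is now 1

# Imagine the tea service is down here; messages keep arriving and are stored.
log.append({"sender": "Max", "status": "hungry"})
log.append({"sender": "Max", "status": "thirsty"})

tea.catch_up()  # back up: resumes at position 1, so nothing was lost
served_after_outage = list(tea.served)

# Replay: rewind to the start and process the whole log again.
tea.position = 0
tea.served = []
tea.catch_up()
served_after_replay = list(tea.served)
```

This sketch keeps messages forever; a time-based retention limit, as described above, would periodically drop the oldest entries.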

The book ‘Kafka – The Definitive Guide’ provides a relatively sophisticated and technical introduction to Kafka. Read this book if you want a light technical introduction to Kafka.

Here are other top tips and ideas I took from the book:

  • The unit of data in Kafka is a ‘message’ (similar to a row or record)
  • Kafka batches messages together for efficiency
  • Schemas can be applied to messages so that they are easy to understand (e.g., JSON, XML)
  • Messages are categorised into ‘Topics’ (similar to a table or folder)
  • Topics are broken down into ‘partitions’ (message ordering is only guaranteed within a partition)
  • Partitions can be hosted on different servers, which allow scaling horizontally
  • A producer creates new messages for a specific topic (not a specific partition, unless you use a key to control this)
  • A consumer reads messages from one or more Topics
  • Consumers keep track of the messages they have read by recording the ‘offset’ (an integer in the metadata, which the broker adds to each message in ascending order)
  • Consumers may be grouped together to consume from a topic
  • Brokers (each a single Kafka server) get messages from producers, assign offsets, commit them to disk and then send the messages to consumers (when
  • Brokers are designed to work together in a ‘cluster’; by default, data is not shared or available outside a cluster
  • Multiple clusters allow you to segregate types of data, isolate data for security and support disaster recovery
  • Sharing data between clusters can be done with MirrorMaker (included in the Kafka project)
  • Top use cases for Kafka: activity tracking, messaging, metrics and logging, commit log, stream processing
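
Several of the ideas in this list (partitions, keys, offsets and grouped consumers) fit together in a way that is easier to see in code. The sketch below is a toy in-memory model, not the Kafka client API: `Topic`, `Consumer`, `send` and `poll` are hypothetical names, and `zlib.crc32` is used only as a simple deterministic hash standing in for whatever hash the real partitioner uses. Messages without a key are spread round-robin in this sketch.

```python
import zlib
from typing import List, Optional, Tuple


class Topic:
    """Toy topic split into partitions; each partition is an ordered list."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]
        self._next_partition = 0  # round-robin pointer for messages without a key

    def send(self, value: str, key: Optional[str] = None) -> Tuple[int, int]:
        if key is None:
            partition = self._next_partition
            self._next_partition = (partition + 1) % len(self.partitions)
        else:
            # Same key -> same hash -> same partition, which keeps that key's messages ordered.
            partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1  # ascending per partition
        return partition, offset


class Consumer:
    """Toy group member that owns some partitions and tracks an offset for each."""

    def __init__(self, topic: Topic, assigned: List[int]) -> None:
        self.topic = topic
        self.offsets = {partition: 0 for partition in assigned}

    def poll(self) -> List[Tuple[int, int, str]]:
        records = []
        for partition, offset in self.offsets.items():
            new_values = self.topic.partitions[partition][offset:]
            records.extend((partition, offset + i, v) for i, v in enumerate(new_values))
            self.offsets[partition] = offset + len(new_values)  # remember progress
        return records


topic = Topic("status", num_partitions=3)

# Three messages with the same key land in one partition, in order.
sent = [topic.send(f"Max update {i}", key="Max") for i in range(3)]

# A two-member "group": each partition is read by exactly one member.
consumer_a = Consumer(topic, assigned=[0, 1])
consumer_b = Consumer(topic, assigned=[2])

received = consumer_a.poll() + consumer_b.poll()
received_again = consumer_a.poll() + consumer_b.poll()  # nothing new since last poll
```

Because each partition is read by only one member of the group, the work is divided without any message being handled twice, and the per-partition offsets are all a consumer needs to resume where it left off.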