Data at rest versus data in motion
Why is it important?
A critical component of data-driven applications
How it starts
fig: kafka-1, 2, 3
Overview
Messages
Batches
Schemas
Topics and partitions
fig: kafka-4
Producers and consumers
Producers
Consumers
Offsets
fig: kafka-5
Consumer group
fig: kafka-6
Brokers
Clusters
Leader and followers
fig: kafka-7
Retention
Multiple clusters
fig: kafka-8
What makes Kafka a good choice?
fig: kafka-9
Note:
Use cases
Cluster membership
Controller
A quick review:
Replication
Request processing
fig: kafka-10
fig: kafka-11
fig: kafka-12
fig: kafka-13
Physical storage
fig: kafka-14
Note: Zero-copy is an optimization in which Kafka sends data from disk to the network without copying it between kernel space and user space. On the conventional path, data is read from disk into a kernel buffer (the page cache), copied into a user-space buffer, and then copied back into a kernel socket buffer before transmission. With zero-copy (using OS features such as Linux sendfile), the kernel transfers bytes from the on-disk segment files, or from the page cache if they are already cached, directly to the network socket. This reduces CPU and memory usage and speeds up message delivery. It is possible because Kafka’s on-disk segment format matches the wire protocol, so the broker does not need to transform the data before sending it. The wire protocol is the format and rules that define how data is encoded and transmitted over the network between clients and servers.
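Sketch: the zero-copy path described in the note can be illustrated in Python. This is not Kafka's code (the broker is written in Java and, as far as I know, relies on FileChannel.transferTo for this). It only shows the same OS mechanism from user code. socket.sendfile() uses os.sendfile() where the platform supports it and quietly falls back to ordinary send() calls otherwise, so the copy-free path is not guaranteed on every system. The names send_segment_zero_copy and demo, the temporary file standing in for a log segment, and the local socket pair standing in for a client connection are all assumptions made for the example.

```python
import os
import socket
import tempfile


def send_segment_zero_copy(path, sock, offset=0, count=None):
    """Send bytes from a file to a connected stream socket.

    Hypothetical helper: the file plays the role of a log segment.
    socket.sendfile() hands the transfer to the kernel (os.sendfile)
    when it can, so the bytes never pass through a Python-level buffer.
    Returns the number of bytes sent.
    """
    with open(path, "rb") as segment:
        return sock.sendfile(segment, offset=offset, count=count)


def demo():
    # Small payload so it fits in the socket buffer without a reader thread.
    payload = b"record-batch-bytes " * 100

    # Temporary file standing in for an on-disk segment (assumption).
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name

    try:
        # Connected local socket pair standing in for broker <-> client.
        sender, receiver = socket.socketpair()
        with sender, receiver:
            sent = send_segment_zero_copy(path, sender)
            # Signal end-of-stream so the receive loop terminates.
            sender.shutdown(socket.SHUT_WR)

            received = bytearray()
            while chunk := receiver.recv(4096):
                received.extend(chunk)
    finally:
        os.unlink(path)

    return payload, sent, bytes(received)


if __name__ == "__main__":
    payload, sent, received = demo()
    print(sent == len(payload), received == payload)
```

The point to notice is that the sender never calls read() on the file: the bytes on disk are sent as they are, which only works because the stored format is already the format the receiver expects.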
fig: kafka-15
fig: kafka-16
fig: kafka-17