Apache Kafka
A distributed event streaming platform built for high-throughput, fault-tolerant, real-time data pipelines. This guide covers every concept from fundamentals to production failure scenarios.
Event-Driven Architecture (EDA)
Services don't call each other directly. Instead, a service emits events and other services react to those events.
What is an Event?
- Immutable — once created, it can never be changed. It's a fact that happened.
- Happened in the past — an event represents something that already occurred (OrderPlaced, PaymentCompleted).
- Self-contained — carries all the data needed to understand what happened. No need to query another service.
REST Microservices vs EDA Microservices
REST (Synchronous) Challenges
- Availability — all services must be up. If any one service is down, the entire API call fails.
- Latency — total latency = sum of all service latencies in the chain.
- Cascading Failure — if Service 3 is slow or failing, Service 2 backs up, then Service 1 goes down too.
- Tight Coupling — services know about each other, changes ripple across teams.
- Scaling Issue — scaling Service 3 has no benefit if Service 1 and 2 can only handle the same request rate.
EDA (Asynchronous) Advantages
- Loose Coupling — producer doesn't know or care about consumers.
- Independent Scaling — each service scales on its own based on its throughput.
- Resilience — temporary failures don't break the whole system. Events wait in the broker.
- Replay — you can replay old events for recovery, debugging, or new consumer onboarding.
- Latency Improvement — fire-and-forget pattern. Producer doesn't block waiting for downstream.
Two Types of EDA
Push-Based (Pub/Sub)
- Broker pushes events to active consumers.
- The broker's job ends at delivery — once an event is pushed and acknowledged, it is typically gone.
- You cannot replay old events.
- Events are delivered only to currently active subscribers.
- Examples: RabbitMQ, AWS SNS, Google Pub/Sub.
Pull-Based (Streaming / Kafka)
- Consumer pulls events at its own pace.
- Events are appended to logs — persisted on disk.
- You can replay events from any offset.
- Retention-based: events stay forever or until configured TTL.
- Examples: Apache Kafka, Amazon Kinesis, Redpanda.
What is Apache Kafka?
A distributed event streaming platform that allows you to publish, store, and subscribe to streams of events in real-time.
In simple words: Kafka is like a super-fast, never-forgetting diary. Applications write events (things that happened) into this diary, and other applications can read those events whenever they want — even days later. The diary can handle millions of entries per second and never loses a page.
Publish Events
Producers write events (messages) to Kafka topics. Events are key-value pairs with optional headers and a timestamp. Producers choose which topic and optionally which partition.
Store Events
Events are durably stored on disk across multiple brokers with configurable replication. Kafka can retain events for hours, days, or forever. Storage is sequential and immutable.
Subscribe to Events
Consumers pull events from topics at their own pace. Multiple consumer groups can independently read the same data. Each group tracks its own position (offset) in the log.
Producer
An application that publishes events to Kafka. The producer does not care who will consume the event — it's fire-and-forget.
In simple words: A producer is like a news reporter. They write the news story and send it to the newspaper office (Kafka). They don't know or care who reads the newspaper — that's not their job. Their job is just to report what happened.
Producer Record Structure
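A producer record bundles the destination and payload with optional routing and metadata fields. The sketch below models those fields as a Python dataclass; the field names mirror the Java client's `ProducerRecord`, but this class is illustrative, not a real client API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative model of the fields a Kafka producer record carries.
@dataclass
class ProducerRecord:
    topic: str                        # required: which topic to write to
    value: bytes                      # required: the event payload
    key: Optional[bytes] = None       # optional: drives partition selection
    partition: Optional[int] = None   # optional: explicit partition override
    timestamp: Optional[int] = None   # optional: event time, epoch millis
    headers: dict = field(default_factory=dict)  # optional metadata

record = ProducerRecord(
    topic="order-events",
    key=b"order-123",
    value=b'{"status": "PLACED"}',
    headers={"source": "checkout-service"},  # hypothetical header
)
```

If `partition` is left unset, the partitioner decides from the key (or lack of one), as described in the Partitioning Strategies section.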
Producer ACK Configurations
The producer can control how many acknowledgements it waits for before considering the write successful.
acks = 0 (Fire & Forget)
Producer sends the record and does NOT wait for any response. Fastest but data can be lost if broker crashes before writing. Use for metrics/logs where loss is acceptable.
acks = 1 (Leader Only)
Producer waits for the leader broker to write to its local log and acknowledge. Balanced between speed and safety. Data can be lost if leader crashes before followers replicate.
acks = all (All ISR)
Producer waits for the leader AND all in-sync replicas (ISR) to write the event. Slowest but safest. No data loss as long as at least one ISR is alive. Production recommended.
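The three levels above map to a single producer property. A minimal sketch of the corresponding config maps, using the standard Kafka property names (shown here as plain Python dicts rather than any particular client's config object):

```python
# Producer configs for each durability level. The trade-off:
# lower acks = lower latency, higher risk of losing acknowledged writes.
fire_and_forget = {"acks": "0"}   # metrics/logs; loss acceptable
leader_only     = {"acks": "1"}   # balanced; loss if leader dies pre-replication
all_isr = {
    "acks": "all",                # wait for every in-sync replica
    "enable.idempotence": True,   # safe retries (see Idempotent Producer)
}
```

In production, `acks=all` is normally paired with `min.insync.replicas=2` on the topic so a write needs at least two live copies to succeed.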
Topic
A Topic is a logical category or folder to which events are published. Think of it as a named stream. Many producers can write to the same topic, and many consumer groups can read from the same topic independently.
In simple words: A topic is like a TV channel. Channel "order-events" shows all order-related news. Channel "payment-events" shows all payment-related news. Producers broadcast to a channel, and consumers tune into whichever channels they care about.
Topic Properties
| Property | Description | Default |
|---|---|---|
| num.partitions | Number of partitions when topic is created | 1 |
| replication.factor | How many copies of each partition across brokers | 1 (set to 3 in prod) |
| retention.ms | How long to keep events before deletion | 604800000 (7 days) |
| retention.bytes | Max size of partition log before old segments are deleted | -1 (unlimited) |
| cleanup.policy | Whether to delete or compact old events | delete |
| min.insync.replicas | Minimum ISR count for acks=all writes to succeed | 1 (set to 2 in prod) |
| segment.bytes | Size of each segment file in the partition log | 1073741824 (1 GB) |
Internal Topics (created by Kafka itself)
| Topic Name | Partitions | Purpose |
|---|---|---|
| __consumer_offsets | 50 (default) | Stores committed offsets for every consumer group. Keyed by (groupId, topic, partition). |
| __transaction_state | 50 (default) | Stores transaction metadata for exactly-once semantics. |
| __cluster_metadata | 1 | KRaft cluster metadata log (replicated across controllers). |
Partitions
A partition is a physical, ordered, append-only log that is part of a topic. It's where events are actually stored on disk.
In simple words: If a topic is a book, then partitions are the chapters. The book "order-events" might have 3 chapters (partitions). Each chapter is stored on a different shelf (broker). Events are written at the end of a chapter — you never insert pages in the middle. This makes writing super fast.
Three Key Properties
Physical
Events are actually stored on disk inside the partition as log files. Each partition maps to a directory on the broker's filesystem containing segment files. Example: /data/order-events-0/
Ordered
Events are written sequentially and assigned an incrementing offset (starting from 0). Consumers read in offset order. Within a single partition, order is guaranteed. Kafka never re-orders events inside a partition.
Append-Only
Events are only appended to the end of the log. No insert in the middle. No updates. No deletes of individual records. The log only grows forward. This is what makes Kafka's sequential disk I/O so fast.
Partitioning Strategies
Producer chooses which partition of a topic the event goes to. This decision impacts ordering guarantees and load distribution.
Key-Based Partitioning
partition = hash(key) % num_partitions
All events with the same key go to the same partition. This guarantees ordering for related events. Example: all events for orderId "123" always go to the same partition.
Pros: ordering guaranteed. Cons: possible hotspots — one heavily-used key can overload a single partition.
Round Robin (No Key)
When key = null, the classic partitioner distributes events round-robin across partitions. (Since Kafka 2.4, the default for null keys is the sticky partitioner: it fills a batch for one partition, then switches — still even over time, but batch-friendly.)
E0 → P0, E1 → P1, E2 → P2, E3 → P0 ... Even distribution, but ordering is NOT guaranteed — related events may land in different partitions.
Pros: even distribution. Cons: no ordering.
Custom Partitioner
You implement your own Partitioner interface with business logic.
Example: country == "India" → P0, country == "US" → P1. Gives full control over data locality and processing affinity.
Pros: full control. Cons: you own the custom logic (and any skew it creates).
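The key-based and round-robin strategies above can be sketched in a few lines. Note the real Java client hashes keys with murmur2; `crc32` stands in here only to keep the sketch dependency-free — the property that matters is that the same key always maps to the same partition.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 3

def key_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # Same key -> same hash -> same partition, so ordering holds
    # for all events sharing that key.
    return zlib.crc32(key) % num_partitions

_rr = count()
def round_robin_partition(num_partitions: int = NUM_PARTITIONS) -> int:
    # Used when key is None: even spread, no ordering guarantee.
    return next(_rr) % num_partitions

# All events for order "123" land on one partition:
p1 = key_partition(b"order-123")
p2 = key_partition(b"order-123")
assert p1 == p2

# Null-key events cycle across partitions:
spread = [round_robin_partition() for _ in range(4)]  # [0, 1, 2, 0]
```

A custom partitioner is just a different function body here — e.g. branching on a country field instead of hashing.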
Segments & Indexing
Each partition log file is further divided into segments. Instead of one massive log file, Kafka splits it into smaller, manageable pieces for faster reads and efficient cleanup.
In simple words: Imagine writing a very long diary. Instead of one 10,000-page book, you split it into volumes: Volume 1 (pages 1-500), Volume 2 (pages 501-1000), etc. When someone asks for page 750, you grab Volume 2 directly instead of flipping through the entire diary. That's what segments do for Kafka.
Why Segments?
If a consumer says "read from offset 550 to 800 of Partition 0", Kafka can skip directly to the right segment file instead of scanning from the beginning. Also, retention and compaction operate at the segment level — Kafka deletes or compacts entire segment files.
Sparse Index (.index files)
Even with segments, a 1 GB segment file would still need sequential scanning. So Kafka maintains a sparse index per segment. Not every offset is indexed. Instead, it creates one index entry after every N bytes written (configured by log.index.interval.bytes, default 4096 bytes).
Segment Log (00000000.log)
| Offset | Event Size | File Position |
|---|---|---|
| 0 | 300 bytes | 0 |
| 1 | 500 bytes | 300 |
| 2 | 1000 bytes | 800 |
| 3 | 2000 bytes | 1800 |
| 4 | 500 bytes | 3800 |
Total bytes: 300+500+1000+2000+500 = 4300 bytes
Sparse Index (00000000.index)
| Offset | Position (bytes) |
|---|---|
| 4 | 3800 |
Index interval = 4096 bytes. After 4300 bytes written (> 4096), Kafka records offset 4 at position 3800. To find offset 3, Kafka looks at index, finds nearest lower entry, then scans sequentially from there.
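The lookup described above — jump to the nearest lower index entry, then scan forward — can be sketched with the exact numbers from the tables. The `index` and `log` structures below are simplified stand-ins for the `.index` and `.log` files:

```python
import bisect

# Sparse index entries: (offset, byte_position), one per ~4096 bytes written.
index = [(0, 0), (4, 3800)]
# Log contents: offset -> (byte_position, event_size), per the table above.
log = {0: (0, 300), 1: (300, 500), 2: (800, 1000), 3: (1800, 2000), 4: (3800, 500)}

def find(target_offset: int) -> int:
    """Return the byte position of target_offset via the sparse index."""
    offsets = [off for off, _ in index]
    # Nearest index entry at or below the target...
    i = bisect.bisect_right(offsets, target_offset) - 1
    off, pos = index[i]
    # ...then scan forward sequentially, as Kafka does within a segment.
    while off < target_offset:
        pos += log[off][1]    # skip over this event's bytes
        off += 1
    return pos
```

`find(3)` starts at index entry (0, 0), skips 300+500+1000 bytes, and lands at position 1800 — without scanning any other segment.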
Broker
A Broker is a single Kafka server instance. It's the one that actually stores data on disk and serves client requests (both producers and consumers).
In simple words: A broker is like a warehouse worker. Each worker (broker) manages some shelves (partitions). Producers give packages (events) to the right worker, and consumers ask the right worker for their packages. Multiple workers together form a team (cluster).
Broker Responsibilities
- Store partition data — persist events to disk in segment files.
- Serve producer writes — accept new events and write to leader partitions.
- Serve consumer reads — return events from leader partitions based on offset.
- Replicate data — if hosting a follower partition, continuously fetch from the leader to stay in sync.
- Send heartbeats — report liveness to the controller.
- Serve metadata requests — tell clients which broker hosts which partition leader.
Broker Networking
Kafka uses a custom binary protocol over TCP. Default port is 9092. Each broker has a unique broker.id. Clients connect to any broker (bootstrap) to discover the full cluster topology, then connect directly to the partition leaders they need.
Kafka Cluster
A Kafka Cluster is a group of brokers working together to provide scalability, fault tolerance, and high availability.
Scalability
Distribute load across multiple servers. More brokers = more partitions = more throughput. Producers and consumers talk to different brokers in parallel.
Fault Tolerance
If a broker goes down, its partitions have replicas on other brokers. Data is never lost (with replication factor >= 2). Automatic failover via leader election.
High Availability
No single point of failure. Controller has standby nodes. Every partition has replicas. Clients automatically reconnect to new leaders. System continues even if minority of brokers fail.
Leader-Follower Partition Replication
For each partition, one broker is the Leader and the rest are Followers. This is the core of Kafka's fault tolerance.
In simple words: Imagine a group project. One person (leader) does the original work and everyone (producers/consumers) talks to them. The other members (followers) silently make copies of the leader's work. If the leader gets sick, one of the followers who has a complete copy can immediately take over. No work is lost.
Leader Responsibilities
- Handle ALL producer writes for that partition.
- Handle ALL consumer reads for that partition.
- Maintain partition log (segments, indexes).
- Coordinate with followers — track their replication progress.
- Manage ISR list — add/remove followers based on sync status.
Follower Responsibilities
- Continuously fetch (pull) data from the leader.
- Replicate data to stay in sync.
- Ready to become leader if current leader fails.
- Do NOT serve any client (producer/consumer) requests directly.
- Acknowledge successful replication back to leader (for acks=all).
ISR (In-Sync Replicas)
ISR is the list of replicas (including the leader) that are fully caught up with the leader's log. This is cluster metadata maintained by the Controller.
In simple words: Imagine you're a teacher (leader) and you have 2 students (followers) copying your notes. ISR is the list of students who are up-to-date with your notes. If a student falls behind (didn't copy the last 5 pages), they're temporarily removed from the ISR until they catch up.
How Does the Leader Decide a Follower is Out-of-Sync?
- A follower is "in-sync" if its replica offset is close enough to the leader's log-end offset.
- Specifically, if the follower hasn't fetched within replica.lag.time.max.ms (default 30 seconds), it's removed from ISR.
- If a follower stops fetching entirely, or falls too far behind, the leader requests the Controller to shrink the ISR.
- When the follower catches up again, it's added back to the ISR.
Why ISR Matters
When producer uses acks=all, the leader waits for ALL replicas in the ISR (not all replicas, just the in-sync ones) to acknowledge the write. This means:
- If ISR = {Broker1, Broker2, Broker3} and acks=all, leader waits for all 3.
- If Broker3 is removed from ISR, leader only waits for Broker1 + Broker2.
- This prevents a slow follower from blocking all writes.
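The shrink rule reduces to a timestamp comparison per follower. A minimal sketch, assuming the leader tracks the last time each replica was fully caught up (broker names and timestamps below are invented for illustration):

```python
REPLICA_LAG_TIME_MAX_MS = 30_000  # replica.lag.time.max.ms default

def current_isr(last_caught_up_ms: dict, now_ms: int) -> set:
    """Replicas that were fully caught up within the lag window stay in
    the ISR; anything older is shrunk out until it catches up again."""
    return {broker for broker, t in last_caught_up_ms.items()
            if now_ms - t <= REPLICA_LAG_TIME_MAX_MS}

now = 100_000
replicas = {"broker1": 100_000,   # the leader is always current
            "broker2": 95_000,    # caught up 5 s ago  -> in sync
            "broker3": 60_000}    # caught up 40 s ago -> shrunk from ISR
isr = current_isr(replicas, now)
# With acks=all, the leader now waits only for broker1 + broker2,
# so the lagging broker3 no longer blocks writes.
```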
Controller
The Controller is a special broker responsible for managing all cluster metadata. It's the brain of the Kafka cluster.
In simple words: The controller is like the manager of the warehouse. Workers (brokers) store packages, but the manager decides which worker handles which shelf, who takes over when a worker is sick, and keeps track of everything. There's only one active manager at a time, but backup managers are ready to step in.
Controller Responsibilities
Topic Management
Creates/deletes topics. Decides how partitions are distributed across brokers when a new topic is created.
Partition Leader Election
Elects leader and follower replicas for every partition. Re-elects leaders when a broker fails.
Failure Detection
Monitors broker heartbeats. Detects when brokers go down and triggers recovery actions.
Metadata Distribution
Notifies all brokers about changes — new topics, leader changes, ISR updates. Brokers cache this metadata to serve client requests.
Controller can have dual roles
- Dual responsibility: Normal Broker + Controller role (common in smaller clusters).
- Dedicated Controller: Only handles controller duties, no partition storage (recommended for large clusters).
KRaft (Kafka Raft Consensus)
KRaft is the built-in consensus mechanism that replaced ZooKeeper (production-ready since Kafka 3.3; ZooKeeper support was removed in Kafka 4.0). It uses the Raft consensus algorithm to coordinate multiple controllers.
In simple words: Imagine 3 managers in a company. Only 1 can be the CEO at a time. When the current CEO leaves, the remaining managers vote to pick a new CEO. That voting process is "consensus." KRaft is Kafka's built-in voting system. Before KRaft, Kafka used an external voting system called ZooKeeper (like hiring an outside agency to run the election).
Why Do We Need Consensus?
- A single controller is a bottleneck — if it dies, the whole cluster is stuck.
- So we run multiple controllers (typically 3 or 5).
- But who decides which controller is the leader? We can't use another controller (infinite loop).
- Answer: the controllers elect a leader among themselves using the Raft consensus algorithm.
KRaft vs ZooKeeper
ZooKeeper (Legacy)
- External distributed system.
- Deployed separately.
- Monitored separately.
- Maintained separately from Kafka.
- Uses ZAB consensus protocol.
- Additional operational overhead.
- Deprecated in Kafka 3.x, removed in Kafka 4.0.
KRaft (Modern)
- Built-in to Kafka.
- No external dependency.
- Single system to deploy and monitor.
- Uses Raft consensus protocol.
- Less operational overhead.
- Faster failover.
- Default since Kafka 3.3+.
Quorum
Quorum means majority of nodes must agree before any decision is finalized.
| Controller Nodes | Quorum (N/2 + 1) | Can Tolerate Failures |
|---|---|---|
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 7 | 4 | 3 |
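The table follows directly from the majority formula. As a sketch:

```python
def quorum(n_controllers: int) -> int:
    # Majority: strictly more than half of the controllers must persist
    # a metadata record before it is considered committed.
    return n_controllers // 2 + 1

def tolerated_failures(n_controllers: int) -> int:
    # The cluster keeps working as long as a quorum remains alive.
    return n_controllers - quorum(n_controllers)

for n in (3, 5, 7):
    print(f"{n} controllers: quorum={quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

This is also why controller counts are odd: going from 3 to 4 nodes raises the quorum from 2 to 3 without tolerating any additional failures.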
KRaft Metadata Commit Flow
1. A producer or Admin CLI sends a request (e.g., create topic) to any broker, which forwards it to the active controller.
2. The active controller creates the topic/partitions, decides leaders/followers, and creates a metadata record at the next offset (e.g., offset 100). It writes the record to its local __cluster_metadata log but marks it as NOT committed yet.
3. The active controller sends the new metadata record to the standby controllers.
4. Each standby appends the record to its local __cluster_metadata log (still uncommitted) and sends an ACK back.
5. Once a majority (quorum) of controllers have written the record, the active controller marks it committed and advances the last committed offset to 100.
6. The active controller sends heartbeats with the last committed offset to all brokers; brokers fetch and apply the new metadata.
7. When the standby controllers see the advanced committed offset in the next heartbeat, they mark the record committed in their local logs too.
Consumer & Consumer Groups
A Consumer is an application that reads (pulls) events from Kafka. Kafka uses a PULL model — the consumer asks for data, Kafka never pushes.
In simple words: A consumer is like a newspaper reader. Kafka doesn't throw the newspaper at your door — YOU go and pick it up when you're ready. A Consumer Group is a team of readers sharing the workload. If there are 3 newspaper sections, 3 readers can each take 1 section and read in parallel. But two readers from the same team never read the same section.
Consumer Groups
Every consumer belongs to a group.id. Within a group, work is divided. Across different groups, the same data is read independently.
Partition Assignment Scenarios
Consumers == Partitions (Ideal)
Topic: 3 partitions, Group: 3 consumers
C1 → P0
C2 → P1
C3 → P2
Perfect balance — each consumer owns exactly one partition.
Consumers < Partitions
Topic: 6 partitions, Group: 3 consumers
C1 → P0, P3
C2 → P1, P4
C3 → P2, P5
Each consumer handles two partitions — more load per consumer.
Consumers > Partitions
Topic: 3 partitions, Group: 5 consumers
C1 → P0
C2 → P1
C3 → P2
C4 → IDLE
C5 → IDLE
Wasted consumers — a partition is read by at most one consumer per group.
Multiple Consumer Groups
The same partition CAN be read by multiple consumers that are in different groups. Each group independently tracks its own offset.
Offset Management
Each consumer group maintains its partition-wise committed offset independently. This tells Kafka "I've processed up to this point."
In simple words: An offset is like a bookmark in a book. It tells you "I've read up to page 105." If you put the book down and come back later, you start from page 106. Each reader (consumer group) has their own bookmark — so one reader might be on page 105 while another is on page 60.
Offset Tracking Structure
| Consumer Group | Topic | Partition | Committed Offset |
|---|---|---|---|
| notification | order-events | 0 | 105 |
| notification | order-events | 1 | 98 |
| notification | order-events | 2 | 210 |
| analytics | order-events | 0 | 60 |
| analytics | order-events | 1 | 45 |
Where Are Offsets Stored?
Offsets are stored in the __consumer_offsets internal topic (50 partitions by default). The partition for a group is determined by: partition = hash(group.id) % 50.
Offset Commit Strategies
How and when the consumer tells Kafka "I've successfully processed up to this offset" — this determines your delivery guarantee.
Strategy 1: Auto-Commit (Risky)
With enable.auto.commit=true, the consumer commits the latest polled offset every auto.commit.interval.ms (default 5 s) — regardless of whether processing has finished. Example: the consumer polls offsets 0-99, auto-commit fires (committing 100), then the consumer crashes having processed only up to offset 50. On restart it resumes from 100 — events 51-99 are LOST. At-most-once delivery.
Strategy 2: Manual Commit (Safe)
The consumer processes the batch first, then explicitly commits. If it crashes before the commit, the restarted consumer re-reads from the last committed offset and reprocesses some events (duplicates). At-least-once delivery — consumers must be idempotent.
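The difference between the two strategies comes down to whether the commit happens before or after processing. A minimal simulation of a crash at offset 50 (the numbers mirror the auto-commit example above):

```python
events = list(range(100))                    # offsets 0..99 from one poll
processed_before_crash = set(range(51))      # crash after processing offset 50

# At-most-once (commit first): offset 100 was committed before the crash,
# so the restarted consumer resumes at 100 and never sees 51-99 again.
committed = 100
lost = [e for e in events
        if e < committed and e not in processed_before_crash]

# At-least-once (process first): nothing was committed, so the restarted
# consumer re-reads from offset 0 and processes 0-50 a second time.
reprocessed = sorted(processed_before_crash)
```

`lost` is offsets 51-99 and `reprocessed` is offsets 0-50 — loss vs. duplicates is the whole trade-off, and it's why at-least-once plus idempotent processing is the common production choice.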
Complete Producer Write Flow
End-to-end flow from event creation to acknowledgement.
1. Producer creates the record: { topic: "order-events", key: "order-123", value: orderJson, acks: all }
2. Producer fetches cluster metadata: it connects to any broker (bootstrap server); the broker returns cached metadata (or fetches it from the active controller) — which broker leads each partition of each topic.
3. Producer selects the partition: partition = hash("order-123") % 3 = 1. From the metadata: Partition 1 leader = Broker 2.
4. Producer sends the record directly to Broker 2 — no intermediary, a direct TCP connection to the partition leader.
5. Broker 2 appends the record to the active segment file, assigns the next offset (e.g., 101), and writes to the OS page cache (async flush to disk).
6. Follower brokers, continuously polling the leader, fetch the new event, write it to their own partition logs, and ACK back to the leader.
7. Once all in-sync replicas confirm the write, the leader sends a success response back to the producer.
Complete Consumer Read Flow
End-to-end flow from consumer startup to continuous event processing.
1. Consumer starts with group.id = "notification-service".
2. Kafka locates the group's coordinator partition: hash("notification-service") % 50 = 23 → Partition 23 of __consumer_offsets.
3. Consumer asks any broker: "Who is the leader of __consumer_offsets Partition 23?" Answer: Broker 3 — that broker is the Group Coordinator.
4. Consumer sends a JoinGroup request. Broker 3 manages membership, waits for all group members, then assigns partitions. Response: "You handle Partition 2 of order-events."
5. The assignment is written to __consumer_offsets Partition 23 and replicated to all followers. Internally this acts like acks=all (not configurable).
6. Consumer asks the Group Coordinator: "What's my last committed offset for order-events Partition 2?" The coordinator looks up the key (group, topic, partition) and returns offset 100.
7. Consumer checks metadata: leader of order-events Partition 2 = Broker 2. It sends a FetchRequest: "Give me events from offset 101, up to my configured fetch size." Broker 2 returns events 101-501.
8. Application logic runs on each event sequentially.
9. Consumer sends an OffsetCommit to the Group Coordinator: "I've processed up to offset 501 for order-events Partition 2." The coordinator writes it to __consumer_offsets.
10. Consumer repeats from step 7 — this is the poll loop: continuously fetching, processing, and committing.
Log Compaction & Retention
Kafka can't keep data forever (disk fills up). Two strategies control how old data is cleaned up.
In simple words: Think of your phone gallery. Delete policy = delete all photos older than 30 days. Compact policy = for each person, keep only the most recent photo and delete older ones. Both free up space, but in different ways.
Policy 1: Delete (Default)
Deletes entire old segment files based on:
- Time-based: Segment created 8 days ago → DELETED. Segment created 6 days ago → KEPT.
- Size-based: If total partition log exceeds retention.bytes, oldest segments are deleted until under the limit.
Deletion is async — does not block writes. Runs periodically in background.
Policy 2: Compact
Keeps only the latest value for each key. Old duplicates are removed.
| Offset | Record | After Compaction |
|---|---|---|
| 100 | user1 → v1 | removed (newer value at offset 102) |
| 101 | user2 → v1 | removed (newer value at offset 104) |
| 102 | user1 → v2 | KEPT |
| 103 | user3 → v1 | KEPT |
| 104 | user2 → v2 | KEPT |
Used for: state stores, CDC, config topics. Compaction is async, per-segment, does not block writes.
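The "latest value per key" rule can be sketched in a few lines. This simplification ignores segment boundaries and tombstones, but reproduces the table above exactly:

```python
def compact(log):
    """Keep only the last record seen for each key, preserving the
    relative order of the survivors (as compaction does)."""
    last = {}
    for i, (key, _) in enumerate(log):
        last[key] = i                 # final position of each key
    return [rec for i, rec in enumerate(log) if last[rec[0]] == i]

log = [("user1", "v1"), ("user2", "v1"), ("user1", "v2"),
       ("user3", "v1"), ("user2", "v2")]
compacted = compact(log)
# [("user1", "v2"), ("user3", "v1"), ("user2", "v2")]
```

Replaying a compacted topic from the beginning therefore rebuilds the current state — one final value per key — which is exactly why it suits state stores and CDC.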
Why Kafka is Fast (Despite Using Disk)
Kafka reads and writes to DISK. Disk I/O is slow, right? Then how is Kafka so fast? Two fundamental OS-level optimizations.
Page Cache (Write Optimization)
- Kafka trusts the OS to cache log files efficiently.
- Write goes to RAM (page cache), returns immediately.
- OS flushes to disk asynchronously in the background.
- Because Kafka is append-only, writes are always sequential.
- No insert in middle, no random access, no updates — disk head keeps moving forward.
- Sequential disk writes can be as fast as RAM in many cases.
Zero-Copy (Read Optimization)
- Kafka uses sendfile() system call.
- Data goes directly from disk/page cache to the network socket.
- No copy to Kafka JVM heap. No serialization/deserialization.
- Saves CPU cycles, memory, and time.
- This is why Kafka can serve consumers at network speed, not application speed.
Additional Performance Factors
Batching
Producers batch multiple events into a single network request. Configurable via batch.size and linger.ms. Reduces network overhead dramatically.
Compression
Entire batches can be compressed (gzip, snappy, lz4, zstd). Reduces network bandwidth and disk usage. Decompressed only at the consumer.
Partitioned Parallelism
Multiple partitions allow multiple consumers to read in parallel. More partitions = more parallelism = more throughput. Each partition is an independent log.
Edge Cases & Failure Scenarios
What actually happens when things go wrong in a Kafka cluster. These are common interview questions.
1. Active Controller Fails
The standby controllers notice the missed heartbeats and elect a new active controller among themselves via Raft. Since every standby already holds the replicated metadata log, the new active controller takes over with no metadata loss.
2. Leader Partition Broker Fails
The controller detects the failure and elects a new leader for each affected partition from its ISR. Clients refresh their metadata and reconnect to the new leaders automatically; with acks=all, no acknowledged data is lost.
3. Follower Partition Broker Fails
Clients notice nothing — the leader keeps serving reads and writes. The dead follower is removed from the ISR; when it returns, it catches up from the leader and rejoins the ISR.
4. What if a "Topic Fails"?
A topic is a logical grouping, not a physical component, so a topic cannot fail on its own — only the brokers hosting its partitions can, which reduces to the leader/follower failure cases above.
5. Consumer Fails Before Committing Offset
After the rebalance, the partition's new owner resumes from the last committed offset, so the events that were processed but not committed are processed again. This is at-least-once delivery — processing must be idempotent.
6. Consumer Processing Takes Too Long
If the consumer doesn't call poll() within max.poll.interval.ms (default 5 minutes), the Group Coordinator assumes it is dead and triggers a rebalance, handing its partitions to other consumers.
Solutions
- Increase max.poll.interval.ms
- Reduce max.poll.records (process smaller batches)
- Optimize processing logic (faster processing)
- Use a separate thread pool for processing (decouple poll from process)
7. Entire Broker Fails
A combination of the cases above: every partition it led gets a new leader elected from the ISR, and it is dropped from the ISR of every partition it followed. If it was also the active controller, a controller election via Raft happens first.
8. What if ISR List Becomes Empty?
If followers lag out until only the leader remains in the ISR, acks=all writes start failing once min.insync.replicas can no longer be met. If the leader then dies too, Kafka by default waits for an ISR member to return (safe); with unclean.leader.election.enable=true it promotes an out-of-sync replica instead, trading possible data loss for availability.
EDA Challenges & When to Use Kafka
Challenges in Event-Driven Architecture
Eventually Consistent
Data is not instantly synchronized across services. After an event is published, there's a delay before all consumers process it. Not suitable when strict real-time consistency is required.
Duplicate Events
At-least-once delivery means events can be processed more than once. Consumers must be idempotent — processing the same event twice should produce the same result.
Ordering Problem
Across partitions, ordering is not guaranteed. The 2nd event can be processed before the 1st if they're in different partitions. Use key-based partitioning for related events.
Schema Evolution
If the producer changes the event schema (adds/removes fields), all consumers may break. Use a Schema Registry (Avro, Protobuf) with backward/forward compatibility rules.
Debugging Complexity
No single request-response trace. Events flow through multiple services asynchronously. Requires distributed tracing (correlation IDs, OpenTelemetry) for end-to-end visibility.
Poison Messages
One bad message can block an entire partition's consumer. If processing throws an unhandled exception, the consumer retries infinitely. Solution: Dead Letter Queue (DLQ) for failed events.
Operational Overhead
Must monitor: consumer lag, throughput, partition balance, broker health, disk usage. More infrastructure to manage compared to simple REST calls.
Exactly-Once is Hard
Achieving exactly-once processing requires idempotent producers + transactions + consumer-side deduplication. Kafka supports it but adds complexity and latency overhead.
When to Use Kafka
Long-running workflows
Processes that take minutes/hours. Decouple the trigger from the execution.
Eventual consistency is acceptable
When millisecond consistency isn't required between services.
Real-time analytics & monitoring
Stream processing, dashboards, alerting, metrics aggregation.
Audit logs & event sourcing
Immutable log of everything that happened. Replay for debugging or rebuilding state.
Decoupled microservices
Services communicate without knowing about each other. Independent deployment and scaling.
High-throughput data pipelines
ETL, CDC (Change Data Capture), log aggregation, clickstream processing.
Idempotent Producer
What happens when a producer sends an event but the network fails before it gets the ACK? The producer retries — and now Kafka has the same event twice. Idempotent producer solves this.
The Problem — In Simple Words
Imagine you're ordering food online. You click "Place Order" but the page freezes. You're not sure if the order went through, so you click again. Now the restaurant has 2 orders for the same meal. That's exactly what happens with Kafka producers without idempotency.
The Solution — Idempotent Producer
Turn on one config and Kafka handles it for you. Each producer gets a unique ID, and every event gets a sequence number. If the broker sees the same sequence number twice, it ignores the duplicate.
What Configs Change Automatically?
| Config | Value When Idempotence = true | Why |
|---|---|---|
| acks | all | Must confirm write on all ISR replicas |
| retries | Integer.MAX_VALUE | Keep retrying until success |
| max.in.flight.requests.per.connection | 5 (max) | Allows up to 5 unacknowledged requests but maintains order using sequence numbers |
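The broker-side deduplication is simple bookkeeping: per producer ID, remember the highest sequence number accepted and ignore anything at or below it. A minimal sketch (a real broker tracks this per producer per partition, with batch-level sequences):

```python
class BrokerPartition:
    """Sketch of broker-side idempotence: drop retried duplicates by
    (producer id, sequence number)."""
    def __init__(self):
        self.last_seq = {}    # producer_id -> highest sequence accepted
        self.log = []

    def append(self, producer_id: int, seq: int, event: str) -> bool:
        if seq <= self.last_seq.get(producer_id, -1):
            return False      # duplicate retry: ACK again, but don't re-write
        self.last_seq[producer_id] = seq
        self.log.append(event)
        return True

p = BrokerPartition()
p.append(producer_id=7, seq=0, event="OrderPlaced")   # stored
# Network failure: the producer never saw the ACK and retries seq=0.
p.append(producer_id=7, seq=0, event="OrderPlaced")   # ignored
```

After both calls the log holds exactly one copy of the event — the ordering-food double-click from the analogy above, handled at the broker.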
Transactions & Exactly-Once Semantics (EOS)
Idempotent producer prevents duplicates within one partition. But what if you need to write to multiple partitions or topics as ONE atomic operation? That's where Transactions come in.
In Simple Words
Think of a bank transfer: debit from Account A and credit to Account B. Both must happen together — either BOTH succeed or BOTH fail. You can't debit A and then crash before crediting B. Kafka Transactions work the same way — all writes in a transaction either ALL go through or NONE of them do.
When Do You Need Transactions?
Read-Process-Write Pattern
Read from Topic A, process the data, write to Topic B, and commit the offset for Topic A — all as ONE atomic operation. If any step fails, everything rolls back.
Multi-Partition Writes
Writing events to 3 different partitions at once. Either all 3 partitions get the event, or none of them do. No partial writes.
Multi-Topic Writes
Writing an order event to "orders" topic AND a payment event to "payments" topic. Both must succeed together.
How It Works (Step by Step)
Consumer Side — Isolation Levels
read_uncommitted (Default)
- Consumer sees ALL events, even from incomplete transactions.
- Faster but may read data that later gets rolled back.
- Use when you don't care about transactional guarantees.
read_committed
- Consumer only sees events from committed transactions.
- Slightly slower (waits for commit markers).
- Use when you need exactly-once guarantees.
Consumer Rebalancing
When consumers join or leave a group, Kafka redistributes partitions among the remaining consumers. This process is called rebalancing.
In Simple Words
Imagine 3 workers splitting 6 tasks equally (2 each). If one worker goes home sick, the remaining 2 workers must split all 6 tasks (3 each). If a new worker joins, tasks are redistributed again. That's rebalancing.
What Triggers a Rebalance?
New Consumer Joins
A new consumer starts with the same group.id. Partitions are redistributed to include the new member.
Consumer Leaves or Crashes
A consumer calls close() or stops sending heartbeats. Its partitions must be given to someone else.
Topic Changes
New partitions are added to a subscribed topic, or the consumer subscribes to a new topic.
Rebalancing Process (Step by Step)
1. The Group Coordinator detects a change (new member, missed heartbeat, etc.).
2. Every consumer in the group stops reading. They commit their current offsets and release their partitions.
3. All consumers send a JoinGroup request to the Group Coordinator. The first consumer to join becomes the Group Leader (not the same as a partition leader).
4. The Group Leader runs the partition assignment strategy (Range, RoundRobin, or Sticky) and sends the new assignment back to the Coordinator.
5. The Coordinator distributes the new partition assignment to all consumers. Each consumer starts reading from its newly assigned partitions.
Assignment Strategies
| Strategy | How It Works | Best For |
|---|---|---|
| Range | Divides partitions of each topic into contiguous ranges per consumer. Consumer 1 gets P0-P1, Consumer 2 gets P2-P3. | When you need co-partitioning (same consumer reads same partition number from different topics). |
| RoundRobin | Distributes all partitions one-by-one across consumers. P0 → C1, P1 → C2, P2 → C3, P3 → C1... | When you want even distribution across consumers. |
| Sticky | Like RoundRobin but tries to keep previous assignments. Only moves partitions that must move. | Reducing rebalance disruption. Recommended for most cases. |
| CooperativeSticky | Incremental rebalancing — only revokes partitions that need to move, others keep reading. | Minimizing downtime during rebalancing. Best choice. |
Dead Letter Queue (DLQ)
What happens when a consumer can't process an event? A bad event (poison message) can block the entire partition. DLQ is the solution.
In Simple Words
Imagine a post office. Most letters are delivered fine. But some letters have wrong addresses. Instead of throwing them away or blocking all other mail, the post office puts them in a separate "undeliverable" pile. Someone later checks that pile and fixes the addresses. A Dead Letter Queue is that "undeliverable" pile for Kafka events.
The Problem
A consumer hits an event it cannot process (malformed JSON, unexpected schema, a bug). If it retries forever, it never commits the offset, and every event behind the bad one waits. The whole partition stalls.
The Solution — DLQ Pattern
Retry the event a limited number of times. If it still fails, publish it to a separate "dead letter" topic (for example, orders.DLQ) along with error metadata, commit the offset, and move on. The main flow keeps running; failed events are inspected, fixed, and replayed later.
How to Implement DLQ
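A minimal sketch of the retry-then-route decision, independent of any Kafka client. The `.DLQ` topic suffix, header names, and retry count are conventions assumed for illustration:

```typescript
interface DlqRecord { topic: string; value: string; headers: Record<string, string> }

// Try the handler up to maxRetries times. On final failure, return a record
// destined for the "<topic>.DLQ" topic, with the error captured in headers.
async function processOrRoute(
  topic: string,
  value: string,
  handler: (v: string) => Promise<void>,
  maxRetries = 3,
): Promise<DlqRecord | null> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await handler(value);
      return null; // processed successfully, nothing goes to the DLQ
    } catch (err) {
      if (attempt === maxRetries) {
        return {
          topic: `${topic}.DLQ`,
          value, // keep the original payload untouched for later replay
          headers: {
            "x-original-topic": topic,
            "x-error": String(err),
            "x-attempts": String(attempt),
          },
        };
      }
    }
  }
  return null;
}
```

In a real consumer, a non-null result would be `producer.send()`-ed to the DLQ topic before committing the offset, so the partition keeps moving.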
Schema Registry
Kafka stores events as raw bytes. It doesn't know or care what's inside the event. But producers and consumers need to agree on the format. Schema Registry solves this.
In Simple Words
Imagine two people writing letters to each other. If person A starts writing in English and person B expects Hindi, they can't understand each other. Schema Registry is like a shared dictionary that both people agree to use. If person A wants to add a new word, the dictionary checks if person B can still understand the letter.
The Problem Without Schema Registry
A producer team renames a field or changes a type and deploys. Consumers still expect the old shape and start crashing or silently mis-parsing, often discovered in production hours later.
How Schema Registry Works
When a producer starts, it registers the event schema (Avro, Protobuf, or JSON Schema) with the Schema Registry. The registry assigns a Schema ID (e.g., ID=1).
Instead of sending raw JSON, the producer sends: [Magic Byte][Schema ID=1][Serialized Data]. The data is compact (Avro binary, not JSON text).
Consumer reads the Schema ID from the event, fetches the schema from the Registry, and deserializes the data correctly. Schema is cached locally after the first fetch.
When the producer wants to change the schema, the Registry checks if the new version is compatible with older versions before allowing it.
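The framing in step 2 is easy to see in code. This sketch reproduces the Confluent wire format (magic byte 0, then a 4-byte big-endian schema ID) using Node's Buffer; plain JSON stands in for Avro binary here for readability:

```typescript
// Frame a payload the way Schema Registry-aware serializers do:
// [magic byte 0x00][4-byte big-endian schema ID][serialized data]
function frame(schemaId: number, payload: Buffer): Buffer {
  const header = Buffer.alloc(5);
  header.writeUInt8(0, 0);           // magic byte
  header.writeUInt32BE(schemaId, 1); // schema ID
  return Buffer.concat([header, payload]);
}

function unframe(msg: Buffer): { schemaId: number; payload: Buffer } {
  if (msg.readUInt8(0) !== 0) throw new Error("unknown magic byte");
  return { schemaId: msg.readUInt32BE(1), payload: msg.subarray(5) };
}

const framed = frame(1, Buffer.from(JSON.stringify({ orderId: "o-1" })));
const { schemaId, payload } = unframe(framed);
console.log(schemaId, payload.toString()); // 1 {"orderId":"o-1"}
```

The consumer uses the extracted `schemaId` to fetch (and cache) the matching schema from the Registry before deserializing.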
Compatibility Modes
| Mode | What You Can Do | Simple Explanation |
|---|---|---|
| BACKWARD | Delete fields, add fields with defaults | New consumers can read old events. "I can understand old letters." |
| FORWARD | Add fields, delete fields with defaults | Old consumers can read new events. "Old readers can understand new letters." |
| FULL | Add/delete fields only with defaults | Both old and new consumers can read both old and new events. Safest. |
| NONE | Anything goes | No checks. Dangerous. Can break consumers at any time. |
Kafka Connect
A framework for moving data between Kafka and other systems without writing any code. Just configure and run.
In Simple Words
You want to move data from your MySQL database into Kafka. Or from Kafka into Elasticsearch. You could write a custom producer/consumer for each system. But that's a lot of repeated work. Kafka Connect gives you ready-made "connectors" — like USB adapters that plug different systems into Kafka.
Two Types of Connectors
Source Connector (Data INTO Kafka)
- Reads data from external systems.
- Converts it into Kafka events.
- Publishes to Kafka topics.
- Example: Every new row in MySQL automatically becomes a Kafka event.
Sink Connector (Data OUT OF Kafka)
- Reads events from Kafka topics.
- Writes data to external systems.
- Example: Every Kafka event automatically gets indexed in Elasticsearch.
Popular Connectors
| Connector | Type | What It Does |
|---|---|---|
| JDBC Source | Source | Reads rows from any SQL database (MySQL, PostgreSQL, Oracle) into Kafka |
| Debezium | Source | CDC (Change Data Capture) — captures every INSERT/UPDATE/DELETE from databases in real-time |
| S3 Sink | Sink | Writes Kafka events to Amazon S3 as files (Parquet, JSON, Avro) |
| Elasticsearch Sink | Sink | Indexes Kafka events into Elasticsearch for search |
| HDFS Sink | Sink | Writes Kafka events to Hadoop HDFS for batch analytics |
| MongoDB Source/Sink | Both | Reads from and writes to MongoDB collections |
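As an example of "just configure and run", a JDBC source connector is nothing more than a JSON config POSTed to the Connect REST API. Connection details, table names, and the topic prefix below are placeholders:

```json
{
  "name": "mysql-orders-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/shop",
    "connection.user": "connect",
    "connection.password": "********",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "table.whitelist": "orders",
    "topic.prefix": "mysql-"
  }
}
```

Submitting it is one HTTP call against the Connect worker (default port 8083): `curl -X POST -H "Content-Type: application/json" --data @connector.json http://localhost:8083/connectors`. New rows in `orders` then appear as events on the `mysql-orders` topic.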
Kafka Streams
A Java library for processing data in real-time directly from Kafka topics. No separate cluster needed — it runs inside your application.
In Simple Words
Normally, you read from Kafka, process data in your app, and write results back to Kafka. Kafka Streams makes this easier by giving you a powerful API to filter, transform, join, and aggregate events — all in real-time, as events flow through. Think of it like Excel formulas but for streaming data.
Why Not Just Use a Consumer?
Plain Consumer
- You write all processing logic yourself.
- You handle state management yourself.
- You handle fault tolerance yourself.
- You handle time windowing yourself.
- Joining two topics? Build it from scratch.
Kafka Streams
- Built-in operators: filter, map, join, aggregate.
- Automatic state management (RocksDB).
- Automatic fault tolerance and recovery.
- Built-in windowing (tumbling, hopping, sliding).
- Joining topics is one line of code.
Common Operations
Filter
Keep only events that match a condition. Example: only keep orders above $100.
Map / Transform
Change the event shape. Example: extract just the customer ID and amount.
Group & Aggregate
Group events by key and compute running totals. Example: total spending per customer.
Windowed Aggregation
Aggregate events within a time window. Example: count orders per 5-minute window.
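Kafka Streams itself is a Java library, so a direct example is out of scope for this Node.js guide. For intuition, here is the tumbling-window count from the example above in plain TypeScript (window size and timestamps are illustrative, and this is the logic only, not the Streams API):

```typescript
// Count events per tumbling window. Each event falls into exactly one window:
// windowStart = floor(timestamp / windowMs) * windowMs
function tumblingCounts(timestamps: number[], windowMs: number): Map<number, number> {
  const counts = new Map<number, number>();
  for (const ts of timestamps) {
    const windowStart = Math.floor(ts / windowMs) * windowMs;
    counts.set(windowStart, (counts.get(windowStart) ?? 0) + 1);
  }
  return counts;
}

const fiveMin = 5 * 60 * 1000;
// Three orders land in the first 5-minute window, one just after it closes.
const counts = tumblingCounts([0, 1_000, 299_999, 300_001], fiveMin);
console.log(counts.get(0), counts.get(300_000)); // 3 1
```

In real Kafka Streams, the equivalent is a `groupByKey().windowedBy(...).count()` chain, with the state kept in RocksDB and restored automatically after a crash.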
Kafka Streams vs Other Stream Processors
| Feature | Kafka Streams | Apache Flink | Apache Spark Streaming |
|---|---|---|---|
| Deployment | Library (runs in your app) | Separate cluster | Separate cluster |
| Complexity | Simple — just add a JAR | Medium — needs cluster setup | Medium — needs cluster setup |
| Scaling | Add more app instances | Add more TaskManagers | Add more executors |
| Best For | Kafka-to-Kafka processing | Complex event processing, multiple sources | Batch + streaming hybrid |
| Language | Java / Kotlin only | Java, Python, SQL | Java, Python, Scala, SQL |
Kafka Security
By default, Kafka has no security at all. Anyone who can reach the broker can read/write any topic. In production, you need authentication, authorization, and encryption.
In Simple Words
Think of Kafka as a building. Without security: no doors, no locks, no ID checks. Anyone can walk in, read any document, and write on any whiteboard. Security adds: a front door with a lock (encryption), ID cards for entry (authentication), and rules about who can access which room (authorization).
Three Layers of Security
Encryption (SSL/TLS)
What: Encrypts data in transit between clients and brokers, and between brokers.
Simple: Like sending a letter in a sealed envelope instead of a postcard. Nobody can read the data while it's moving over the network.
Port 9093 is the conventional listener port for SSL (9092 for plaintext).
Authentication (SASL)
What: Verifies WHO is connecting. Clients must prove their identity.
Options:
- SASL/PLAIN — Username + password (simple, use with SSL).
- SASL/SCRAM — Salted password (more secure than PLAIN).
- SASL/GSSAPI — Kerberos (enterprise, Active Directory).
- SASL/OAUTHBEARER — OAuth 2.0 tokens.
Authorization (ACLs)
What: Controls WHO can do WHAT on WHICH resource.
Simple: Even after proving your identity, you can only access what you're allowed to. The payment service can read "payments" topic but not "user-profiles" topic.
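In KafkaJS, the first two layers map to a few client-config fields. A sketch follows; the hostname, credentials, and service name are placeholders, and a real setup would also pass CA certificates via the `ssl` option:

```typescript
// Sketch of a KafkaJS-style client config combining TLS + SASL/SCRAM.
// All values below are placeholders.
const secureConfig = {
  clientId: "payment-service",
  brokers: ["broker1.internal:9093"], // 9093 = SSL listener by convention
  ssl: true,                          // layer 1: encrypt traffic in transit
  sasl: {
    mechanism: "scram-sha-256" as const, // layer 2: salted-password authentication
    username: "payment-service",
    password: "change-me",
  },
  // Layer 3 (authorization) is enforced broker-side with ACLs, e.g.:
  //   kafka-acls --add --allow-principal User:payment-service \
  //              --operation Read --topic payments
};
console.log(secureConfig.brokers[0]);
```

With this config, the payment service can prove who it is, its traffic is unreadable on the wire, and the broker's ACLs decide which topics it may touch.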
Kafka vs Other Message Systems
Kafka is not the only messaging system. Here's how it compares to other popular choices and when to pick which one.
Comparison Table
| Feature | Apache Kafka | RabbitMQ | AWS SQS | AWS SNS |
|---|---|---|---|---|
| Model | Pull-based log | Push-based queue | Pull-based queue | Push-based pub/sub |
| Message Retention | Configurable (days/forever) | Until consumed | Up to 14 days | No retention |
| Replay | Yes (any offset) | No | No | No |
| Ordering | Per partition | Per queue (with limits) | FIFO queues only | No ordering |
| Throughput | Millions/sec | Tens of thousands/sec | Nearly unlimited (managed) | Nearly unlimited (managed) |
| Consumer Groups | Built-in | Manual setup | One consumer per message | Fan-out to subscribers |
| Operational Cost | High (self-managed) / Medium (managed) | Medium | Low (fully managed) | Low (fully managed) |
| Best For | Event streaming, high throughput, replay needed | Task queues, routing logic, request-reply | Simple job queues, decoupling | Fan-out notifications |
When to Pick What?
Pick Kafka When
- You need to replay old events (debugging, new consumers).
- You need very high throughput (millions of events/sec).
- Multiple services need to read the same data independently.
- You need event sourcing or an immutable audit log.
- You need stream processing (real-time transformations).
Pick RabbitMQ When
- You need complex routing (topic exchanges, header-based routing).
- You need request-reply pattern (RPC over messages).
- You need message priority (urgent messages first).
- Your throughput is moderate (thousands/sec, not millions).
- You want simpler setup for basic task queues.
Pick SQS/SNS When
- You're on AWS and want fully managed (zero ops).
- You need a simple job queue (no streaming features needed).
- You want fan-out notifications (SNS → email, SMS, Lambda).
- You don't need replay, ordering, or consumer groups.
- You want pay-per-use pricing with no cluster management.
Real-World Use Cases
How big companies actually use Apache Kafka in production. These examples help you understand where Kafka fits in real systems.
E-Commerce Order Flow
Problem: When a customer places an order, many things must happen: payment processing, inventory update, shipping notification, email confirmation, analytics tracking. These should not be tightly coupled.
Each service processes the same order event independently. If the email service is slow, it doesn't affect payments.
Real-Time Analytics / Dashboards
Problem: You want a live dashboard showing website activity: page views, clicks, sign-ups — all updated in real-time.
Website events (clicks, views) → Kafka topic → Kafka Streams aggregates (counts per minute) → Results written to another Kafka topic → Dashboard reads from that topic.
Change Data Capture (CDC)
Problem: Your main database is MySQL. You want every change (INSERT/UPDATE/DELETE) to be available for search in Elasticsearch and analytics in Snowflake.
MySQL → Debezium (Kafka Connect Source) → Kafka topics → Elasticsearch Sink Connector → Elasticsearch. No custom code needed.
Log Aggregation
Problem: You have 500 servers generating logs. You need all logs in one place for monitoring and debugging.
Each server sends logs to a "logs" Kafka topic. A central service reads from the topic and stores them in Elasticsearch/Loki/S3. Kafka acts as a buffer so log spikes don't crash your storage system.
Fraud Detection
Problem: You need to detect suspicious transactions within seconds, not hours.
Transaction events → Kafka topic → Kafka Streams / Flink compares each transaction against rules (unusual amount, unknown location, rapid frequency) → Suspicious events written to "fraud-alerts" topic → Alert service blocks the transaction.
Microservices Communication
Problem: 20+ microservices need to share data without creating a tangled web of REST calls.
Each service publishes events about its domain (UserCreated, OrderPlaced, PaymentCompleted). Other services subscribe only to the events they care about. Services are fully decoupled — you can add, remove, or update services without affecting others.
Companies Using Kafka
| Company | How They Use Kafka | Scale |
|---|---|---|
| LinkedIn | Created Kafka. Uses it for activity tracking, metrics, data pipelines. | 7+ trillion messages/day |
| Netflix | Real-time monitoring, event sourcing, data pipeline between microservices. | 700+ billion messages/day |
| Uber | Real-time pricing, trip matching, driver tracking, analytics. | Trillions of messages/day |
| Spotify | Event delivery, logging, real-time recommendations. | Hundreds of billions/day |
| Walmart | Inventory management, order processing, real-time supply chain. | Billions of messages/day |
Setup: Running Kafka (Docker & Cloud)
Before writing code, you need a running Kafka cluster. You have two choices: run it locally with Docker, or use a managed cloud service.
Option 1: Cloud Kafka (Zero Setup)
If you don't want to install anything, use a managed Kafka service. They give you a Kafka cluster in the cloud — you just get a connection URL and start coding.
Confluent Cloud
Made by the creators of Kafka. Most popular managed Kafka. Free tier available (no credit card needed).
Link: confluent.cloud
AWS MSK (Managed Streaming for Kafka)
Amazon's managed Kafka. Good if you're already on AWS. Supports MSK Serverless (pay per use).
Link: aws.amazon.com/msk
Upstash Kafka
Serverless Kafka with a REST API. Generous free tier (10,000 messages/day). Perfect for learning and small projects.
Link: upstash.com/kafka
Redpanda Cloud
Kafka API-compatible, written in C++ for performance. Drop-in replacement — the same Kafka client code works. Free tier available.
Link: redpanda.com/cloud
Option 2: Docker (Local Setup — Recommended for Learning)
This runs a full Kafka cluster on your machine. You need Docker Desktop installed.
Step 1: Create docker-compose.yml
Create a file called docker-compose.yml in your project root:
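A minimal single-broker sketch using the official `apache/kafka` image in KRaft mode plus the `provectuslabs/kafka-ui` web UI. The image tags, ports, and listener layout below are assumptions; check each image's documentation for your versions. Two listeners are defined so that clients on your machine use `localhost:9092` while the UI container reaches the broker over the Docker network:

```yaml
services:
  kafka:
    image: apache/kafka:3.7.0
    ports:
      - "9092:9092"   # clients on the host connect to localhost:9092
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller   # single node does both jobs
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_LISTENERS: INTERNAL://0.0.0.0:19092,CONTROLLER://0.0.0.0:9093,EXTERNAL://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka:19092,EXTERNAL://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1        # single broker, so RF=1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1

  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    ports:
      - "8080:8080"   # web UI at http://localhost:8080
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:19092   # internal listener
    depends_on:
      - kafka
```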
Step 2: Start Kafka
Step 3: Verify Kafka is Working
Project Setup (Node.js + TypeScript)
Setting up a TypeScript project with the KafkaJS library — the most popular Kafka client for Node.js.
Step 1: Initialize the Project
Update tsconfig.json
Step 2: Create the Kafka Client (Shared Config)
Create src/client.ts — This file creates a reusable Kafka connection that all other files will import.
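A sketch of that file. The client ID and retry numbers are arbitrary choices, and the broker address assumes the single-broker Docker setup:

```typescript
// src/client.ts — shared Kafka connection config.
export const kafkaConfig = {
  clientId: "order-app",            // identifies this app in broker logs and metrics
  brokers: ["localhost:9092"],      // the Docker broker from the previous section
  retry: { initialRetryTime: 300, retries: 8 }, // KafkaJS-style retry policy
};

// In the real file, build one shared instance that everything imports:
//   import { Kafka } from "kafkajs";
//   export const kafka = new Kafka(kafkaConfig);
```

Producer, consumer, and admin files then all share the same connection settings instead of repeating them.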
Project Structure (What We'll Build)
Add Run Scripts to package.json
Admin: Create & Manage Topics
Before producing or consuming, you need to create topics. The Admin API lets you create, delete, and inspect topics programmatically.
Run It
Build a Producer
The producer sends events to Kafka topics. We'll build it step-by-step, explaining every line and every option.
Run It
Build a Consumer
The consumer reads events from Kafka. It joins a consumer group, gets partitions assigned, and processes events in a loop.
Run It
Advanced Patterns
Production-ready patterns: batch sending, Dead Letter Queue, and transactions.
1. Batch Producer (High Throughput)
src/advanced/producer-batch.ts — Send many events at once for better performance.
2. Consumer with Dead Letter Queue
src/advanced/consumer-dlq.ts — Retry failed events and move permanently failed ones to a DLQ topic.
3. Transactions (Exactly-Once Processing)
src/advanced/transaction.ts — Read from one topic, process, write to another topic, and commit offset — all as ONE atomic operation.
Quick Reference: Run Commands
| Command | What It Does | Run Order |
|---|---|---|
| docker compose up -d | Start Kafka + UI containers | First (once) |
| npm run admin | Create topics (order-events, DLQ, confirmations) | Second (once) |
| npm run consumer | Start consumer (keeps running) | Third (Terminal 1) |
| npm run producer | Send order events | Fourth (Terminal 2) |
| npm run batch | Send 100 events at once | Anytime |
| npm run dlq | Consumer with retry + DLQ | Anytime |
| npm run txn | Exactly-once transaction consumer | Anytime |
Multi-Broker Docker Setup (3 Brokers)
A single broker is fine for learning, but production needs at least 3 brokers for fault tolerance. If one broker dies, the other two keep serving data with zero downtime.
In Simple Words
One broker = one copy of your data. If that machine's disk dies, the data is gone. Three brokers = three copies of your data on three different machines. Even if one machine catches fire, the other two have a full copy and keep working. That's why production always uses 3+ brokers.
docker-compose.yml (3 Brokers + KRaft + Kafka UI)
Start the 3-Broker Cluster
Update client.ts for Multi-Broker
The only change in your code — list all 3 broker addresses:
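A sketch of the updated config. The host ports (9092/9094/9096) assume the 3-broker compose file maps each broker to a different localhost port:

```typescript
// src/client.ts — now lists every broker in the cluster.
export const kafkaConfig = {
  clientId: "order-app",
  brokers: ["localhost:9092", "localhost:9094", "localhost:9096"],
  // Any one reachable broker is enough to bootstrap: the client discovers the
  // rest of the cluster from it. Listing all three just means bootstrapping
  // still works when one broker is down.
};
```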
Test Fault Tolerance — Kill a Broker
Stop one broker container with docker stop while the producer and consumer are running. After a brief leader election, reads and writes continue against the remaining two brokers; start the container again and it rejoins and catches up automatically.
Production-Grade Producer
The basic producer works, but production needs retry handling, error callbacks, graceful shutdown, and proper batching configs. Here's a production-ready producer.
Production-Grade Consumer
A production consumer needs manual commits, rebalance listeners, error handling, DLQ integration, and graceful shutdown.
Monitoring & Health Checks
In production, you need to monitor consumer lag, broker health, and throughput. Here's how to build monitoring into your Node.js app.
In Simple Words
Imagine a factory assembly line. "Consumer lag" is how many boxes are piled up on the belt, waiting to be picked up. If the pile grows, your workers (consumers) are too slow. You need to either speed them up or add more workers.
Check Consumer Lag Programmatically
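Lag is the latest offset in each partition minus the group's committed offset, summed across partitions. The computation is pure; in KafkaJS, the inputs would come from `admin.fetchTopicOffsets()` (high watermarks) and `admin.fetchOffsets()` (the group's committed offsets). The numbers below are illustrative:

```typescript
// Lag = sum over partitions of (high watermark - committed offset).
export function computeLag(
  endOffsets: Record<number, number>,       // partition -> latest offset
  committedOffsets: Record<number, number>, // partition -> group's committed offset
): number {
  let lag = 0;
  for (const [partition, end] of Object.entries(endOffsets)) {
    lag += Math.max(0, end - (committedOffsets[Number(partition)] ?? 0));
  }
  return lag;
}

// Example: 150 events behind on p0, fully caught up on p1.
const lag = computeLag({ 0: 1_000, 1: 500 }, { 0: 850, 1: 500 });
console.log(lag); // 150
```

Run this periodically and export the result as a metric; a number that keeps growing means consumers can't keep up.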
Health Check Endpoint
Add a simple HTTP health check so Kubernetes knows your consumer is alive:
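A sketch using only Node's built-in `http` module. The port, path, and how `consumerRunning` gets flipped (for example from KafkaJS's CONNECT/CRASH consumer events) are assumptions:

```typescript
import { createServer } from "node:http";

// Pure decision: what the probe returns given the consumer's state.
export function healthResponse(consumerRunning: boolean): { code: number; body: string } {
  return consumerRunning
    ? { code: 200, body: JSON.stringify({ status: "ok" }) }
    : { code: 503, body: JSON.stringify({ status: "consumer not running" }) };
}

let consumerRunning = true; // flip this from your consumer's lifecycle events

const server = createServer((req, res) => {
  if (req.url === "/health") {
    const { code, body } = healthResponse(consumerRunning);
    res.writeHead(code, { "Content-Type": "application/json" });
    res.end(body);
  } else {
    res.writeHead(404);
    res.end();
  }
});

if (process.env.RUN_SERVER) server.listen(3000); // Kubernetes liveness probe GETs /health
```

Kubernetes then restarts the pod when the probe returns 503, instead of leaving a dead consumer running silently.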
Key Metrics to Monitor
| Metric | What It Means | Alert When |
|---|---|---|
| Consumer Lag | Events waiting to be processed | Lag > 10,000 or growing continuously |
| Rebalance Rate | How often partitions are reassigned | More than 5 rebalances / hour |
| Fetch Latency | Time to fetch a batch from Kafka | > 5 seconds consistently |
| Commit Latency | Time to commit an offset | > 3 seconds consistently |
| Under-Replicated Partitions | Partitions where ISR < replication factor | Any value > 0 |
| Offline Partitions | Partitions with no active leader | Any value > 0 (critical!) |
| DLQ Message Count | Failed events in the Dead Letter Queue | Any increase (investigate immediately) |
| Consumer Group State | Stable, Rebalancing, Empty, Dead | Anything other than "Stable" |
Production Deployment Checklist
Everything you need to verify before going to production. This is the summary of everything we've learned, as a checklist.
Cluster Configuration
| Config | Dev Value | Production Value | Why |
|---|---|---|---|
| Broker Count | 1 | 3+ | Fault tolerance — replicas spread across brokers. (The odd-number rule applies to KRaft controller nodes, which need a quorum majority.) |
| replication.factor | 1 | 3 | Survive 2 broker failures without data loss. |
| min.insync.replicas | 1 | 2 | With RF=3, at least 2 must confirm writes. |
| auto.create.topics.enable | true | false | Topics must be created deliberately with correct configs. |
| Controller Nodes | 1 (combined) | 3 dedicated | Dedicated controllers = faster metadata operations. |
| log.retention.hours | 168 (7 days) | Based on need | Balance between replay ability and disk cost. |
Producer Configuration
| Config | Dev Value | Production Value | Why |
|---|---|---|---|
| acks | 1 | -1 (all) | All ISR replicas must confirm. No data loss. |
| idempotent | false | true | Prevents duplicate events on retry. |
| compression | none | GZIP or ZSTD | Often cuts network bandwidth and disk usage 60-80% on text-like payloads (e.g., JSON). |
| retries | 0 | MAX_INT | Auto-set by idempotent=true. Never give up. |
| maxInFlightRequests | 5 | 5 | Max for idempotent. Sequence numbers maintain order. |
| Message Key | null | Business key | orderId, userId, etc. Guarantees ordering per key. |
| Graceful Shutdown | None | SIGTERM handler | Flush pending events before exit. |
Consumer Configuration
| Config | Dev Value | Production Value | Why |
|---|---|---|---|
| autoCommit | true | false | Manual commit after processing. Prevents data loss. |
| fromBeginning | true | false | Only read new events. Don't reprocess millions of old events. |
| sessionTimeout | 30s | 30s | Balance between detection speed and false alarms. |
| heartbeatInterval | 3s | 3s | Keep well under 1/3 of sessionTimeout so missed heartbeats are noticed before the session expires. |
| Instances Count | 1 | = partition count | 1 consumer per partition = max parallelism. |
| DLQ | None | topic.DLQ | Failed events go to DLQ, not blocking the queue. |
| Health Check | None | /health + /ready | Kubernetes needs HTTP endpoints to manage pods. |
| Graceful Shutdown | None | SIGTERM handler | Commit offsets and leave group cleanly on deploy. |
Operational Checklist
Before Launch
- All topics created manually with correct partition count and RF=3.
- SSL/TLS enabled for all connections.
- SASL authentication configured per service.
- ACLs restrict topic access (service A can't read service B's topics).
- Schema Registry configured with BACKWARD compatibility.
- DLQ topics created for every consumer topic.
- Monitoring dashboards set up (consumer lag, broker health).
- Alerts configured (lag > threshold, offline partitions, DLQ growth).
During Operation
- Monitor consumer lag trends (growing = problem).
- Watch rebalance frequency (too many = consumers are unstable).
- Review DLQ events daily (fix root causes, replay fixed events).
- Check under-replicated partitions (ISR < RF = broker may be failing).
- Track disk usage on brokers (set up alerts at 80% capacity).
- Test disaster recovery: periodically kill a broker, verify auto-recovery.
Scaling Decisions
- Consumer lag growing? Add more consumer instances (up to partition count).
- Need more parallelism? Add more partitions (can't reduce later!).
- Disk full? Add more brokers or reduce retention period.
- High latency? Check compression, batch size, fetch size configs.
- Too many rebalances? Use CooperativeSticky assignor, tune timeouts.
- Network bottleneck? Enable compression (LZ4 is the fastest; ZSTD gives the best ratio-to-speed trade-off).