Kafka
You can scale Kafka horizontally by adding more nodes that run your Kafka brokers.
Event streaming
Event streaming is a good fit if you want to build a system where one process produces events that can be consumed by multiple consumers.
Examples of apps
Payment notifications
Cluster and broker
A group of machines running Kafka is known as a Kafka cluster. Each individual machine is called a broker.
Producers
As the name suggests, producers are used to publish data to a topic.
Consumers
As the name suggests, consumers read messages from a topic.
Topics
A topic is a logical channel to which producers send messages and from which consumers read messages.
Offsets
Consumers keep track of their position in the topic by maintaining offsets, which represent the position of the last consumed message. Kafka can manage offsets automatically or allow consumers to manage them manually.
Retention
Kafka topics have configurable retention policies, determining how long data is stored before being deleted. This allows for both real-time processing and historical data replay.
Ref - https://kafka.apache.org/quickstart
Using docker
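A minimal way to bring up a single-node broker locally might look like this (the image name, tag, and port mapping are assumptions; the quickstart linked above has the exact commands):

```bash
# Run a single-node Kafka broker and expose it on localhost:9092
# (assumes the apache/kafka image; swap in whatever image/tag you use)
docker run -d --name kafka -p 9092:9092 apache/kafka:latest
```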
Get shell access to container
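Assuming the container was named kafka as above:

```bash
# Open an interactive shell inside the running broker container
docker exec -it kafka /bin/bash
```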
Create a topic
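Inside the container, something like this creates a topic (the quickstart-events topic name and the /opt/kafka/bin path are assumptions; the path may differ depending on the image):

```bash
# Create a topic named quickstart-events on the local broker
/opt/kafka/bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092
```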
Publish to the topic
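Using the console producer that ships with Kafka (every line you type becomes a message):

```bash
# Start an interactive console producer for the topic
/opt/kafka/bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
```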
Consuming from the topic
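And the matching console consumer, reading from the beginning of the topic:

```bash
# Read all messages on the topic, starting from the earliest offset
/opt/kafka/bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
```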
Ref - https://www.npmjs.com/package/kafkajs
Initialise project
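Roughly (package versions and the TypeScript setup are assumptions):

```bash
# New Node.js + TypeScript project with kafkajs
npm init -y
npm install kafkajs
npm install -D typescript @types/node
npx tsc --init
```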
Update package.json
Add src/index.ts
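A minimal sketch of what src/index.ts could contain, based on the kafkajs README (the broker address, topic name, and client/group ids are assumptions):

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "my-app",
  brokers: ["localhost:9092"], // assumes the local broker started above
});

const producer = kafka.producer();
const consumer = kafka.consumer({ groupId: "my-app3" });

async function main() {
  // Produce a single message to the topic
  await producer.connect();
  await producer.send({
    topic: "quickstart-events",
    messages: [{ value: "hello from kafkajs" }],
  });

  // Consume messages from the same topic and log them
  await consumer.connect();
  // older kafkajs versions use `topic: "..."` instead of `topics: [...]`
  await consumer.subscribe({ topics: ["quickstart-events"], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      console.log({ partition, offset: message.offset, value: message.value?.toString() });
    },
  });
}

main().catch(console.error);
```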
Update package.json
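One plausible update here (an assumption; adjust to your tsconfig's outDir) is a script that compiles and runs the file above:

```json
{
  "scripts": {
    "start": "tsc -b && node dist/index.js"
  }
}
```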
Start the process
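With the script above in place:

```bash
npm run start
```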
producer.ts
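A sketch of producer.ts, assuming the same local broker and a topic variable we can later point at payment-done:

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "my-app", brokers: ["localhost:9092"] });
const producer = kafka.producer();

const topic = "quickstart-events"; // later changed to "payment-done"

async function main() {
  await producer.connect();
  // Send one message; run the script repeatedly (or loop) to produce more
  await producer.send({
    topic,
    messages: [{ value: "hello world at " + new Date().toISOString() }],
  });
  await producer.disconnect();
}

main().catch(console.error);
```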
consumer.ts
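And a matching consumer.ts; every consumer started with the same groupId joins the same consumer group:

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "my-app", brokers: ["localhost:9092"] });

// All processes started with groupId "my-app3" form one consumer group
const consumer = kafka.consumer({ groupId: "my-app3" });

const topic = "quickstart-events"; // later changed to "payment-done"

async function main() {
  await consumer.connect();
  await consumer.subscribe({ topics: [topic], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      console.log({ partition, offset: message.offset, value: message.value?.toString() });
    },
  });
}

main().catch(console.error);
```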
Update package.json
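Presumably adding produce and consume scripts (the names and output paths are assumptions):

```json
{
  "scripts": {
    "start": "tsc -b && node dist/index.js",
    "produce": "tsc -b && node dist/producer.js",
    "consume": "tsc -b && node dist/consumer.js"
  }
}
```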
Try starting multiple consumers and see whether each of them receives the messages that are produced.
Notice we specified a consumer group (my-app3).
Consumer group
A consumer group is a group of consumers that coordinate to consume messages from a Kafka topic.
Purpose:
Load Balancing: Distribute the processing load among multiple consumers.
Fault Tolerance: If one consumer fails, Kafka automatically redistributes the partitions that the failed consumer was handling to the remaining consumers in the group.
Parallel Processing: Consumers in a group can process different partitions in parallel, improving throughput and scalability.
Partitions
Partitions are subdivisions of a Kafka topic. Each partition is an ordered, immutable sequence of messages that is appended to by producers. Partitions enable Kafka to scale horizontally and allow for parallel processing of messages.
How is a partition decided?
When a message is produced to a Kafka topic, it is assigned to a specific partition. This can be done using a round-robin method, a hash of the message key, or a custom partitioning strategy. Usually you'll use something like the user ID as the message key, so that all messages from the same user go to the same partition and hence the same consumer (that way a single busy user doesn't starve everyone else, say).
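Conceptually, key-based assignment looks something like the toy sketch below (Kafka's real default partitioner uses a murmur2 hash, not this simplified one):

```typescript
// Toy illustration: the same key always maps to the same partition
function partitionFor(key: string, numPartitions: number): number {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple rolling hash
  }
  return hash % numPartitions;
}

console.log(partitionFor("user-123", 3)); // always the same partition for user-123
```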
Multiple consumer groups
In this section, we'll see how partitions work in Kafka in practice.
Create a new topic with 3 partitions
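Inside the Kafka container (same assumptions about the script path as before):

```bash
# Create the payment-done topic with 3 partitions
/opt/kafka/bin/kafka-topics.sh --create --topic payment-done --partitions 3 --bootstrap-server localhost:9092
```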
Ensure it has 3 partitions
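For example:

```bash
# Describe the topic and check the partition count
/opt/kafka/bin/kafka-topics.sh --describe --topic payment-done --bootstrap-server localhost:9092
```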
Update the topic in the Node.js scripts to use payment-done.
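If producer.ts and consumer.ts use a topic variable as sketched above, this is a one-line change in each file:

```typescript
const topic = "payment-done";
```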
Consume messages in 3 terminals
Produce messages
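For example, with the scripts assumed above:

```bash
# In three separate terminals, start a consumer each
npm run consume

# In a fourth terminal, produce a few messages
npm run produce
```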
Notice that the messages get distributed across all 3 consumers, since each consumer in the group handles one of the 3 partitions.
When producing messages, you can assign a key that uniquely identifies the event. Kafka will hash this key and use the hash to determine the partition. This ensures that all messages with the same key (say, for the same user) are sent to the same partition. 💡 Why would you want messages from the same user to go to the same partition? If a single user generates too many notifications, they only choke a single partition rather than all of them.
Create a new producer-user.ts file and pass in a key when producing the message.
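A sketch of producer-user.ts (the topic and key values are assumptions):

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "my-app", brokers: ["localhost:9092"] });
const producer = kafka.producer();

async function main() {
  await producer.connect();
  // The key is hashed to pick the partition, so every message for user-1
  // lands on the same partition (and hence the same consumer).
  await producer.send({
    topic: "payment-done",
    messages: [{ key: "user-1", value: "payment done at " + new Date().toISOString() }],
  });
  await producer.disconnect();
}

main().catch(console.error);
```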
Add a produce:user script.
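For example (the script name comes from the step above; the output path is an assumption):

```json
{
  "scripts": {
    "produce:user": "tsc -b && node dist/producer-user.js"
  }
}
```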
Start 3 consumers and one producer. Notice that all the messages reach the same consumer, since they all carry the same key.