Kafka


Last updated 8 months ago


What is Kafka?

https://kafka.apache.org/

What does distributed mean?

Kafka is distributed: you can scale it horizontally by adding more nodes that run your Kafka brokers.

Event streaming

Kafka is a natural fit if you want to build a system where one process produces events that can be consumed by multiple independent consumers.

Examples of apps

Payment notifications

Cluster and broker

A group of machines running Kafka is known as a Kafka cluster. Each individual machine is called a broker.

Producers

As the name suggests, producers are used to publish data to a topic

Consumers

As the name suggests, consumers consume from a topic

Topics

A topic is a logical channel to which producers send messages and from which consumers read messages.

Offsets

Consumers keep track of their position in the topic by maintaining offsets, which represent the position of the last consumed message. Kafka can manage offsets automatically or allow consumers to manage them manually.
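The idea behind offsets can be captured with a toy model in plain TypeScript (this is an illustration, not Kafka itself): a partition is an append-only log, and a consumer's offset is simply the index of the next message it will read.

```typescript
// Toy model: an append-only partition log and a consumer that tracks
// its own offset (the position of the next message to read).
type Message = { value: string };

class PartitionLog {
  private messages: Message[] = [];
  append(value: string): number {
    this.messages.push({ value });
    return this.messages.length - 1; // offset of the appended message
  }
  read(offset: number): Message | undefined {
    return this.messages[offset];
  }
}

class ToyConsumer {
  private offset = 0; // next offset to read ("committed" position)
  poll(log: PartitionLog): Message | undefined {
    const msg = log.read(this.offset);
    if (msg) this.offset += 1; // advance only after a successful read
    return msg;
  }
  committedOffset(): number {
    return this.offset;
  }
}

const log = new PartitionLog();
log.append("payment-1");
log.append("payment-2");

const consumer = new ToyConsumer();
console.log(consumer.poll(log)?.value); // payment-1
console.log(consumer.poll(log)?.value); // payment-2
console.log(consumer.committedOffset()); // 2
```

Because the log keeps its messages after they are read, a consumer can replay history just by resetting its offset — which is what `fromBeginning: true` does in the kafkajs examples later on.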

Retention

Kafka topics have configurable retention policies, determining how long data is stored before being deleted. This allows for both real-time processing and historical data replay.
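For example, retention can be changed per topic with `kafka-configs.sh` (shown here against the quickstart container from the setup below; 3600000 ms = 1 hour):

```shell
# Set retention on the quickstart topic to 1 hour --
# run from /opt/kafka/bin inside the container
./kafka-configs.sh --alter --entity-type topics --entity-name quickstart-events \
  --add-config retention.ms=3600000 --bootstrap-server localhost:9092

# Verify the override took effect
./kafka-configs.sh --describe --entity-type topics --entity-name quickstart-events \
  --bootstrap-server localhost:9092
```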

Start kafka locally

Ref - https://kafka.apache.org/quickstart

Using docker

docker run -p 9092:9092 apache/kafka:3.7.1

Get shell access to container

docker ps
docker exec -it container_id /bin/bash
cd /opt/kafka/bin

Create a topic

./kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092

Publish to the topic

./kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092

Consuming from the topic

./kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092

Kafka in a Node.js process

Ref https://www.npmjs.com/package/kafkajs

  • Initialise project

npm init -y
npx tsc --init
  • Update tsconfig.json

"rootDir": "./src",
"outDir": "./dist"
  • Add src/index.ts

import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "my-app",
  brokers: ["localhost:9092"]
})

const producer = kafka.producer();

const consumer = kafka.consumer({groupId: "my-app3"});


async function main() {
  await producer.connect();
  await producer.send({
    topic: "quickstart-events",
    messages: [{
      value: "hi there"
    }]
  })

  await consumer.connect();
  await consumer.subscribe({
    topic: "quickstart-events", fromBeginning: true
  })

  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log({
        offset: message.offset,
        value: message?.value?.toString(),
      })
    },
  })
}


main();
  • Update package.json

"scripts": {
    "start": "tsc -b && node dist/index.js"
},
  • Start the process

npm run start

Breaking into producer and consumer scripts

  • producer.ts

import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "my-app",
  brokers: ["localhost:9092"]
})

const producer = kafka.producer();

async function main() {
  await producer.connect();
  await producer.send({
    topic: "quickstart-events",
    messages: [{
      value: "hi there"
    }]
  });
}


main();
  • consumer.ts

import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "my-app",
  brokers: ["localhost:9092"]
})

const consumer = kafka.consumer({ groupId: "my-app3" });


async function main() {
  await consumer.connect();
  await consumer.subscribe({
    topic: "quickstart-events", fromBeginning: true
  })

  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log({
        offset: message.offset,
        value: message?.value?.toString(),
      })
    },
  })
}


main();
  • Update package.json


  "scripts": {
    "start": "tsc -b && node dist/index.js",
    "produce": "tsc -b && node dist/producer.js",
    "consume": "tsc -b && node dist/consumer.js"    
  },
  • Try starting multiple consumers, and see whether each of them gets back the messages you produce

Notice we specified a consumer group (my-app3)

Consumer groups and partitions

Consumer group

A consumer group is a group of consumers that coordinate to consume messages from a Kafka topic.

Purpose:

  • Load Balancing: Distribute the processing load among multiple consumers.

  • Fault Tolerance: If one consumer fails, Kafka automatically redistributes the partitions that the failed consumer was handling to the remaining consumers in the group.

  • Parallel Processing: Consumers in a group can process different partitions in parallel, improving throughput and scalability.

Partitions

Partitions are subdivisions of a Kafka topic. Each partition is an ordered, immutable sequence of messages that is appended to by producers. Partitions enable Kafka to scale horizontally and allow for parallel processing of messages.

How is a partition decided?

When a message is produced to a Kafka topic, it is assigned to a specific partition. This can be done using a round-robin method, a hash of the message key, or a custom partitioning strategy. Usually you'll use something like a user ID as the message key, so that all messages from the same user go to the same partition (and a single noisy user can't starve everyone else).
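The key-based strategy can be sketched in a few lines. The hash below is illustrative (Kafka's default partitioner actually uses murmur2), but the principle is the same: hash the key and take it modulo the partition count, so the same key always lands on the same partition.

```typescript
// Sketch of key-based partition selection (illustrative hash,
// not Kafka's actual murmur2 implementation).
function simpleHash(key: string): number {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.charCodeAt(0)) | 0; // 32-bit rolling hash
  }
  return Math.abs(h);
}

function choosePartition(key: string, numPartitions: number): number {
  return simpleHash(key) % numPartitions;
}

// All messages keyed by the same user land on the same partition:
console.log(choosePartition("user1", 3) === choosePartition("user1", 3)); // true
```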

Multiple consumer groups
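Each consumer group receives its own copy of every message, while consumers within a group split the messages between them. A toy sketch of that fan-out (plain TypeScript, not kafkajs):

```typescript
// Toy fan-out model: every consumer GROUP receives every message,
// while consumers WITHIN a group share the messages between them.
type Delivery = { group: string; consumer: number; value: string };

function deliver(messages: string[], groups: Record<string, number>): Delivery[] {
  const deliveries: Delivery[] = [];
  for (const [group, consumerCount] of Object.entries(groups)) {
    messages.forEach((value, i) => {
      // round-robin within the group; each group sees the full stream
      deliveries.push({ group, consumer: i % consumerCount, value });
    });
  }
  return deliveries;
}

// "analytics" (1 consumer) gets both messages; "notifications" (2 consumers)
// also gets both, split across its two consumers.
const out = deliver(["m1", "m2"], { analytics: 1, notifications: 2 });
console.log(out);
```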

Partitions in kafka

In this section, we'll see what partitions in Kafka look like in practice.

  • Create a new topic with 3 partitions

./kafka-topics.sh --create --topic payment-done --partitions 3 --bootstrap-server localhost:9092
  • Ensure it has 3 partitions

./kafka-topics.sh --describe --topic payment-done --bootstrap-server localhost:9092
  • Update the topic in the node.js script to use payment-done

// producer.ts
async function main() {
  await producer.connect();
  await producer.send({
    topic: "payment-done",
    messages: [{
      value: "hi there"
    }]
  });
}

// consumer.ts
await consumer.subscribe({
  topic: "payment-done", fromBeginning: true
})
  • Consume messages in 3 terminals

npm run consume
  • produce messages

npm run produce
  • Notice the messages get distributed across all 3 consumers

Current architecture

Three cases to discuss

Equal number of partitions and consumers

Each consumer in the group is assigned exactly one partition.

More partitions

Some consumers are assigned more than one partition.

More consumers

A partition is consumed by at most one consumer in a group, so the extra consumers sit idle.
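The three cases can be sketched with a toy assignment function (illustrative only; Kafka's real partition assignors are more sophisticated):

```typescript
// Toy round-robin assignment of partitions to consumers in one group.
function assign(partitions: number, consumers: number): number[][] {
  const result: number[][] = Array.from({ length: consumers }, () => []);
  for (let p = 0; p < partitions; p++) {
    result[p % consumers].push(p); // spread partitions across consumers
  }
  return result;
}

console.log(assign(3, 3)); // equal: [[0], [1], [2]] -- one partition each
console.log(assign(4, 2)); // more partitions: [[0, 2], [1, 3]] -- consumers take several
console.log(assign(2, 3)); // more consumers: [[0], [1], []] -- one consumer sits idle
```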

Partitioning strategy

When producing messages, you can assign a key that uniquely identifies the event. Kafka will hash this key and use the hash to determine the partition. This ensures that all messages with the same key (say, for the same user) are sent to the same partition.

💡 Why would you want messages from the same user to go to the same partition? If a single user generates too many notifications, this way they only choke a single partition, not all of them.

  • Create a new producer-user.ts file, pass in a key when producing the message

import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "my-app",
  brokers: ["localhost:9092"]
})

const producer = kafka.producer();

async function main() {
  await producer.connect();
  await producer.send({
    topic: "payment-done",
    messages: [{
      value: "hi there",
      key: "user1"
    }]
  });
}

main();
  • Add produce:user script

"produce:user": "tsc -b && node dist/producer-user.js",
  • Start 3 consumers and one producer. Notice that all messages reach the same consumer.

npm run produce:user