初學 Kafka
Kafka Broker
Kafka broker (9092) is the first server that users interact with, it listens to some TCP connection. There are two pieces of abstractions — producer and consumer. 消費者與生產者的抽象概念
Topic: 資料邏輯分層 logical partitioning of data
Kafka broker(9092)
[Topic A]
Producer <---> [Topic B] <---> Consumer
假設現在 Kafka broker 裡面有個 Users Topic,而生產者發佈 John
進去 Users topic,當生產者持續發佈東西進 Users Topic,這個 Topic 就會越來越肥大
消費者在從 Users Topic 拉訊息 (pull information) 下來,
當資料表記有百萬筆資料,會採用sharding 將資料分成不同 database,而Kafka借用了這種技術,將Users Topic 分成了 Partition 1 (A - M) 與 Partition 2 (N-Z)。
因此 Producer 要再發佈 “Nader” 進 Users Topic 時,會推進 partition 2 (publish “Nader” to users topic on partition 2, current position 4)
You don’t use Kafka to do ad-hoc queries, this is not a relational database, and we work with indexes and partitions. You use it for fast writing and distributed of events that happens.
Queue vs Pub Sub
-
Queue: Message published once, consumed once
RabbitMQ started with queue. (oh… but people want to do pub sub…)
-
Pub Sub: Message published once, consumed many times
-
Kafka asked: How can we do both? Kafka build a system with those two in mind.
Answer: Consumer Group
Consumer Group
Consumer groups are invented to do essentially parallel processing and partitions. 平行處理與資料分片
Consumer group can remove the awareness from the consumer of a partition. Another benefit is it can kind of run and consume parallel data, like consume parallel information from multiple partitions.
You can have one consumer consuming two partitions or three or four, but one partition better be consumed with one consumer. 一個消費者可以同時從一至多個partitions 拉資料,但一個partition最好只有一個consumer跟他拿資料,而 Consumer Group 就可以保證這一點。
- If you want to act like a queue, put all your consumers in one group.
- If you want to act like a pub-sub system, put each consumer in a unique group.
Kafka Pros
-
Append only commit log. 紀錄檔都在最後面加入
It always goes in the end, fast, easy to find.
-
Performance is amazing. 效能極佳
Seeking to a position is extremely fast.
-
Distributed with zookeeper
-
Long polling You make a request and you wait for it, there’s no empty misses.
-
Event driven architecture, Pub-sub and queue
-
Scaling
scale like there’s no tomorrow
-
Parallel processing
One-topic, multiple partitions, you can read them in parallel.
Kafka Cons
-
Zookeeper
replace ZooKeeper with a self-managed metadata quorum
People complaining about zookeeper, especially at scale.
-
Producer explicit partition can lead to problems.
-
Complex to install, cofigure and manage