Kafka vs RabbitMQ (modified)
Countless articles on the internet compare these two leading frameworks. Most of them just list the strengths and weaknesses of each, without giving a broad view of which features each one supports.
When I have to decide which technology to choose, I always want a comparison table: a full list of features in which I can quickly pick out the ones that matter for my specific scenario, and then make an informed decision.
RabbitMQ and Kafka in a nutshell
Who are the players?
The flow starts with the publisher, which sends a message to an exchange. The exchange is a routing layer that knows which queue to route the message to: bindings define which queues receive messages from the exchange, and consumers choose which queue they consume from. RabbitMQ pushes the message to the consumer, and once the message has been consumed and the acknowledgment has arrived, it is removed from the queue. Any piece of this system can be scaled out: producers, consumers, and RabbitMQ itself, which can be clustered and made highly available.
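The publish, route, deliver, and acknowledge cycle described above can be sketched with a small in-memory model. This is only an illustration, not a RabbitMQ client: `ToyBroker`, its method names, and the `emails` queue are hypothetical, and the exchange here routes by exact routing key only.

```python
from collections import deque


class ToyBroker:
    """In-memory stand-in for one exchange with bound queues (not real RabbitMQ)."""

    def __init__(self):
        self.bindings = {}   # routing key -> list of queue names
        self.queues = {}     # queue name -> deque of ready messages
        self.unacked = {}    # delivery tag -> (queue name, message)
        self._next_tag = 0

    def bind(self, queue, routing_key):
        # A binding tells the exchange which queue receives a routing key.
        self.queues.setdefault(queue, deque())
        self.bindings.setdefault(routing_key, []).append(queue)

    def publish(self, routing_key, message):
        # The exchange copies the message into every queue bound to the key.
        for queue in self.bindings.get(routing_key, []):
            self.queues[queue].append(message)

    def deliver(self, queue):
        # Push the next message to a consumer; it stays unacked until confirmed.
        if not self.queues[queue]:
            return None
        message = self.queues[queue].popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = (queue, message)
        return self._next_tag, message

    def ack(self, tag):
        # The acknowledgment is what finally removes the message.
        del self.unacked[tag]

    def nack(self, tag):
        # Without an ack the message goes back to the front of its queue.
        queue, message = self.unacked.pop(tag)
        self.queues[queue].appendleft(message)


broker = ToyBroker()
broker.bind("emails", routing_key="user.signup")
broker.publish("user.signup", "welcome alice")

tag, body = broker.deliver("emails")
broker.nack(tag)                       # consumer failed: message is requeued
tag, body = broker.deliver("emails")
broker.ack(tag)                        # consumer succeeded: message is gone
print(body, len(broker.queues["emails"]), len(broker.unacked))
```

The point of the sketch is the lifetime of a message: it leaves the queue for good only after the acknowledgment, which is the behavior the comparison below keeps coming back to.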
Who are the players?
- Consumer / Consumer groups
- Kafka Connect source
- Kafka Connect sink
- Topic/topic partition
- Kafka Streams
Kafka is a robust system with several players in the game, but once you understand the flow well, it becomes easy to manage and work with.
A producer sends a message record to a topic. A topic is a category or feed name to which records are published, and it can be partitioned for better performance. Consumers subscribe to a topic and pull messages from it. When a topic is partitioned, each partition is read by its own consumer instance; all the instances of the same consumer together are called a consumer group.
In Kafka, messages remain in the topic even after they have been consumed; how long they are kept is defined by the retention policy.
Kafka also stores topic data on disk and uses sequential disk I/O. This approach boosts Kafka's performance and makes it a leading option for queue implementations and a safe option for big data use cases.
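To make the consumer group idea concrete, here is a minimal sketch of how partitions could be spread over the instances of one group. The round-robin rule and the name `assign_partitions` are assumptions for illustration; Kafka ships several assignment strategies and this is not its actual implementation.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer of the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by two instances: each instance owns two partitions.
two_consumers = assign_partitions([0, 1, 2, 3], ["worker-a", "worker-b"])

# Five instances for four partitions: the fifth instance has nothing to read.
five_consumers = assign_partitions([0, 1, 2, 3], ["w1", "w2", "w3", "w4", "w5"])

print(two_consumers)
print(five_consumers)
```

The second call shows why the partition count caps the useful size of a group: an instance without a partition stays idle.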
- Distribution and parallelism
Both offer a good distribution solution, but with some differences. Let’s talk about consumers. In RabbitMQ you can scale out the number of consumers, so each queue has many consumers. These are called competing consumers because they compete to consume each message, and in this way the message processing work is distributed across all the active consumers. In Kafka, the way to distribute consumers is to partition the topic, so that each consumer in a group is dedicated to its own partition. You can use the partitioning mechanism to send each partition a different set of messages by business key, for example by user ID or location.
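Partitioning by business key boils down to hashing the key and taking the result modulo the number of partitions. The sketch below assumes `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses a different hash function), and `partition_for` is a hypothetical helper name.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # A stable hash guarantees that the same key always maps to the same partition.
    # crc32 is only a stand-in; it will not reproduce Kafka's own partition numbers.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [("user-1", "login"), ("user-2", "login"), ("user-1", "add-to-cart"),
          ("user-3", "login"), ("user-1", "checkout"), ("user-2", "logout")]

for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

# All events of user-1 sit in one partition, in the order they were sent.
user1_partition = partitions[partition_for("user-1", NUM_PARTITIONS)]
user1_events = [value for key, value in user1_partition if key == "user-1"]
print(user1_events)
```

Whichever partition user-1 lands in, its events stay together and in send order, which is what makes per-key processing by a single consumer possible.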
- High Availability
Both solutions are highly available, but Kafka takes this a step further by using ZooKeeper to manage the state of the cluster (recent Kafka releases replace ZooKeeper with the built-in KRaft controller quorum). This coordinator is itself highly available and can be distributed; think of it as a guard watching over the guards. This makes Kafka safer and more powerful for big data use cases.
Kafka leverages the strength of sequential disk I/O and requires less hardware, which can lead to high throughput: several million messages per second with just a handful of nodes. RabbitMQ can also process a million messages per second, but requires 30+ nodes.
Kafka has good replication by design: if the leader broker goes down, all of its work automatically passes to another broker that holds a full replica of the failed one, so messages are not lost. In RabbitMQ, queues aren’t replicated automatically; this needs to be configured.
- Multi subscriber
In Kafka, a message can be read by multiple subscribers, meaning many different consumer types (consumer groups), not many instances of the same one. In RabbitMQ, a message in a queue can be consumed only once; once it is consumed, it disappears and is no longer accessible.
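One way to picture the multi-subscriber behavior is an append-only log where every consumer type keeps its own read position. The class below is a toy model under that assumption; `ToyTopic`, `poll`, and the group names are hypothetical and do not mirror a real Kafka client API.

```python
class ToyTopic:
    """Append-only log with one read offset per consumer group (toy model)."""

    def __init__(self):
        self.log = []        # records are never removed by reads
        self.offsets = {}    # group name -> next offset to read

    def append(self, record):
        self.log.append(record)

    def poll(self, group, max_records=10):
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)   # remember where this group stopped
        return batch


topic = ToyTopic()
for order in ["order-1", "order-2", "order-3"]:
    topic.append(order)

billing = topic.poll("billing")         # first consumer type reads everything
analytics = topic.poll("analytics")     # second consumer type reads it again
billing_again = topic.poll("billing")   # nothing new for billing
print(billing, analytics, billing_again, len(topic.log))
```

Reading only moves a group's offset, so a new consumer type can be added later and still see every record that is within retention.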
- Message ordering
Because Kafka has partitions, you can get message ordering: when messages with the same business key are sent to the same partition, they are kept in order for that key. RabbitMQ has no equivalent mechanism; you can only try to mimic this behavior by defining many queues and sending the messages for each key to a dedicated queue, which is hard to get right at large scale.
Log compaction: if the same message key arrives multiple times, Kafka (on topics configured for compaction) keeps only the latest value for that key in the log and deletes the older messages.
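The effect of compaction on a keyed log can be sketched in a few lines. This is a simplified model: the function name `compact` and the sample records are made up, and a real broker compacts in the background rather than in one pass, so older values can remain visible for a while.

```python
def compact(log):
    """Keep only the newest record per key, preserving the order of the survivors."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [record for i, record in enumerate(log) if last_index[record[0]] == i]


log = [("user-1", "addr=Paris"), ("user-2", "addr=Rome"),
       ("user-1", "addr=Berlin"), ("user-3", "addr=Oslo"),
       ("user-2", "addr=Madrid")]

compacted = compact(log)
print(compacted)
# [('user-1', 'addr=Berlin'), ('user-3', 'addr=Oslo'), ('user-2', 'addr=Madrid')]
```

A consumer that replays the compacted log still ends up with the latest state for every key, just with less data to read.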
- Message protocols
RabbitMQ supports several messaging protocols, such as AMQP, STOMP (text-based), MQTT (binary), and HTTP, while Kafka supports only its own binary protocol.
- Message lifetime
Because Kafka is a log, messages are always there; you control how long they stay by defining a message retention policy. Because RabbitMQ is a queue, messages are removed once they have been consumed and the acknowledgment has arrived.
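A time-based retention policy can be pictured as a filter on record age that runs independently of who has read what. The sketch below is an assumption-laden simplification: `apply_retention` is a hypothetical helper, timestamps are plain seconds, and a real broker removes whole log segments rather than single records.

```python
def apply_retention(log, now, retention_seconds):
    """Drop records older than the retention window; reads never delete anything."""
    return [(ts, msg) for ts, msg in log if now - ts <= retention_seconds]


log = [(0, "a"), (50, "b"), (100, "c")]   # (timestamp in seconds, message)
kept = apply_retention(log, now=120, retention_seconds=100)
print(kept)
# [(50, 'b'), (100, 'c')]
```

Contrast this with the queue model, where the trigger for removal is the consumer's acknowledgment rather than the age of the message.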
- Message acknowledgment
In both frameworks, the producer gets a confirmation that the message arrived in the queue/topic, and the consumer sends an acknowledgment when the message has been consumed successfully (in Kafka, by committing its offset), so you can be sure that messages didn’t get lost along the way.
- Flexible routing to a topic/queue
In Kafka, a message is sent to a topic, and its key determines the partition. In RabbitMQ there are more options, for example routing by wildcard patterns on the routing key or by message headers; check the docs for more information.
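As an illustration of wildcard routing, the sketch below matches dot-separated routing keys against patterns in the style of AMQP topic exchanges, where `*` stands for exactly one word and `#` for zero or more words. The function `topic_matches` and the sample keys are hypothetical; this is a model of the matching rule, not RabbitMQ's code.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Match a dot-separated routing key against a pattern with '*' and '#'."""

    def match(p, k):
        if not p:
            return not k                 # pattern used up: key must be used up too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may swallow zero or more words of the key.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])    # '*' or a literal consumes exactly one word
        return False

    return match(pattern.split("."), routing_key.split("."))


print(topic_matches("logs.*.error", "logs.payment.error"))     # True
print(topic_matches("logs.*.error", "logs.payment.eu.error"))  # False
print(topic_matches("logs.#", "logs.payment.eu.error"))        # True
```

A queue bound with `logs.#` would receive every log message, while one bound with `logs.*.error` would receive only errors from a single-word source.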
- Message priority
This is a key feature in RabbitMQ: a message with a higher priority is consumed first. It is hard to achieve in Kafka (it can be done with message keys, but at large scale this can be hard).
For management, Kafka relies on third-party tools, some of which are licensed, while RabbitMQ has a built-in management UI.
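The consumption order of a priority queue can be sketched with a heap. This is a generic model built on Python's `heapq`, assuming that a larger number means a higher priority and that messages of equal priority stay in arrival order; `ToyPriorityQueue` is a hypothetical name and not a RabbitMQ API.

```python
import heapq
import itertools


class ToyPriorityQueue:
    """Highest priority first; equal priorities keep arrival order (toy model)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker for equal priorities

    def publish(self, message, priority=0):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), message))

    def consume(self):
        _, _, message = heapq.heappop(self._heap)
        return message

    def __len__(self):
        return len(self._heap)


queue = ToyPriorityQueue()
queue.publish("nightly report", priority=1)
queue.publish("password reset", priority=9)
queue.publish("newsletter", priority=1)
queue.publish("fraud alert", priority=9)

consumed = [queue.consume() for _ in range(len(queue))]
print(consumed)
# ['password reset', 'fraud alert', 'nightly report', 'newsletter']
```

An append-only log has no such reordering step, which is why priorities in Kafka usually end up as separate topics or partitions handled by the application.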
- Transaction support
Both support transactions; it is a newer feature in Kafka, but it is supported.
Use Kafka if you need:
- Time travel/durable/commit log
- Many consumers for the same message
- High throughput
- Stream processing
- Replicability
- High availability
- Message order
Use RabbitMQ if you need:
- Complex routing
- Priority queue
- JMS-compliant message queue
In practice, RabbitMQ is enough for simple use cases with low data traffic, and it gives you certain benefits such as priority queues and flexible routing options. But for massive data volumes and high throughput, use Kafka without debate. And if you need a commit log or several consumers for the same messages, go with Kafka, because RabbitMQ can’t help you there.