15. Queuing vs. Streaming
Message brokers compared
Meanwhile, the revenue of The Bakery Group has grown quite a bit, as has their application landscape. The latest addition is a data warehouse. The idea is to gain more insight into sales and which cakes from the assortment are doing well. A simple MFT (Managed File Transfer) or iPaaS is not enough for the amount of data that needs to be processed. A broker is needed, but which one?
Aleksandra wants to be able to track live sales and production on one hand, and on the other hand get weekly reports with the trends and KPIs (Key Performance Indicators). What technology could we use for both the live and weekly numbers? “That’s very easy, let’s just use Kafka,” says Thomas. “Huh, what, the writer Franz Kafka?” Aleksandra wonders. “No, Apache Kafka.”
In recent years, Kafka has gained tremendous popularity. The question is: why? Surely brokers are nothing new and have been in use at many organizations for years? In short, Kafka added streaming to brokers, whereas before it was all about queues. But this only raises more questions. What is the difference between the two, how did this situation arise and what is the advantage of one over the other? Time for a deep dive into the world of brokers.
The usefulness of a broker
A broker, as the word implies, is a component that sits between different pieces of software. Its role is like that of a real estate broker or a stockbroker: an intermediary between two parties. Because a broker sits in the middle, it is also considered a form of middleware. And because messages are central to brokers, the better name is Message-Oriented Middleware.
A broker is not strictly necessary. You can also send a message directly to an application or just use an MFT, API or iPaaS platform, just as you can sell your house directly to someone without using a realtor. So brokers must offer some added value, because of course they also cost something.
As an example of the added value of a message broker, let’s take a situation in which an application offers an API without a broker. In this case, the application hosts the API. Other applications calling the API are the clients.
But suppose the client also wants to start offering data, it will have to host its own API as well:
Suppose there are four applications that want to exchange data with each other:
Using an API Gateway, you could decouple this, but even then the API needs to be hosted each time.
To decouple applications from each other, an SFTP server was traditionally placed between them, because then every application could act as a client. While this offers the advantage that no one has to host anything anymore, the disadvantage is that you can only post files (fire-and-forget): you are no longer message-based and have no request-reply capabilities. In addition, FTP servers are file-based and therefore less scalable and more susceptible to problems such as file locks.
Now what if you could get the benefits of both? So message-based (like APIs) and client-based (like FTP). This is where a message broker comes in.
With a message broker, you can decouple applications, clients no longer have to host anything themselves, you exchange messages rather than files, and there is support for both fire-and-forget and request-and-reply patterns. But wait, there’s more…
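To make the two patterns concrete, here is a minimal sketch in Python. It assumes a toy in-memory broker; the class InMemoryBroker, the helper functions and the queue names are hypothetical and do not correspond to any real broker's API. Both applications act only as clients of the broker, and the reply is matched to the request through a correlation id header.

```python
from collections import defaultdict, deque
from itertools import count


class InMemoryBroker:
    """Toy broker with named in-memory queues (illustration only)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, queue, body, **headers):
        # Fire-and-forget: the sender does not wait for any answer.
        self._queues[queue].append({"headers": headers, "body": body})

    def receive(self, queue):
        # Return the oldest message, or None when the queue is empty.
        q = self._queues[queue]
        return q.popleft() if q else None


_correlation_ids = count(1)


def request(broker, queue, body, reply_to):
    """Send a request and return the correlation id used to match the reply."""
    correlation_id = next(_correlation_ids)
    broker.send(queue, body, reply_to=reply_to, correlation_id=correlation_id)
    return correlation_id


def serve_once(broker, queue, handler):
    """Service side: take one request and post the answer on its reply queue."""
    message = broker.receive(queue)
    if message is None:
        return False
    headers = message["headers"]
    broker.send(headers["reply_to"], handler(message["body"]),
                correlation_id=headers["correlation_id"])
    return True


broker = InMemoryBroker()

# Fire-and-forget: the shop posts a sale; nobody has to be listening yet.
broker.send("sales", {"cake": "sachertorte", "amount": 2})

# Request-reply: the shop asks the stock service a question via the broker.
cid = request(broker, "stock.requests", "sachertorte", reply_to="shop.replies")
serve_once(broker, "stock.requests", lambda cake: {"cake": cake, "in_stock": 12})
reply = broker.receive("shop.replies")
```

Neither side hosts anything or knows the other's address; they only agree on queue names and on the two headers.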
10 benefits of a message broker
Brokers have a number of unique features that distinguish them from other middleware. Some of the advantages are:
1. Decoupling
Applications do not need to know each other directly.
2. Security
The broker handles the security and encryption of messages.
3. Hosting
Only the broker needs to be hosted; all other systems act only as clients.
4. Message-based
Because you work message-based, you have no file system limitations (file locks), messages can stay in memory (performance) and you have message headers (meta-information).
5. Transport
Applications only need to know one protocol, one address to send and retrieve data. The broker is the central point for data exchange.
6. Patterns
Message brokers often support multiple patterns, such as fire-and-forget/point-to-point, request-and-reply and pub-sub.
7. Routing and Filtering
Messages can usually be routed or filtered by certain headers, and sometimes also by message content. For example, a consumer may only be interested in messages with certain content.
8. Buffer
The broker can act as a buffer. Suppose a lot of messages are offered and consumers can’t process them that quickly; the messages then remain in the buffer at the broker.
9. Scalability
A broker can often consist of multiple instances, whether physically on two servers in the same data center, or spread across thousands of servers in data centers worldwide.
10. Delivery guarantees
Brokers usually offer a delivery guarantee through a QoS (Quality of Service) setting. There are often several modes, with “exactly once” being the most stringent.
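As an illustration of the last point, the sketch below shows one common way a delivery guarantee can work: "at least once" delivery through acknowledgements, combined with a consumer that ignores duplicates. It is a simplified model under stated assumptions; the AckQueue class and its method names are hypothetical and not taken from any specific broker.

```python
from collections import deque


class AckQueue:
    """Toy queue with acknowledgements (illustration only)."""

    def __init__(self):
        self._ready = deque()
        self._unacked = {}

    def publish(self, message_id, body):
        self._ready.append((message_id, body))

    def deliver(self):
        # Hand out the oldest message, but remember it until it is acknowledged.
        if not self._ready:
            return None
        message_id, body = self._ready.popleft()
        self._unacked[message_id] = body
        return message_id, body

    def ack(self, message_id):
        # The consumer confirms processing; the broker may now forget the message.
        self._unacked.pop(message_id, None)

    def redeliver_unacked(self):
        # E.g. after a consumer timeout: put unconfirmed messages back in front,
        # preserving their original order.
        for message_id, body in reversed(list(self._unacked.items())):
            self._ready.appendleft((message_id, body))
        self._unacked.clear()


processed = []
seen_ids = set()


def handle(message_id, body):
    # Idempotent consumer: a duplicate delivery has no additional effect.
    if message_id in seen_ids:
        return
    seen_ids.add(message_id)
    processed.append(body)


queue = AckQueue()
queue.publish("order-1", "2x sachertorte")

# First attempt: the consumer processes the message but "crashes" before acking.
message_id, body = queue.deliver()
handle(message_id, body)

# The broker never saw an ack, so it delivers the same message again.
queue.redeliver_unacked()
message_id, body = queue.deliver()
handle(message_id, body)
queue.ack(message_id)
```

The message is delivered twice but processed once. Real brokers implement their QoS modes in their own ways, so treat this only as a mental model.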
Okay, so brokers are quite useful. Yet the success story is one of slow progress. To figure out the why of this, we have to go back in time quite a bit.
A piece of history
In 1998, a messaging library called JMS, or Java Message Service, was introduced for Java. This library allowed you to create either a broker or a client in Java. JMS tried to achieve the benefits of a broker, which popular protocols of the time such as FTP, TCP sockets, RMI and CORBA did not offer.
Now it should be emphasized that JMS itself is not a protocol, but a library (a Java API). This fact was both the strength of JMS and its weakness. Its strength was that the library could be implemented quickly in Java programs. Many brokers are based on version 1.1 of the JMS API. Think of SonicMQ, ActiveMQ and IBM MQ.
Each of these brokers was very capable in itself, but they brought two problems with them. First, most software vendors found the JMS API too limited and soon extended it with an API of their own, so every broker got its own client implementation. Second, JMS is a Java API and therefore not language-independent: other languages must use a bridge to connect to a JMS broker.
A year after the introduction of JMS 1.1, in 2003, work began on a true broker protocol built on concepts similar to those of JMS. This became known as AMQP (Advanced Message Queuing Protocol). However, it took until 2011 before AMQP version 1.0 was released.
JMS was so strong in the market at the time, however, that it was years before AMQP gained any popularity. Meanwhile, ActiveMQ, for example, has support for AMQP. Other well-known AMQP brokers are RabbitMQ and Azure Service Bus.
Although practice is still sometimes a bit disappointing, AMQP client libraries in different languages allow you to connect to any AMQP broker. An example project that provides libraries in multiple languages is the Apache Foundation’s Qpid project.
Even more brokers
So there are JMS and AMQP brokers, and some brokers support both (such as ActiveMQ). However, there is a third popular type of broker, which is based on MQTT. This protocol, Message Queuing Telemetry Transport, was created around the same time as JMS. Version 5 was introduced in 2019 (JMS never got beyond version 2.0, partly because of the shift toward AMQP and REST APIs).
An MQTT broker can be seen as a lightweight type of broker. And despite its name, it is not a queuing broker. In fact, it does not support queues, only topics. We will come back to the difference between queues and topics later.
First, we discuss yet another broker. The most popular broker of recent years: Apache Kafka. Like MQTT, it is all about topics. Unlike MQTT, it is not a lightweight broker, but rather one that can handle big data.
Kafka originated at LinkedIn. The company found the scalability of JMS and AMQP brokers insufficient. In addition, it wanted to emphasize the pub-sub pattern for its specific use cases. In 2011, the Kafka broker was created, which later, like so many other brokers, came under the Apache Foundation.
A presentation of Kafka’s history:
Probably the best known and most widely used brokers currently are RabbitMQ, ActiveMQ and Kafka. All three are open source, with the first falling under the Spring umbrella, while the other two belong to the Apache Foundation.
And more brokers
However, the Apache Foundation has even more brokers on offer. The Qpid project not only makes AMQP clients, but has also developed its own AMQP broker. However, you won’t encounter this one much in practice.
There are three other brokers at Apache that have emerged as new stars on the horizon. These are Apache Pulsar, Apache RocketMQ and Apache TubeMQ. All three were developed with similar goals, namely a broker that is distributed, cloud-native and high-performance.
Pulsar
Pulsar is a streaming broker that originated at Yahoo and is a direct competitor of Kafka. The idea behind Pulsar is to break up the broker into several modules. That means multiple instances of the broker as well as multiple instances of the storage. The different modules can all run and scale independently of each other.
Interestingly enough, Kafka has moved in the opposite direction: it replaced external modules (such as ZooKeeper) with functionality built into a single product, thus reducing operational overhead.
Entire epistles have been written, and debates held, about performance: Kafka vs. Pulsar. Sometimes Kafka comes out of the test faster, sometimes Pulsar. As a user, you can safely assume that the throughput of both is extremely high and you will rarely run into limits.
RocketMQ
The second project is RocketMQ, a broker developed at Alibaba. Like LinkedIn and Yahoo, Alibaba found the scalability of existing JMS and AMQP brokers insufficient. Unlike them, however, the company did stick with JMS, because much of its infrastructure was already based on JMS and the use of queues. Eventually, Alibaba’s developers built their own implementation. One unique functionality is “message ordering,” which is not common with JMS brokers.
TubeMQ
The very latest broker is TubeMQ, which originated at Tencent. The idea is similar to that of Pulsar, as shown in the following architectural picture:
With TubeMQ, it is notable that the API and web portal for broker configuration and management are given a central place. In many cases, brokers leave that to third parties.
Time for a review:
+-----------+----------+--------+--------+---------------------------------------+
| Type      | Protocol | Queues | Topics | Examples                              |
+-----------+----------+--------+--------+---------------------------------------+
| JMS       | No       | Yes    | Yes    | ActiveMQ, IBM MQ, RocketMQ            |
+-----------+----------+--------+--------+---------------------------------------+
| AMQP      | Yes      | Yes    | Yes    | ActiveMQ, RabbitMQ, Azure Service Bus |
+-----------+----------+--------+--------+---------------------------------------+
| MQTT      | Yes      | No     | Yes    | Eclipse Mosquitto, HiveMQ, Moquette   |
+-----------+----------+--------+--------+---------------------------------------+
| Streaming | No       | No     | Yes    | Kafka, Pulsar                         |
+-----------+----------+--------+--------+---------------------------------------+
Queues vs Topics
A queue is a staging area that contains a number of messages.
A producer, the one who sends the message, places the message on the queue. At that point, there is one message on the queue. This message is usually kept in memory. If the producer places another message, there are two, and so on. The queue keeps growing.
Once a consumer is connected to the queue, it receives the messages. This continues until the queue is empty again. As soon as another message is placed on the queue, the consumer consumes it immediately.
With queues, there are generally the following two best practices:
- There is one producer, one consumer
- The oldest message is consumed first (First-In First-Out)
Although both are best practices, many brokers also allow you to deviate from them, for example by supporting Last-In-First-Out.
Usually, multiple producers and consumers are also allowed. This often leads to additional complexity, because who put a message on the queue and which consumer received it? Multiple consumers are often not what you want. If you have two consumers who are equally fast, then each consumer receives exactly half of the messages.
It is common for a single consumer with multiple listeners (multiple client instances) to consume the messages on the queue. In practice, if the consumer is fast enough, you won’t actually see any messages on the queue at all; it will remain empty.
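The behaviour of a queue with two equally fast consumers can be sketched in a few lines of Python. This is a simplified model: it assumes the broker hands out messages strictly in turn (round-robin), and the consumer names are made up for the example.

```python
from collections import deque
from itertools import cycle

# The producer places six messages on the queue.
queue = deque()
for n in range(1, 7):
    queue.append(f"order-{n}")

# Two competing consumers on the same queue.
received = {"consumer-a": [], "consumer-b": []}
turn = cycle(received)  # assumption: consumers are served strictly in turn

while queue:
    message = queue.popleft()             # oldest message first (FIFO)
    received[next(turn)].append(message)  # each message goes to ONE consumer
```

Each message is delivered to only one of the consumers, so each ends up with half of the messages, and the queue is empty afterwards.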
Some use cases of using queues are:
- Real-time messaging between applications. In this case, the expectation is that producer and consumer always have a connection to the queue. This is a commonly used pattern in EAI (Enterprise Application Integration).
- Request-Reply, where, for example, web applications have multiple backends with a lot of load. Commonly used in SOA (Service-Oriented Architecture).
For use cases where multiple applications often receive the same message and where consumers are not always connected to the broker, topics rather than queues are usually deployed.
Topics use the pub-sub pattern, in other words publishers and subscribers. A publisher is the one who sends the message (the producer in queues) and a subscriber is the one who receives it (the consumer in queues).
The reason for the somewhat modified naming is that the relationship between source and target is even less direct. Topics can be compared to a newspaper. The publisher makes one newspaper and everyone who has a subscription gets a copy. The editor and publisher do not need to know who each reader is.
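The newspaper analogy can be sketched as follows. This is a minimal in-memory model, not the API of a real broker; the Topic class and the subscriber names are hypothetical. The difference with a queue is that every subscriber gets its own copy of each message.

```python
from collections import deque


class Topic:
    """Toy topic: every subscriber has its own inbox (illustration only)."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, subscriber):
        self._inboxes[subscriber] = deque()

    def publish(self, message):
        # The publisher does not know who is subscribed; every inbox gets a copy.
        for inbox in self._inboxes.values():
            inbox.append(message)

    def poll(self, subscriber):
        inbox = self._inboxes[subscriber]
        return inbox.popleft() if inbox else None


sales = Topic()
sales.subscribe("dashboard")
sales.subscribe("data-warehouse")

sales.publish({"cake": "sachertorte", "amount": 2})

live = sales.poll("dashboard")
stored = sales.poll("data-warehouse")
```

Both subscribers receive the same sale, and the publisher's code does not change when a third subscriber is added.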
Queuing vs Streaming
In general, you could say that the JMS and AMQP brokers are the real queueing brokers. Their support for topics is often an extension of the EAI and SOA architectures that they support. In addition, you can see that these brokers are traditionally deployed in an active-passive setup (Fault-Tolerant Broker):
The active broker handles messaging, the passive broker takes over this role when the active broker is down or unreachable. Nowadays you can see that queueing brokers (e.g. ActiveMQ Artemis), following the example of streaming brokers, also support Active-Active setups. We call such a broker load-balanced.
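The difference between the two setups can be sketched from the client's point of view. The code below is a simplified model with hypothetical names (BrokerNode, send_active_passive, make_load_balancer); real brokers and client libraries handle failover and load balancing in their own ways.

```python
from itertools import cycle


class BrokerNode:
    """Stand-in for one broker instance (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.received = []

    def send(self, message):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.received.append(message)


def send_active_passive(nodes, message):
    # Fixed order: the passive node is only used when the active one fails.
    for node in nodes:
        try:
            node.send(message)
            return node.name
        except ConnectionError:
            continue
    raise ConnectionError("no broker reachable")


def make_load_balancer(nodes):
    ring = cycle(nodes)

    def send(message):
        # Active-active: spread messages over all nodes, skipping nodes that are down.
        for _ in range(len(nodes)):
            node = next(ring)
            if node.up:
                node.send(message)
                return node.name
        raise ConnectionError("no broker reachable")

    return send


# Active-passive: broker-2 only takes over once broker-1 is down.
active, passive = BrokerNode("broker-1"), BrokerNode("broker-2")
first = send_active_passive([active, passive], "m1")
active.up = False
second = send_active_passive([active, passive], "m2")

# Active-active: both brokers handle traffic at the same time.
a, b = BrokerNode("broker-a"), BrokerNode("broker-b")
balanced_send = make_load_balancer([a, b])
targets = [balanced_send(f"m{n}") for n in range(4)]
```

In the active-passive case the second broker sits idle until a failure occurs; in the active-active case the load is shared from the start.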
Queueing broker architecture
JMS and AMQP brokers usually have an architecture specifically suited to a queueing broker. An example is that of RabbitMQ:
These queueing brokers are a good fit for:
- real-time applications (point-to-point messaging)
- application/services data exchange with simple clients
- (a)synchronous messaging, e.g. web applications calling multiple API backends
- messaging that makes extensive use of routing rules
- control over messages (e.g. message expiry or message delay)
- task queueing
Streaming broker architecture
The architecture of a streaming broker, also called a streaming platform, such as Kafka, is completely different. This is because it uses what is called a distributed commit log. Writing to this log is best compared to writing a record into a database. With a streaming broker, you can always append a log entry. Each entry is a record with its own unique ID. Want to change an existing record? You can’t: once written is written. Log records are immutable.
A streaming broker distributes these log records across different nodes to ensure scalability.
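A minimal sketch of such an append-only log, spread over several partitions, might look like this. It is a toy model: the PartitionedLog class is hypothetical, the partitions are plain lists rather than separate nodes, and the CRC32 hash is only an assumption for the example, not the partitioning scheme of any particular broker.

```python
import zlib


class PartitionedLog:
    """Toy append-only log divided into partitions (illustration only)."""

    def __init__(self, partitions=3):
        self._partitions = [[] for _ in range(partitions)]

    def append(self, key, value):
        # A stable hash sends records with the same key to the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self._partitions)
        partition = self._partitions[index]
        partition.append((key, value))    # records are only ever appended
        return index, len(partition) - 1  # the ID: (partition, offset)

    def read(self, partition, offset):
        # Reading does not remove anything; each reader tracks its own offset.
        return tuple(self._partitions[partition][offset:])


log = PartitionedLog()
p1, o1 = log.append("sachertorte", 2)
p2, o2 = log.append("sachertorte", 1)

# Two independent readers see the same records, unlike consumers on a queue.
dashboard = log.read(p1, 0)
weekly_report = log.read(p1, 0)
```

Because reading never removes records, the live dashboard and the weekly report can both work from the same log, each at its own pace.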
The different architecture allows streaming brokers to handle very different use cases (in addition to traditional messaging use cases via topics), such as:
- Big data offloading to data warehouses
- Collecting logs, metrics and statistics
- Website Activity Tracking
- Event streaming
Event streaming
Event streaming is an important concept within streaming brokers. It refers to applications that use the Stream API of brokers such as Kafka and Pulsar.
Typically, these applications are built on open source stream processing frameworks, such as Apache Spark, Apache Storm, Apache Flink and Apache Samza, or on proprietary services such as Google’s Dataflow and AWS Lambda.
The applications built this way tap into a data stream from a streaming broker (in the form of a topic). This data stream has no defined end and is potentially infinite. On this stream, data is processed on-the-fly and in parallel, for example to put data on a dashboard in real time. Think of sales, production lines, network monitoring or stock prices.
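The idea of processing an unbounded stream on-the-fly can be sketched with plain Python generators. This is not the API of Kafka, Pulsar or any of the frameworks mentioned; the event source sales_events is a hypothetical stand-in for a topic, and the processing is sequential rather than parallel to keep the example small.

```python
from collections import Counter
from itertools import islice


def sales_events():
    """Hypothetical unbounded source; in practice this would read from a topic."""
    cakes = ["sachertorte", "cheesecake", "brownie"]
    n = 0
    while True:  # the stream has no defined end
        yield {"cake": cakes[n % len(cakes)], "amount": n % 2 + 1}
        n += 1


def running_totals(events):
    """Update the totals per cake after every single event."""
    totals = Counter()
    for event in events:
        totals[event["cake"]] += event["amount"]
        yield dict(totals)  # a fresh snapshot, e.g. for a live dashboard


# The stream is infinite, so we only look at the first four snapshots here.
snapshots = list(islice(running_totals(sales_events()), 4))
```

The totals are updated per event instead of per batch, which is what allows a dashboard to show live sales while the weekly report is computed from the same stream later.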
Summary:
Queueing and streaming brokers both fall under Message-Oriented Middleware. While there is some overlap, the architecture and use cases are generally very different.