The MOM with a message
Message-orientated Middleware explained
Message-orientated Middleware (MOM) combines two key concepts of data integration: messaging and middleware. Still, both are hard to grasp. This blog explains them through a series of analogies.
Introduction
I’ve seen people use middleware systems and build messaging solutions for years before the concepts of MOM fully sank in. After that, they often made completely different choices. It’s this understanding that sets an integration specialist apart.
It’s not surprising, however, that it takes a while to gain this understanding, because most of our thinking about systems is based on applications. This mental model of applications rests on the three-tier architecture: a presentation layer (UI), an application layer (business logic) and a data layer (database).
Like databases, middleware sits at the data layer. The difference is that the data is not stored (data at rest) but moving (data in motion): middleware moves data between systems. This means it is distributed by nature, and we have to deal with challenges like:
Is the network secure and reliable?
Do we have means to transport data?
Can we bridge the gap between systems?
Message-orientated middleware is one of the most popular approaches to dealing with these challenges.
The analogy
Though sending letters isn’t popular anymore, the ideas of messaging are still based on this analogy.
Billie Holiday lived in New York during the fifties. Say she writes a letter to a friend in London. After writing her last line, she puts the letter in an envelope and brings it to a mailbox. The mail company ensures that the letter is delivered to her friend's address in London.
During transport this letter would probably be sorted in distribution centers and transported by trucks, trains and planes. For a long time this has been the way to get information from A to B.
In messaging, we transport information in a similar fashion. A data message consists of data and metadata (like the address or a timestamp). Middleware systems act like the mail company and deliver the message to the right location. This is the basic concept of messaging.
The differences
An analogy holds only to a certain extent, until it falls apart. This is also true for sending letters compared to sending data messages. There are many differences, but let’s discuss the main three.
Messages in MOM are:
1. Digital
2. Multi-Pattern
3. Processed
What does this mean?
Digital
Though this is the most obvious difference, it also has the biggest implications.
A letter is limited because of the amount of information you can put on a piece of paper. It’s also limited, because of the time it takes to deliver. It can take a whole week before the friend in London gets the letter (when she is at home).
A digital message is much less limited. Think of an email that can be sent from New York to London in a matter of seconds. And even when your friend is on vacation in Australia, she can still receive it.
It can also be copied easily, so that other friends can receive the same mail. You don’t need to write a separate email for every friend, as you would with Christmas cards.
Patterns
Within logistics, there are many ways to deliver a letter or a parcel. The same holds for messaging. In a sense, the letter analogy is just one of many analogies.
Consider the following messaging patterns:
- Fire-And-Forget: This is like sending a letter. You send it and then forget about it; the mail company takes care of the rest. You may get an answer, but not necessarily.
- Push messages: Here messages are sent to one or multiple receivers and every receiver gets notified on arrival. Think of WhatsApp.
- Pull messages: This is like a P.O. Box. The message waits until the receiver decides to pick it up.
- Request-Reply: A message is sent, and a reply is sent back to the sender. The reply can be anything, like an acknowledgement or an answer. It’s like sending a letter and receiving an answer back. This pattern is used a lot in web applications.
- Publish-Subscribe: Here you publish a message and everyone that is subscribed gets a copy. Think of a newsletter or a subscription to a newspaper.
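To make these patterns a bit more tangible, here is a minimal in-memory sketch in Python. It is not how any particular middleware product implements them; the names (`po_box`, `Topic`, `handle`, `reply_to`) are made up for illustration, and a plain `queue.Queue` stands in for a real broker.

```python
import queue

# Fire-and-forget and pull: the sender drops a message on a queue and moves on;
# the receiver picks it up whenever it wants (the P.O. box).
po_box = queue.Queue()
po_box.put({"to": "London", "body": "Dear friend"})  # the sender is done here
letter = po_box.get()                                 # the receiver pulls later


# Publish-subscribe (with push delivery): every subscriber gets its own copy
# and is called as soon as the message is published.
class Topic:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(dict(message))  # copy, so subscribers cannot affect each other


inbox_a, inbox_b = [], []
newsletter = Topic()
newsletter.subscribe(inbox_a.append)
newsletter.subscribe(inbox_b.append)
newsletter.publish({"body": "Issue 1"})


# Request-reply: the request carries a reply queue, and the receiver answers on it.
def handle(request):
    request["reply_to"].put({"body": "ack: " + request["body"]})


reply_queue = queue.Queue()
handle({"body": "order 42", "reply_to": reply_queue})
reply = reply_queue.get()
```

In a real integration the sender and receiver would run in different systems, and the queue or topic would live in a broker between them.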
Processors
It’s not very common that a mail company opens the mail or parcel and does something with the content. In messaging, this is perfectly normal.
It’s like sending a letter in English to Japan, where the message is opened and translated into Japanese before it is delivered. Many middleware systems can process data messages by routing, filtering, storing or transforming the data of the message before it’s delivered.
The place where the data is processed is often called a processor or a service.
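As a rough sketch, processors can be thought of as small functions that each take a message and return a (possibly changed) message. The function names, the header names and the one-word phrase book below are all invented for this example.

```python
def filter_orders(message):
    """Filter: drop anything that is not an order (None means dropped)."""
    return message if message["headers"].get("type") == "order" else None


def translate(message):
    """Transform: replace the body using a tiny, made-up phrase book."""
    phrase_book = {"hello": "konnichiwa"}
    body = message["body"]
    return {**message, "body": phrase_book.get(body, body)}


def route(message):
    """Route: pick a destination based on a header."""
    country = message["headers"].get("country")
    destination = "tokyo" if country == "JP" else "default"
    return {**message, "headers": {**message["headers"], "destination": destination}}


def run_pipeline(message, processors):
    """Pass the message through each processor in turn."""
    for processor in processors:
        message = processor(message)
        if message is None:  # a filter dropped the message
            return None
    return message


steps = [filter_orders, translate, route]
result = run_pipeline(
    {"headers": {"type": "order", "country": "JP"}, "body": "hello"}, steps
)
dropped = run_pipeline({"headers": {"type": "invoice"}, "body": "hello"}, steps)
```

Each step returns a new message instead of changing the old one, which keeps the processors independent of each other.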
The message
A data message is often presented as analogous to a letter. There are two levels: one with the content and one with additional (meta)data. Here are various ways to think about it:
Conceptually these analogies are a good way to think about messaging, but technically it’s a bit different.
What exactly is a message? How would a programmer think of a message? For a programmer, a message is just a map with key/value pairs. Every key/value pair can have its own data type (a string, a number, an object and so on).
The key/value map travels as a message through systems. In different middleware systems and at various steps, we can manipulate one or more of those keys.
That we define or categorize certain keys is partly convention and partly based on the implementation of the middleware or protocol. Consider a message in Apache Camel:
In Camel, the message body contains the main content, headers contain the metadata, and attachments mostly hold binary data. There is, however, nothing that stops you from putting the main content in a header and leaving the body empty.
Headers, body, properties, attachments… all are just keys that reference data of a specific data type. You can do with them whatever you want. A message is thus nothing more than a container of data.
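In code, this idea could look like the sketch below. It uses a plain Python dictionary, not Camel’s actual API, and the key names and values are only examples.

```python
# A message as a map of key/value pairs, each with its own data type.
message = {
    "headers": {"to": "London", "timestamp": "1955-06-01T10:00:00"},
    "body": "Dear friend, ...",
    "attachments": {"photo.jpg": b"\xff\xd8"},
}

# A processing step can manipulate any key.
message["headers"]["priority"] = "high"

# Nothing stops you from moving the main content into a header
# and leaving the body empty.
message["headers"]["content"] = message["body"]
message["body"] = None
```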
The middleware
When a parcel travels from a factory in China to your home, it passes through all kinds of transport systems. A train can pick up the parcels at the factory. At the harbor, the parcels are distributed into various shipping containers. A container ship travels over oceans to the next harbor. At this harbor the parcels from the container are brought to a distribution center, and finally your local mail company delivers the parcel to your home.
A parcel, a container, a harbor, a truck, a distribution center: all of them have their role in the logistics chain. In IT this chain is mostly called an integration, and the various parts are called middleware.
Every middleware system has its own function or role within an integration. Sometimes people think that a single middleware system contains all the functionality needed to integrate data, for example that Apache Kafka is enough for all integration problems of a company. In practice, you often need multiple middleware systems to create a complete integration.
For example: an adaptor or connector gets the data out of a system and puts it on a broker. The broker acts as a transport layer, but also as temporary storage (think of ActiveMQ, RabbitMQ or Kafka). An ESB (for example Mule, NiFi or Camel) then processes the message further (filtering, transformation) and sends the result to a REST API of the receiving system.
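The chain from this example can be sketched with a few stand-ins. Everything here is hypothetical: the list `source_system` plays the sending system, a `queue.Queue` plays the broker, and the list `received_by_api` plays the REST API of the receiving system.

```python
import queue

source_system = [{"article": 1, "name": " apple "}, {"article": 2, "name": ""}]
broker = queue.Queue()   # stand-in for ActiveMQ, RabbitMQ or Kafka
received_by_api = []     # stand-in for the receiving system's REST API


def connector():
    """Gets records out of the source system and puts them on the broker."""
    for record in source_system:
        broker.put({"headers": {"source": "erp"}, "body": dict(record)})


def integration_layer():
    """Takes messages off the broker, filters and transforms them, then delivers."""
    while not broker.empty():
        message = broker.get()
        name = message["body"]["name"].strip()
        if not name:                            # filter: skip articles without a name
            continue
        message["body"]["name"] = name.upper()  # transformation
        received_by_api.append(message)         # a real flow would do an HTTP call here


connector()
integration_layer()
```

The point of the sketch is the division of roles: each function only knows the broker, not the system on the other side.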
What kind of middleware is needed mostly depends on the type of pattern and the type of processors. In every case, the message is the container that travels from one middleware system to another. Hence, Message-Orientated Middleware.
Best practices
A shipping container has a specific size; this standardization makes the transport of goods a lot easier. Messages in data integration are not that fixed, but there are certain best practices in messaging that many integration specialists follow:
- The message is central
Though it’s clear that a message is the central concept of message-orientated middleware, this is often forgotten in practice (especially when you don’t deal with data integration on a daily basis). So think in terms of an order message, not in terms of the ERP system, an API, a database and so on.
- The unit of data
A supermarket sells products. The types of products change over time. It’s technically possible to put all articles in one message and send it to all systems. This, however, causes a peak load on the middleware systems. Even with cloud scalability this is often not what you want.
It’s better to define a unit of relatively fixed size. For example, a message contains only one article at a time, with its attributes (article number, supplier name, type of package). When this article changes, only one message needs to be sent or retrieved.
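A small sketch of this idea, with an invented catalogue and invented field names: instead of one big message, every article becomes its own message.

```python
catalogue = [
    {"articlenumber": "A-1", "suppliername": "Acme", "package": "box"},
    {"articlenumber": "A-2", "suppliername": "Acme", "package": "bag"},
]


def to_messages(articles):
    """Yield one message per article instead of one message for the whole catalogue."""
    for article in articles:
        yield {
            "headers": {"type": "article", "key": article["articlenumber"]},
            "body": dict(article),
        }


messages = list(to_messages(catalogue))

# When one article changes, only that article is sent again.
catalogue[1]["package"] = "crate"
update = next(to_messages([catalogue[1]]))
```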
- A message is independent
A message is an independent piece of data. When you send a letter to London, it doesn’t depend on earlier or later letters you send (or on any other letter). It also doesn’t need to know anything about other letters while it’s in transit. The same holds for a data message.
A message is stateless; it travels in memory. It’s certainly possible to take message ordering, duplicates and relations between messages into account. However, this complicates integrations considerably and comes with a performance penalty.
Often it’s best to leave these things to the application layer. Once the data is stored in a database (in the data layer), it can be combined and ordered much more easily.
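As an illustration of why this is easier after storage, here is a sketch where a dictionary stands in for a database table. The `id` and `sequence` fields are assumptions for this example; your messages may carry different keys.

```python
# Messages as they happened to arrive: out of order and with one duplicate.
arrived = [
    {"id": "m2", "sequence": 2, "body": "second"},
    {"id": "m1", "sequence": 1, "body": "first"},
    {"id": "m2", "sequence": 2, "body": "second"},
]

table = {}  # stand-in for a database table keyed by message id
for message in arrived:
    table[message["id"]] = message  # storing by id makes duplicates harmless

# Ordering becomes a simple query once everything is stored.
ordered = sorted(table.values(), key=lambda m: m["sequence"])
```

The middleware just delivers each message as it comes; the receiving application sorts things out.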
- Choose the correct storage
When you really need data in one message that depends on data from another message, choose your dependency wisely. When this dependency (a system) is slow or unavailable, it affects the whole stream of messages.
Basically, you can use any highly available storage system for this. A relational database like MySQL or SQL Server is probably chosen often, but it’s not necessarily the best choice.
When your messages use a specified data format like JSON or XML, a document database like MongoDB or BaseX is a better fit. Probably the best database to use is a key/value database like Redis, as it most closely mimics the model of a message.
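The sketch below shows the idea of enriching one message with data from an earlier one through a key/value store. `KeyValueStore` is a hypothetical in-memory stand-in, not the Redis client API, and the message fields are made up.

```python
class KeyValueStore:
    """In-memory stand-in for a key/value database such as Redis."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


store = KeyValueStore()


def on_customer_message(message):
    """Remember the customer name so later messages can look it up."""
    store.set("customer:" + message["body"]["id"], message["body"]["name"])


def on_order_message(message):
    """Enrich an order with the customer name, if it is known."""
    name = store.get("customer:" + message["body"]["customer_id"], "unknown")
    return {**message, "headers": {**message["headers"], "customer_name": name}}


on_customer_message({"headers": {}, "body": {"id": "c1", "name": "Billie"}})
enriched = on_order_message({"headers": {}, "body": {"customer_id": "c1"}})
missing = on_order_message({"headers": {}, "body": {"customer_id": "c9"}})
```

Note that the order message still gets through when the customer is unknown; the lookup has a fallback, so it does not block the stream.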
- Guaranteed delivery
One of the worst things that can happen in logistics is that a letter or parcel gets lost. The same is true in data integration: when data is lost, everyone is unhappy. That’s why it’s better to use protocols and middleware that guarantee delivery. So don’t use file-based transfer (like SFTP or AWS S3), but use HTTPS, AMQP or JMS, for example.
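What a delivery guarantee boils down to can be sketched as "keep trying until the receiver acknowledges". This is a simplified illustration, not the mechanism of AMQP or JMS; `deliver`, `send` and `flaky_receiver` are hypothetical names.

```python
def deliver(message, send, max_attempts=5):
    """Try to send until the receiver acknowledges; return the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            if send(message):  # True means the receiver acknowledged
                return attempt
        except ConnectionError:
            pass               # treat a broken connection like a missing ack
    raise RuntimeError("not delivered; keep the message for later redelivery")


# Hypothetical receiver that fails twice before acknowledging.
calls = []


def flaky_receiver(message):
    calls.append(message)
    if len(calls) < 3:
        raise ConnectionError("network glitch")
    return True


attempts = deliver({"body": "order 42"}, flaky_receiver)
```

Retrying like this means a receiver may see the same message more than once, which is one reason duplicates are worth thinking about.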
- Structured data formats
When data needs to be processed, it’s better to use a data format that systems can process easily. Think of JSON or XML. Record-based formats like CSV or TSV fall somewhere in the middle, and unstructured data like emails or other documents are the hardest to process.
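A quick comparison with Python’s standard library shows the difference. The article data is invented; the point is what the parser gives you back.

```python
import csv
import io
import json

json_message = '{"articlenumber": "A-1", "price": 1.5, "tags": ["fruit", "fresh"]}'
csv_message = "articlenumber,price,tags\nA-1,1.5,fruit;fresh\n"

# JSON keeps types and nesting: price is a number, tags is a list.
from_json = json.loads(json_message)

# CSV gives a flat record of strings: types and nesting need extra conventions.
from_csv = next(csv.DictReader(io.StringIO(csv_message)))
```

With CSV, the receiver has to know that `price` should become a number and that `tags` is split on a semicolon; with JSON, that structure travels with the message.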
Final note
There are many more best practices and concepts used in data integration, but this discussion hopefully gives you a good idea of the problems and solutions of Message-Orientated Middleware.