9. Integration is not a technology, but a process

The role of processes & procedures

Raymond Meester
18 min read · Apr 23, 2024

It is March. A faint sun shines and the wind still blows bleakly across the deserted industrial estate. But the time has come: Robèrt van Beckhoven, Holland’s master baker, takes a big pair of scissors in his hand and cuts the ribbon. The new factory of “The Bakery Group” is open.

Yes, not all the production lines are finished yet, but the first people will soon be eating Aleksandra’s famous apple pie. Robèrt praises the quality of the recipes and hopes that soon all of Holland will not be baking, but getting to know Aleksandra’s pies!

The production line

A bakery has surprisingly few people on the production line. Large bales of ingredients such as flour go into large machines, after which a long oven bakes the loaves or cakes. At the end of the line it smells of fresh bread, and the products are quickly wrapped automatically.

The more variety and customization, the more people are involved. The production line of data integrations often involves a lot of customization, and thus a surprising number of people working on integrations. Yet with good standards and guidelines, new and modified integrations can be implemented quickly.

Traditionally, there are a number of roles on the production line of an integration. The most well-known are:

  • Analyst
  • Developer
  • Tester
  • Administrator

There are of course many other roles involved, such as architects, product owners, scrum masters and managers. However, these are not so much on the “production line” itself.

Data Analyst

The analyst is the bridge between the business and the technology. An analyst typically starts by discussing the desired functionality and the business processes and applications involved. The next step is to divide the business process into one or more integrations. For each integration there is a design, which usually breaks down the applications and integration services into a diagram with swim lanes. Sometimes an official notation such as BPMN is used, but this is not strictly necessary.

At the functional level, the design describes “what” is needed. This is still very high-level. To build it, the “how” must be described exactly. Functionally, for example, I want a pie with apples, but to actually make the recipe (the design) usable, I need to know exactly which apples are needed, how many grams, and what size they should be cut into.

With software you need to be even more precise. Suppose the business specifies that only orders of type 6 should be routed to application B. What exactly is the field to route on? Is this field called “type” or “OrderType”? And should the value be uppercase or lowercase? One discrepancy and the integration does not work.
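
As an illustration, here is a minimal sketch of such a routing rule in the Apache Camel Java DSL. The endpoint URIs and the field name “OrderType” are assumptions; they must match the design exactly.

```java
import org.apache.camel.builder.RouteBuilder;

// Hypothetical route: only orders of type 6 go to application B.
public class OrderTypeRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("jms:queue:orders")
            // One discrepancy here (e.g. "ordertype" instead of "OrderType")
            // and the filter silently drops every message.
            .filter().jsonpath("$[?(@.OrderType == 6)]")
            .to("http://application-b.example.com/orders");
    }
}
```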

In addition to routers and filters, the design contains a lot of information about the connections to the applications and the mapping of formats between applications. A mapping is all about field X being called field Y at the receiving system. Good examples and diagrams of the data messages are also important for mapping.
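
A mapping can be as simple as renaming a field. The sketch below uses Jackson; the field names “artNo” and “articleNumber” are purely illustrative.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

// Hypothetical mapping: the sender calls the field "artNo",
// the receiving system expects "articleNumber".
public class OrderMapper {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static String map(String senderJson) throws Exception {
        ObjectNode node = (ObjectNode) MAPPER.readTree(senderJson);
        node.set("articleNumber", node.remove("artNo")); // field X -> field Y
        return MAPPER.writeValueAsString(node);
    }
}
```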

At the end, the design is reviewed with architects and developers. Is this really what is desired? Does it meet the architectural guidelines? Is it technically feasible? Some organizations use a “Definition of Ready,” which marks the design as complete enough to be built.

Developers

Integration developers are usually not diehard programmers, but rather jacks of all trades. They must understand design principles and adapt to the standards of the organization and the technologies of the applications.

A programmer, on the other hand, often works in a single domain, such as .NET or Java. In data integration, technologies such as databases, APIs and languages sometimes differ from one integration to the next. For example, a .NET web service may be built very differently from one in Java. Integration frameworks, tools and code are used to build integrations. Sometimes this can be done within a single integration function, such as API management. Often multiple tools are needed at the same time.

When building, developers follow architectural guidelines, usually supplemented by the following standards:

* Code and configuration standards
* Naming conventions
* Version control procedures
* Review procedure
* Test procedure
* Version numbering

In that respect, these standards are not very different from what programmers are used to. Code and naming standards therefore often align with programming standards, for example the Java conventions for naming a class.

While this approach is common, it is not always the most logical. After all, it is not objects or functions that are at the heart of data integration, but the data itself. Often a data entity such as “article” or “order” is taken as the basis. Incidentally, plural names are more common in integrations than in, say, database entities: you speak of “articles” or “orders” rather than “article” or “order”.
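
A minimal sketch of what such an entity-based naming convention might look like in a Camel route; the queue and route names are invented for illustration.

```java
import org.apache.camel.builder.RouteBuilder;

// Hypothetical naming convention: the (plural) data entity comes first,
// followed by the source and target systems.
public class OrdersWebshopToWarehouse extends RouteBuilder {
    @Override
    public void configure() {
        from("jms:queue:orders.webshop")
            .routeId("orders-webshop-to-warehouse")
            .to("jms:queue:orders.warehouse");
    }
}
```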

The development process usually runs locally, with the developer building (programming or configuring) and testing the integration step by step. Finally, when the steps add up to one service, a unit test is often written as well. This process, using version control such as SVN or Git, is not very different from application development, but the integration developer will more often rely on tooling at a slightly higher level.

It is best practice when developing integrations to rely as much as possible on graphical modeling or service configuration. This gives a better understanding of the flow of data. Only for the more complex and custom integrations is logic programmed.

Tester

An integration tester doesn’t just check whether the integration was built according to the design, but more importantly looks at the various paths that data messages can take. For example, did the message pass through the filter or not? Many scenarios are often possible. Each route is defined and tested as a scenario.

Leading in testing is whether the actual data output matches the expected output. This may involve looking at specific integration services, but generally testers look at the business process. So the analyst goes from the business process to the technical integration, the developer builds it, and the tester returns to the business process.
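
A minimal sketch of such scenario-based testing with JUnit 5. The routing logic is stubbed inline here; the scenarios and destination names are assumptions.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

class OrderRoutingScenarioTest {

    // Hypothetical stand-in for the routing logic under test:
    // only orders of type 6 reach application B.
    static String destinationFor(int orderType) {
        return orderType == 6 ? "application-b" : "discarded";
    }

    @ParameterizedTest
    @CsvSource({
        "6, application-b", // scenario 1: the message passes the filter
        "3, discarded"      // scenario 2: the message is filtered out
    })
    void eachRouteIsTestedAsAScenario(int orderType, String expected) {
        assertEquals(expected, destinationFor(orderType));
    }
}
```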

Administrator

The domain of the administrator is the production environment. The role of integration administrator is close to that of application administrator. First of all, the administrator is the gatekeeper of the production environment: has the integration been built, tested and documented correctly?

Next, the administrator is responsible for installing integrations and correcting incidents, including error handling. Finally, there is the production environment itself. Are the integration platform and the integrations on it running properly? Everything revolves around the continuity of the organization.

On the management front, much has happened in recent years. The main developments are:

  • CI/CD: putting new software into production in an automated, continuous and predictable way
  • Log management: automatically shipping logs into a search engine (Elastic, Solr, Splunk, etc.); see the sketch after this list
  • Monitoring: keeping watch over servers, tools and integrations
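
As a minimal sketch of log management, the snippet below posts a log event as a JSON document to Elasticsearch over its REST API. The URL, index name and event fields are assumptions; in practice a shipper such as Filebeat or Logstash usually does this.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LogShipper {
    public static void main(String[] args) throws Exception {
        // Hypothetical log event from an integration route
        String event = """
            {"timestamp": "2024-04-23T02:00:00Z",
             "route": "orders-webshop-to-warehouse",
             "level": "ERROR",
             "message": "Connection refused by warehouse endpoint"}""";

        // Index the event as a document (assumed local Elasticsearch)
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/integration-logs/_doc"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(event))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // 201 when indexed
    }
}
```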

The core of the modern administrator’s work is to be data-driven, so that it is always clear what is going on and the data can be used as steering information for improvements. In addition, the administrator takes a preventive approach in order to avoid incidents.

Governance

The customer and contractor obviously want to ensure that the plant and its production meet expectations. This requires governance. In integration, governance involves the following:

1. Principles

Architecture principles are guiding statements that indicate what is desirable. An architecture principle can be thought of as a policy statement that relates specifically to the design of the organization, its processes and its information provision.

In integration, for example, it is often a principle to use open standards (think HTTP or OpenAPI). Another well-known principle is that of “decoupling.” Applications are not adapted to each other (tightly-coupled), but use messages and middleware to exchange data (loosely-coupled).
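
A minimal sketch of the loosely-coupled variant: application A does not call application B directly, but drops a message on middleware (here a JMS queue via ActiveMQ Artemis; the broker URL and queue name are assumptions).

```java
import jakarta.jms.ConnectionFactory;
import jakarta.jms.JMSContext;
import jakarta.jms.Queue;

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class LooselyCoupledSender {
    public static void main(String[] args) {
        // Hypothetical broker address
        ConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://localhost:61616");
        try (JMSContext context = factory.createContext()) {
            Queue orders = context.createQueue("orders");
            // Application A only knows the message and the queue,
            // not application B itself: loose coupling.
            context.createProducer().send(orders, "{\"OrderType\": 6}");
        }
    }
}
```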

2. Control mechanisms

Control is exercised at various moments in the production process; that is where architecture principles and organizational guidelines are enforced. These mechanisms typically come into play at handover moments: when the design is handed over for development, during integration testing (quality assurance) and at the moment of going into production. Think of checks by operations and by change management.

3. Measure

The integration layer used to be seen purely as a black box. In a sense, this is still true from the perspective of the application layer. Organization-wide, however, there is much more emphasis on steering based on data. Here the time-honored “to measure is to know” applies. Through monitoring, performance metrics and logging, we know where something is happening and what is happening.

4. Communication

By definition, integration always involves multiple parties. At each stage (design, build, test and production), the various parties must be informed. Governance must ensure that there is not too little, but certainly not too much, communication.

5. Responsibilities

The chain of responsibilities in integration has two aspects. On the one hand, the distribution of responsibilities during the creation of new integrations, including handover moments. On the other hand, responsibility for the data during exchange between systems. The latter has to do with what to do in case of incidents and how to take privacy legislation into account.

Guidelines

Guidelines are used to steer implementations. They ensure that you deliver a consistent quality and are aware when you deviate from it. In residential construction, for example, the guideline is to lay the bricks in a consistent manner; all walls are then not only solid, they also look the same.

Guidelines in integrations often focus on questions such as:

  1. Which patterns belong on the integration layer and which do not? For example, routing is part of the integration layer, but calculations are not.
  2. What is the naming of integrations?
  3. What are the guidelines for security?
  4. What are the guidelines for protocols?
  5. What are the guidelines for data formats?

Guidelines in integration might, for example, be “cloud-first” and “API-driven.” That is, when acquiring a new application, prefer a SaaS application that communicates via REST rather than via SFTP. Good guidelines ensure that such questions can be answered using a decision tree, as in the sketch below.
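
A minimal sketch of such a decision tree in code; the rules are illustrative, not actual organizational policy.

```java
public class ProtocolGuideline {

    // Hypothetical decision tree for choosing an exchange protocol
    public static String choose(boolean realTime, boolean largeFiles) {
        if (realTime) {
            return "REST API (API-driven guideline)";
        }
        if (largeFiles) {
            return "SFTP batch transfer";
        }
        return "Messaging via the integration platform";
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false)); // REST API (API-driven guideline)
    }
}
```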

In practice, you often see pressure to comply with the guidelines, because integrations are expected to be the “glue” of the organization. Actively adhering to guidelines really pays off in the scalability and maintainability of the environment.

Standards

A standard is a set of rules that describes how people should develop and manage materials, products, services, technologies, tasks, processes and systems. In some cases, standards are mandatory: for example, when an exchange must follow a specific EDI standard, or authentication must run via the OAuth or SAML standards. If an integration does not follow these standards, it may not be implemented.

In other cases, a particular standard may only be a guideline, for example that JSON or XML is preferred over ASCII and CSV. Outside of security, the integration layer is often expected to be relatively flexible in dealing with different standards.

Phases

Building an integration environment roughly goes through three phases:

1. Implementation Phase

2. Expansion Phase

3. Phase-out

Implementation Phase

The implementation phase is characterized by the introduction, replacement or renewal of the integration platform. Integration platforms usually support multiple architecture styles. The first step often consists of determining the architecture style. Later on you often see multiple styles mixed together, but that certainly should not be the starting point. Of course, this is a requirement that is already addressed when purchasing the integration platform.

The second step is setting up the platform. In the cloud (iPaaS), this step has already been done, but on-premises the system will need to be installed and configured.

In this step you are not yet working on the actual integration, but on the fundamentals to run data interfaces. You can compare it to a subway tunnel: first, the tunnel itself has to be built. The tunnel has to be drilled and the walls have to be propped up. In the next phase, you start laying the actual rails on which “the subway” will run.

This next phase is all about the data you run over the platform. Once you have identified the data messages that are important in the implementation, you create a design for each data message. Each data message is a separate metro line, so to speak, and each station a separate step. This can be a technical step or a functional step. When all the steps, and what happens within them, have been described, the design of the data interface is complete.

It depends a bit on how you work, but you can define the design of all data interfaces in advance. These days a more agile approach is usually taken, where a simple interface is set up first, the design is refined, and interfaces are added to the integration layer one at a time.

You now enter the build phase. How to build depends heavily on the integration platform. Some platforms provide graphical tools, others assume programming.

Once the integrations are built, the test phase first checks whether messages get from A to B. We call this unit or technical testing. Then end-to-end chain testing is performed, looking at whether the data can be used as information by the business. In some cases, there is also an acceptance phase with the end users. The final step is taking the integration into production.

Expansion Phase

In the expansion phase, the interfaces on the platform are expanded. Like an additional branch to another district or a new subway station, there will be a branch to a newly purchased system. This could mean new applications or new data messages. As the application landscape evolves, the integration platform evolves with it.

Each extension goes through the same steps. So even when modifying an existing interface, the integration is designed, built, tested, accepted and put into production. And just as with a metro system, modifications in a production environment call for extra caution. A metro system is at least shut down at night, but integration systems typically process data 24/7.

Phase-out

The final phase begins when the integration platform is taken out of use. The reason may be the deployment of new integration components, a new architecture style or a different platform. You often see the switch to the new platform made per data message.

It is not unusual for this phase to be neglected: full attention goes to the new platform, while new demands and requirements keep coming in from the business. Yet it requires constant attention, especially from development and operations. Integrations that remain on the old platform cost time and money to maintain, and the servers that keep running cost money as well. Another important argument is the greater risk of security breaches due to outdated software.

Processes

Key processes of integration:

  • incident process
  • problem process
  • change process
  • release process
  • test process
  • continuous improvement process
  • application lifecycle management process

Incident/Problem/Change

The incident process, the problem process and the change process are no different for integration than for other IT departments. The incident process repairs disruptions, the problem process resolves structural errors and the change process guides changes. Process frameworks such as ITIL describe these processes in detail. Here we only briefly discuss integration-specific incidents.

Incidents

Specific to integration is that it involves a lot of third-party components. An integration is prone to network failures, corrupt or incomplete data, or disruptions at receiving systems. An example: article data arrives from Nestlé and the software or receiving system cannot handle the accent on the “é”. In such cases, it is often not enough just to log the error; the message must also be placed on an error destination so that it can be included in the error analysis and offered again if necessary. More modern systems in particular, such as Kafka, explicitly take into account network disruptions or situations in which consumers are temporarily offline.
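
A minimal sketch of such an error destination with Apache Camel’s dead letter channel; the endpoint names and retry settings are assumptions.

```java
import org.apache.camel.builder.RouteBuilder;

public class ArticleImportRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Failed messages are not just logged: after three redelivery
        // attempts they are moved to an error queue for analysis
        // and possible redelivery.
        errorHandler(deadLetterChannel("jms:queue:articles.error")
            .maximumRedeliveries(3)
            .redeliveryDelay(5000));

        from("jms:queue:articles")
            // Fails when, say, the receiving system chokes on "Nestlé"
            .to("http://receiving-system.example.com/articles");
    }
}
```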

Pager duty

Integrations are the lifeblood of a business, and most of them operate 24/7. A customer may place an order at night, or a batch may run at 2 a.m. Both make setting up a pager service unavoidable. Such a service takes into account:

  • Pager service times: At what times does daily management run and what falls outside of it? Is it needed only during the week or also on weekends? Examples are 10/5, 12/6 and 24/7. Typically, a pager service runs from Monday evening through Monday morning (so that any handover can take place on Monday).
  • Pager service allowance: Working pager shifts (especially at 24/7) has a major impact on employees. After all, they have to be able to log into systems quickly. At 24/7, that means having your laptop with you at all times, even at a birthday or party, and you may need to fix something in the middle of the night. This raises questions such as: what is the hourly compensation, how to deal with Sundays and holidays, how to deal with extra hours worked, and how to handle these hours administratively.
  • Pager service handling: What are the expectations when an incident occurs? What is the response time? Generally, the expectation is to respond within half an hour. The administrator on duty ensures that normal operation is restored and that any affected parties are notified. Administration (hours, tickets) and root-cause analysis take place later.
  • Running pager shifts: Who will run the pager shifts? Usually these are technicians (DevOps) with access to production systems. In addition, a weekly schedule should be created, taking vacations into account.
  • Third parties: Integration must also take SLAs with third parties into account. You can set up a pager service, but if SaaS applications from, say, SAP are down, it only helps if you can contact them at those hours and ask them to fix it.
  • Alerting: Finally, those on pager duty should not have to monitor actively; they should be triggered by a monitoring system. Examples include sending certain alerts by SMS, Telegram or a mail notifier (for example, the mobile app eNotify can monitor a mailbox and trigger a beep on the smartphone based on the sender and/or subject).

Setting up a pager service for integration usually starts by going through the list of integrations with the business. When do they run, and what is the impact if they are not running or errors occur? Does it need to be fixed immediately or can it wait until the next morning? Ultimately, the business value should clearly exceed the cost of the pager service fee.

Example of pager service at The Bakery Group:

Throughout the year, people have birthdays. The bakery therefore starts running early in the morning (even on Sundays and holidays). If customers order in the evening, the cake has to be made the next morning. If the integrations between the web, order and warehouse systems are not running, it costs thousands of dollars. Reason enough for The Bakery Group to run 24/7 pager shifts.

Release process

In the beginning, it was still easy. Aleksandra simply baked a cake and gave it to a lucky person. Piece of cake. With large numbers, you need to align suppliers, employees and customers. They need to understand exactly what happens at which stage. Just getting started is a recipe for failure.

Companies often have a large application landscape, with projects and scrum teams working everywhere. Interfaces (e.g., APIs) are constantly being updated. A good version control and release process is therefore crucial to prevent everything from going completely wrong.

The release process ensures that the code goes through all the different phases and that at each phase it is exactly clear who created and deployed which code.

DTAP Cycle

Modifying and testing interfaces directly in production happens in every company from time to time. The larger and more complex the environment, the greater the need to avoid this and to proceed in phases and steps. Usually a separate development environment is set up first. Once chain testing takes place, a separate test environment is often added and, for user testing, an acceptance environment. The latter is usually similar to production.

The common acronym here is DTAP (Development, Test, Acceptance, Production). How many environments are needed depends on company size. A small company may manage with a DP (development/production) or a DTP (development/test/production) setup. Larger companies sometimes add a pre-production, fallback or training environment.

A well-designed DTAP cycle is half the battle. It prevents issues in production, where customers, employees or management are directly affected, and so saves a lot of annoyance and cost. Yet a well-designed DTAP cycle is tricky from an integration perspective. You often see differences between connected applications: one application has an acceptance environment, another does not. In such cases, chain tests are difficult to set up.

Releases

The code must eventually be packaged so that it can be deployed on one of the DTAP environments. In application development this involves compiling source code to binary code. For integration tools or platforms this is often not necessary, because they already operate at a higher level; a bundle of text files is then often enough.

Version control

Version control keeps track of all changes to all code (text files). Two flavors exist: centralized and decentralized. With centralized version control, code is stored on a central server and is uploaded to and downloaded from that server. The best-known example is Subversion (SVN). With decentralized version control, everyone has a full copy of the code, so no central server is needed. Here, the best-known example is Git.

In general, version control in integration does not differ that much from regular application development. However, where an application is often housed in a few repositories, interfaces often span many projects (hundreds of repositories).

Tailoring

An issue specific to connectivity is that some parameters differ from one environment to another. For example, in the test phase, messages are sent to a different URL than in production. Obviously, this should never be reversed, i.e. a production URL used for testing. To choose the right parameters for the right environment, tailoring is used: the source code is tailored to a specific environment in the DTAP cycle.
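
A minimal sketch of tailoring with environment-specific properties files; the file names, the ENVIRONMENT variable and the property key are assumptions.

```java
import java.io.FileInputStream;
import java.util.Properties;

public class TailoredEndpoint {
    public static void main(String[] args) throws Exception {
        // Hypothetical environment switch: "test" or "production"
        String env = System.getenv().getOrDefault("ENVIRONMENT", "test");

        // e.g. orders-test.properties vs orders-production.properties,
        // each containing its own orders.endpoint.url
        Properties props = new Properties();
        try (FileInputStream in =
                 new FileInputStream("orders-" + env + ".properties")) {
            props.load(in);
        }

        System.out.println("Sending to " + props.getProperty("orders.endpoint.url"));
    }
}
```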

Deployment

Deployment includes the actual installation of the release in a specific environment in the DTAP cycle. The build of the installation file and the deployment are usually based on a (build) script.

Build languages

The tasks in the release process, such as versioning, releases, tailoring and deploying, belong together. Today, there are several domain-specific languages (DSLs) and build tools that focus on these tasks. The most common are Ant, Maven and Gradle; in the .NET world, MSBuild and NuGet play a similar role.

Release communication

Part of the release process is announcing, reviewing and publishing releases. Announcing indicates when the release will occur and which systems will be affected. Reviewing is usually done by specialists, for example on a Change Advisory Board, to determine the impact and alignment with other changes. Publishing usually indicates that the change is active and links to the documentation.

Modern production lines

Setting up a good “code factory,” or in our case a production line that produces integrations, involves a lot. It is usually not the software components, but the processes above that make it so time-intensive to set up a good integration platform. It is part of the continuous improvement process to continually optimize the production line, by tightening procedures or improving integrations. There are several ways to modernize production lines. The main ones are:

Cloud

Cloud specifically focuses on taking away the installation and running of software. Traditionally, the IT department would have to calculate the capacity needed, request a server for it, and install and configure the integration software on it. Finally, this environment needs to be monitored and the logs stored. Cloud services take care of all these things.

CI/CD

Continuous Integration/Continuous Delivery (CI/CD) is about being able to make deliveries faster, more often and more predictably. The key is automating parts of the production line. With a factory you might think of robotization; with software (such as integrations) it is about automation, more specifically automated building of the software package, automated testing and automated deployment. This is often done using scripts (build, test and deploy scripts) that run sequentially in a pipeline on a CI/CD server (such as Jenkins or Bamboo).

Low code

Many traditional organizations assume an IDE in which the integration is developed locally. Sometimes the integration is written out entirely in code and tested locally; through version control and continuous delivery, it is then deployed on a server. With low code, you work directly in a test environment and not in code, but as much as possible graphically. The business analyst designs and builds the integration in one go.
