Data Integration Language

Part 1: Concepts and Structure

DIL stands for Data Integration Language. It’s an effort to create an open and useful language in the integration domain. The language transpiles to Apache Camel.

Goal

The goal of DIL is to make writing data integrations easier. It does this by focusing on data flow rather than on control flow, which is the focus of most general-purpose languages.

Basic concepts

DIL has three basic concepts:

  1. Data
  2. Components
  3. Links

These three concepts are inspired by flow-based programming. All other concepts are derived from them.

1. Data

In computing, data often means simply a fact that has a certain form (an image, a record, a text, an object and so on). DIL has a narrower definition of data, namely data as a data message.

More on the concept of data messages:

2. Components

Components can be seen as software blocks with which you can build a solution. A component encapsulates a set of functions that processes data.

[Figure: component A]

3. Links

Components can be linked with each other:
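As a rough illustration of how the three concepts relate, here is a minimal Python sketch. It is not DIL syntax: the class names `Message` and `Component`, the `link` helper and the `path` header are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Data in the DIL sense: a message that travels between components."""
    body: str
    headers: dict = field(default_factory=dict)


@dataclass
class Component:
    """A building block that processes messages (hypothetical model)."""
    name: str
    outbound: list = field(default_factory=list)  # components linked downstream

    def process(self, message: Message) -> Message:
        # Illustrative processing only: record which component saw the message.
        seen = message.headers.get("path", [])
        return Message(message.body, {**message.headers, "path": seen + [self.name]})

    def send(self, message: Message) -> list:
        """Process the message, then pass the result along every outbound link."""
        result = self.process(message)
        if not self.outbound:
            return [result]
        delivered = []
        for target in self.outbound:
            delivered.extend(target.send(result))
        return delivered


def link(source: Component, target: Component) -> None:
    """A link connects the output of one component to the input of another."""
    source.outbound.append(target)


a, b = Component("A"), Component("B")
link(a, b)
final = a.send(Message("hello"))
print(final[0].headers["path"])  # ['A', 'B']
```

The point of the sketch is that the program is described by what is connected to what; the order of execution follows from the links.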

Structure

In this chapter, we define the structure of a DIL program.

Component levels

A component may contain one or more subcomponents on a lower level.

As seen in the above diagram, components are nested. The component that contains all other components is on the first level, while the subcomponents are on the second level, and so on.
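The nesting can be pictured as a tree. The following Python sketch assumes a hypothetical `Component` class with a `children` list and numbers the levels from the outermost component downwards; the component names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    children: list = field(default_factory=list)  # subcomponents one level lower


def levels(component, level=1):
    """Yield (level, name) pairs; the outermost component is on level 1."""
    yield level, component.name
    for child in component.children:
        yield from levels(child, level + 1)


root = Component("X", [Component("A", [Component("A1")]), Component("B")])
for level, name in levels(root):
    print("  " * (level - 1) + f"{name} (level {level})")
```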

Links

Links tie a DIL program together and are a key concept of the language. Links have many facets, which we discuss in turn. We start with the following diagram:

In the above diagram, the components “A, B, C” have outbound links and “E, F, G” have inbound links. D has multiple inbound and outbound links. The kind of link is defined by its parameters.

A link can have the following parameters:

1. ID

The identifier of the link.

2. Transport

The type of transport, which can be either synchronous or asynchronous (or even a queue on an external broker).

3. Bound

A bound gives the direction of the data. Inbound means the data goes into the component; outbound means it leaves the component for another component.

4. Point

A point is the position of a component in a flow. A start point starts a flow, a mediate point sits in the middle, and an endpoint ends a flow.

5. Format

A format is a file format such as XML, JSON, CSV or YAML. The format parameter on a link sets which format can go in or out. Setting it explicitly ensures that the component consumes the correct format.

6. Pattern

The exchange pattern, which is either an event (InOnly) or request-reply (InOut).

7. Rule

Optional rule to route the message by an expression.
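To make the parameter list concrete, the sketch below models a link as a Python data class. This is an assumption-laden illustration, not DIL's actual schema: the value spellings `async` and `request-reply` and the `can_connect` check are hypothetical, and the matching rule (same ID, same transport, compatible format) is a guess at how two link ends pair up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Transport(Enum):
    SYNC = "sync"
    ASYNC = "async"  # spelling assumed; an external broker queue is another option


class Bound(Enum):
    IN = "in"
    OUT = "out"


class Pattern(Enum):
    EVENT = "event"                  # InOnly
    REQUEST_REPLY = "request-reply"  # InOut; spelling assumed


@dataclass(frozen=True)
class Link:
    id: str
    transport: Transport
    bound: Bound
    point: Optional[str] = None      # place in the flow: start, mediate or end
    format: Optional[str] = None     # e.g. "json", "xml", "csv", "yaml"
    pattern: Optional[Pattern] = None
    rule: Optional[str] = None       # optional routing expression


def can_connect(outbound: Link, inbound: Link) -> bool:
    """Hypothetical check: do two link ends describe the same connection?"""
    same_format = (
        outbound.format is None
        or inbound.format is None
        or outbound.format == inbound.format
    )
    return (
        outbound.id == inbound.id
        and outbound.bound is Bound.OUT
        and inbound.bound is Bound.IN
        and outbound.transport is inbound.transport
        and same_format
    )


out_a = Link("1", Transport.SYNC, Bound.OUT, format="json", pattern=Pattern.EVENT)
in_b = Link("1", Transport.SYNC, Bound.IN, format="json")
in_xml = Link("1", Transport.SYNC, Bound.IN, format="xml")
print(can_connect(out_a, in_b))    # True
print(can_connect(out_a, in_xml))  # False
```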

XML example of two components that share a link:

<components>
  <!-- Component A sends JSON events over link 1 -->
  <component>
    <id>A</id>
    <links>
      <link>
        <id>1</id>
        <transport>sync</transport>
        <bound>out</bound>
        <format>json</format>
        <pattern>event</pattern>
      </link>
    </links>
  </component>
  <!-- Component B receives JSON over link 1 -->
  <component>
    <id>B</id>
    <links>
      <link>
        <id>1</id>
        <transport>sync</transport>
        <bound>in</bound>
        <format>json</format>
      </link>
    </links>
  </component>
</components>
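One way to read the XML example is that an outbound link and an inbound link with the same ID form one connection. The Python sketch below works under that assumption: it parses the same XML with the standard library and lists the resulting connections. The function name `connections` is made up, and this is not how the DIL transpiler itself works.

```python
import xml.etree.ElementTree as ET

DIL_XML = """
<components>
  <component>
    <id>A</id>
    <links>
      <link>
        <id>1</id><transport>sync</transport><bound>out</bound>
        <format>json</format><pattern>event</pattern>
      </link>
    </links>
  </component>
  <component>
    <id>B</id>
    <links>
      <link>
        <id>1</id><transport>sync</transport><bound>in</bound>
        <format>json</format>
      </link>
    </links>
  </component>
</components>
"""


def connections(xml_text):
    """Pair outbound and inbound link ends that share a link ID (assumed rule)."""
    root = ET.fromstring(xml_text.strip())
    outbound, inbound = {}, {}
    for component in root.findall("component"):
        name = component.findtext("id")
        for link in component.findall("links/link"):
            side = outbound if link.findtext("bound") == "out" else inbound
            side.setdefault(link.findtext("id"), []).append(name)
    return [
        (source, link_id, target)
        for link_id, sources in outbound.items()
        for source in sources
        for target in inbound.get(link_id, [])
    ]


print(connections(DIL_XML))  # [('A', '1', 'B')]
```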

Link Levels

By default, a link connects components on the same level, that is, horizontally:

It’s also possible to link components vertically. In the following example, component C (which is a level below X) links to component Y on a higher level:
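The difference between horizontal and vertical links comes down to comparing the levels of the two components. Here is a small Python sketch of that idea; the `parent_of` mapping and the component named `root` are hypothetical stand-ins for the nesting in the example.

```python
def depth(name, parent_of):
    """Level of a component: 1 for the outermost, plus 1 for every parent above it."""
    level = 1
    while name in parent_of:
        name = parent_of[name]
        level += 1
    return level


def direction(source, target, parent_of):
    """Classify a link as horizontal (same level) or vertical (different levels)."""
    same_level = depth(source, parent_of) == depth(target, parent_of)
    return "horizontal" if same_level else "vertical"


# X and Y sit next to each other; C is nested inside X, one level lower.
parent_of = {"X": "root", "Y": "root", "C": "X"}

print(direction("X", "Y", parent_of))  # horizontal
print(direction("C", "Y", parent_of))  # vertical
```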

Levels

Besides links, DIL also has levels. Component links and levels together make the structure of a DIL program:

Each level in DIL has its own name and its own role in a DIL program. We define five levels:

1. Integration (first level component or root component)

2. Flow (second level component)

3. Step (third level component)

4. Block (fourth level component)

5. Core (fifth level or atomic component)

In a diagram:

Why are there multiple levels? Sometimes you need a low-level (technical) implementation and other times a high-level (business) one. A lower level provides the building blocks for a higher level. Conversely, if a building block is missing on a higher level, you can go one level deeper to implement it.

Depending on the level you work on, DIL thus targets both business developers and programmers.
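The five levels can be written down as an ordered enumeration. In the Python sketch below, the names follow the list above; the helper `one_level_deeper` is hypothetical and only illustrates the idea of dropping down a level to build a missing block.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    INTEGRATION = 1  # root component
    FLOW = 2
    STEP = 3
    BLOCK = 4
    CORE = 5         # atomic component


def one_level_deeper(level: Level) -> Optional[Level]:
    """Return the level that provides building blocks for `level`, if any."""
    return Level(level + 1) if level < Level.CORE else None


print(one_level_deeper(Level.FLOW).name)  # STEP
print(one_level_deeper(Level.CORE))       # None
```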

Roles

We now discuss the role of the components on each level.

1. Integration

An integration is a component on the root level. Integration as a concept has nothing to do with a specific programming language, application or technology. It revolves around a set of data, for example “orders” or “employees”.

2. Flow

A flow is a higher-level component that describes the flow of the integration. For example, “order from A to B”. A flow is a component that defines a specific integration function and consists of a series of steps.

Flow types

There are four types of flows:

  1. Inbound flow (a flow with no flow before)
  2. Mediation flow (a flow with a flow before and after)
  3. Outbound flow (a flow with a flow before, but not after)
  4. Error flow (a flow that handles errors)
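The four flow types follow from a flow's position relative to other flows. A minimal Python sketch of that classification, with a hypothetical function name and flags:

```python
def flow_type(has_flow_before: bool, has_flow_after: bool, handles_errors: bool = False) -> str:
    """Classify a flow by its neighbours, following the definitions above."""
    if handles_errors:
        return "error"
    if not has_flow_before:
        return "inbound"
    if has_flow_after:
        return "mediation"
    return "outbound"


print(flow_type(has_flow_before=False, has_flow_after=True))   # inbound
print(flow_type(has_flow_before=True, has_flow_after=True))    # mediation
print(flow_type(has_flow_before=True, has_flow_after=False))   # outbound
```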

3. Step

Steps are components on a mid-level. A step performs an action on data (in the form of a message).

Step types

There are four types of steps:

  1. Source: Starting point of a flow. Gets data from a source. Has one link on the outbound side.
  2. Action: Performs an action within the flow. This step has one inbound link and one outbound link.
  3. Router: A router step can have any number of inbound or outbound links. For example, a split router has a normal outbound link and an outbound link for the split data.
  4. Sink: Endpoint of a flow. Normally puts data at a destination. Has one link on the inbound side.
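The link counts per step type lend themselves to a simple validation rule. The Python sketch below is a hypothetical check, not part of DIL; it assumes that a source has no inbound link and a sink has no outbound link, which the descriptions above imply but do not state outright.

```python
def valid_link_counts(step_type: str, inbound: int, outbound: int) -> bool:
    """Check the number of links against the step type's description."""
    if step_type == "source":
        return inbound == 0 and outbound == 1  # zero inbound is an assumption
    if step_type == "action":
        return inbound == 1 and outbound == 1
    if step_type == "router":
        return inbound >= 0 and outbound >= 0  # any number of links
    if step_type == "sink":
        return inbound == 1 and outbound == 0  # zero outbound is an assumption
    raise ValueError(f"unknown step type: {step_type}")


print(valid_link_counts("source", 0, 1))  # True
print(valid_link_counts("action", 1, 2))  # False
print(valid_link_counts("router", 1, 2))  # True
```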

4. Block

Blocks are low-level components that together form a step. The types of blocks are based on the core components. Because of this trait, we first discuss core components and then come back to the block concept.

5. Core

Core components are components on the lowest level. They are, so to speak, atomic components that have no links. Core components can only be linked by a reference from a block, as we will see later. Let’s first look at the available core components.

Core types

  1. Message: A data message that consists of a body, headers, properties, and attachments.
  2. Connection: Sets up a connection to an external system.
  3. Component: A technical implementation of a protocol or technology. For example, SFTP or AWS S3. Think of a Camel component.
  4. Route: The flow control on the Camel level.
  5. Route Configuration: The configuration of a route, mainly for exception and error handling.
  6. Route Template: A parameterized route.

Note that most of the constructs are also Camel constructs.

Blocks as building blocks

In the game Minecraft you build things from ready-made blocks, but you can also use commands, or even Java, to write custom blocks. DIL blocks are defined by analogy with this.

A block is related to Camel’s route templates. It is, so to speak, a route template on steroids. Blocks use not only parameterized routes, but also other core components such as route configurations, connections, messages, environment variables and so on. The core components are reusable and can be used in multiple blocks.

To summarize: Blocks are assembled from core components. Together, these blocks form a step.
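The relation between core components, blocks and steps can be sketched as references into a shared registry. Everything in the Python sketch below is made up for illustration (the component IDs, the block names and the helper functions); it only shows that blocks refer to core components by reference and that one core component can serve several blocks.

```python
from collections import Counter

# Hypothetical registry of core components, keyed by ID.
CORE = {
    "sftp-connection": {"type": "connection"},
    "poll-template": {"type": "route template"},
    "error-handling": {"type": "route configuration"},
}

# Hypothetical blocks: each one is a list of references to core components.
BLOCKS = {
    "read-files": ["sftp-connection", "poll-template", "error-handling"],
    "write-files": ["sftp-connection", "error-handling"],
}

# A step is formed by one or more blocks.
STEP = ["read-files", "write-files"]


def resolve(block_name):
    """Look up the core component types that a block refers to."""
    return [CORE[ref]["type"] for ref in BLOCKS[block_name]]


def shared_core(block_names):
    """Core components that are reused by more than one block."""
    counts = Counter(ref for name in block_names for ref in set(BLOCKS[name]))
    return sorted(ref for ref, n in counts.items() if n > 1)


print(resolve("read-files"))  # ['connection', 'route template', 'route configuration']
print(shared_core(STEP))      # ['error-handling', 'sftp-connection']
```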

The building

The goal of levels and links is to let you write solutions on the level that fits your role in an organization. Integrations and flows are high-level components written by business developers, while core components and blocks are written by programmers:

Next part:
