AI in data integration
Developer workflows are steadily improving. Better toolchains, low-code tools and even AI assist while coding. As data integrators we reap the benefits of these developments. But I haven't seen an AI that wrote a complete interface yet. Is this possible? And what would it look like?
Tooling in integration
Most generally available developer tools today target web or mobile applications. Data integration doesn't even have a graphical interface or a database, so it needs its own tools. Specialized tools add great value to the developer workflow.
In a sense, ESB platforms were some of the earliest low-code platforms. They provide specialized tools and flow editors. Examples of ESB platforms are Tibco, Fusion and Sonic.
Common developer tools these platforms use are:
Flow editors: editors that create a flow with the processing steps of a message
Mapping editors: editors for mapping between various data formats
API designers: tools for designing and managing APIs
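What a flow editor draws as boxes and arrows is, underneath, just a chain of processing steps applied to a message. A minimal Python sketch of such a flow (the step names, the payload format and the routing rule are all hypothetical):

```python
# Minimal sketch of a message flow: each step transforms the message,
# the way a flow editor chains processing steps visually.

def parse(message):
    # Split a raw "key=value;key=value" payload into a dict.
    return dict(pair.split("=") for pair in message.split(";"))

def enrich(record):
    # Add a derived field (hypothetical business rule).
    record["status"] = "NEW"
    return record

def route(record):
    # Pick a destination queue based on message content.
    return "orders" if "order_id" in record else "dead_letter"

flow = [parse, enrich]

def run_flow(raw):
    msg = raw
    for step in flow:
        msg = step(msg)
    return route(msg), msg

destination, payload = run_flow("order_id=42;customer=acme")
print(destination)  # orders
```

The value the specialized tools add is letting you assemble and rearrange such a chain graphically instead of in code.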
Online data integration
In traditional data integration workflows, the build phase is mostly done in Eclipse and is separate from runtime. More modern platforms like Apache NiFi or Dell Boomi bring building and running closer together.
They do this with an online modeling approach: you create integrations in the browser and run them directly (although they may run somewhere else).
AI in data integration
Some tooling of traditional ESBs and the newer iPaaS vendors is very sophisticated. Still, it can't write code for you. In many ways these tools are only usable by integration specialists: you need knowledge of protocols, data formats and integration patterns. Can we do the same without expert knowledge?
What would an AI solution for data integration look like?
The first difference is that we don't start with a functional and technical design, but with intent. For example, we have a web shop and a warehouse system, and our intention is to integrate the two. We ask the AI: "Create an order integration from the web shop to the warehouse system."
For this, the AI must know where these systems are and how to gain access to them. It may request this from the suppliers of those systems. As soon as it has access, the AI explores the available endpoints; for REST endpoints, it can use an OpenAPI (Swagger) specification. It then knows which valid endpoints exist on one side, and it does the same on the other side.
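Exploring endpoints from an OpenAPI specification is mechanical enough to sketch. The spec fragment below is hypothetical; a real agent would fetch the document from the system's published spec URL:

```python
import json

# Sketch: list the available endpoints from an OpenAPI (Swagger) document.
# This spec fragment is a hypothetical stand-in for a fetched document.
spec = json.loads("""
{
  "openapi": "3.0.0",
  "paths": {
    "/orders": {"get": {"summary": "List orders"},
                "post": {"summary": "Create order"}},
    "/orders/{id}": {"get": {"summary": "Get one order"}}
  }
}
""")

def list_endpoints(spec):
    # Flatten the "paths" object into sorted (method, path) pairs.
    return sorted(
        (method.upper(), path)
        for path, operations in spec["paths"].items()
        for method in operations
    )

print(list_endpoints(spec))
```

Running the same routine against both systems gives the AI the two endpoint inventories it needs before it can reason about a mapping.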
Next it must rely on the language model and its parameters. Say a field in one system relates to "EmployeeID" in another system: the model can recognize and map such data. Using "get" requests it retrieves samples from both sides to verify the match. Now it has a functional mapping.
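A crude version of this matching can be done with plain string similarity, before a language model is even involved. A sketch, with hypothetical field names on both sides:

```python
import difflib

# Sketch: propose a field mapping between two systems by fuzzy name
# matching. Field names are hypothetical; a real AI would also compare
# sample data fetched via GET requests to confirm each match.
shop_fields = ["orderId", "customerName", "employee_id"]
warehouse_fields = ["OrderID", "Customer", "EmployeeID"]

def normalize(name):
    # Compare names case-insensitively, ignoring separators.
    return name.lower().replace("_", "")

def propose_mapping(source, target, cutoff=0.6):
    mapping = {}
    normalized_targets = [normalize(t) for t in target]
    for field in source:
        match = difflib.get_close_matches(
            normalize(field), normalized_targets, n=1, cutoff=cutoff)
        if match:
            mapping[field] = target[normalized_targets.index(match[0])]
    return mapping

print(propose_mapping(shop_fields, warehouse_fields))
```

Where name similarity is ambiguous, the sample data from both sides is what lets the model decide whether two fields really carry the same information.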
This functional mapping can be turned into code. Say, for example, we use Apache Camel as an integration framework. We can then feed the AI with code from the internet as well as from our current repository. Finally, we let it test itself until a working interface is the result.
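The "let it test itself" step is essentially a generate-and-verify loop. A Python sketch of that loop, where `generate_candidate` is a hypothetical stand-in for a real code-generating model (here it deliberately produces a wrong field name on the first attempts):

```python
# Sketch of the self-testing loop: generate a candidate integration,
# run the tests, and retry until the interface works.
# generate_candidate is a hypothetical stand-in for a code-generating
# model; its staged failures simulate the model improving per attempt.

def generate_candidate(attempt):
    if attempt < 2:
        return lambda order: {"OrderId": order["orderId"]}   # wrong key
    return lambda order: {"OrderID": order["orderId"]}       # correct

def passes_tests(transform):
    # Acceptance test derived from the functional mapping.
    sample = {"orderId": 42}
    return transform(sample).get("OrderID") == 42

def build_until_green(max_attempts=5):
    for attempt in range(max_attempts):
        candidate = generate_candidate(attempt)
        if passes_tests(candidate):
            return attempt, candidate
    raise RuntimeError("no working interface found")

attempt, transform = build_until_green()
print(attempt)  # 2
```

In a real setup the acceptance tests would come from the endpoint specifications and sample data gathered earlier, and the candidates would be generated Camel routes rather than Python functions.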
Currently, I don’t think AI is at the level where it can write complete data integrations. AI producing code is more at the level of Computer Aided Translation: even the best AI-powered translators, like Google Translate or DeepL, need some fine-tuning by a professional to reach an acceptable level for use. But it’s clear that it’s now time to experiment, and we may be surprised how far we can get. We can’t close our StackOverflow tab just yet.