
Apache Kafka-MuleSoft Integration

Introduction

Apache Kafka is an open-source event streaming platform originally developed at LinkedIn. It is written in Scala and Java and follows a publish-subscribe messaging model over TCP. Its distributed, log-based design delivers high throughput, fault tolerance, and strong performance for real-time data pipelines. In this blog, we explore MuleSoft Kafka integration and explain how to establish seamless communication between a Mule application and Apache Kafka for scalable, event-driven architectures delivered through MuleSoft integration services.

Kafka Streaming Architecture

Kafka architecture consists of producers, consumers, topics, and partitions, which together enable high-volume event streaming.

A producer publishes messages to a Kafka topic. Meanwhile, a consumer reads messages from that topic for downstream processing.

Topics store streams of events or records. A topic can have multiple producers and multiple consumers at the same time.

To improve scalability, Kafka divides each topic into multiple partitions. These partitions distribute data across brokers and allow applications to read and write in parallel. When Kafka receives a new event, it appends the record to one partition and identifies it by a unique, sequential offset within that partition rather than by a message ID.
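
To make partition and offset assignment concrete, here is a minimal sketch using the plain Kafka Java producer client (not the Mule connector), assuming a local broker on localhost:9092 and a topic named test as created later in this walkthrough. It publishes one record and prints the partition and offset the broker assigned to it.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class OffsetDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Append one event to the "test" topic; the broker appends it to a partition
            // and assigns the next sequential offset in that partition.
            RecordMetadata meta = producer
                    .send(new ProducerRecord<>("test", "user-42", "page_view"))
                    .get();
            System.out.printf("partition=%d offset=%d%n", meta.partition(), meta.offset());
        }
    }
}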

Common Kafka Use Cases

Kafka supports a wide range of real-time and data-driven use cases.

Organizations use Kafka to track website activity and user behavior. In addition, teams rely on Kafka to manage operational metrics and aggregate logs from distributed systems. Moreover, Kafka enables real-time stream processing for analytics and monitoring workloads.

Advantages of Kafka Over Traditional Messaging Systems

Kafka offers several advantages compared to traditional messaging platforms such as ActiveMQ and RabbitMQ.

First, Kafka operates as a distributed processing system capable of handling massive data volumes. As a result, it scales horizontally with minimal performance impact.

Second, Kafka delivers high throughput, often two to four times greater than traditional messaging systems. Furthermore, performance remains consistent even as the number of consumers increases.

Kafka also allows consumers to reprocess messages by resetting offsets. Additionally, Kafka replicates data across multiple brokers, which improves reliability and fault tolerance. Since Kafka persists messages based on configurable retention policies, systems can recover data even after failures.

Local Kafka Setup

Download Kafka

For Windows environments, download Kafka from the official website and follow the installation guide.

https://kafka.apache.org/downloads

Kafka Installation on macOS

Kafka requires Java 8 or later and relies on Apache ZooKeeper to manage cluster metadata. However, the Kafka installation bundles ZooKeeper, so no separate installation is required.

To install Kafka using Homebrew, run the following command:

brew install kafka

Kafka Configuration Updates

Before starting Kafka, update the server configuration to avoid connection issues.

Navigate to:

/usr/local/etc/kafka/server.properties

Update the listener configuration from:

listeners=PLAINTEXT://:9092

to:

listeners=PLAINTEXT://localhost:9092

Starting Kafka Services

First, start ZooKeeper:

zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties

Next, start the Kafka broker:

kafka-server-start /usr/local/etc/kafka/server.properties

Creating a Kafka Topic

Kafka producers publish messages to topics, while consumers subscribe to them.

Create a topic using the following command:

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Here, test represents the topic name.

Producer and Consumer Consoles

First, create a producer console:

kafka-console-producer --broker-list localhost:9092 --topic test

Next, create a consumer console:

kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning

When the producer publishes messages, the consumer displays them in real time.
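
The console consumer above is essentially a small poll loop. The following sketch shows the equivalent with the plain Kafka Java consumer client (not the Mule connector); the group id console-demo-group is an arbitrary illustrative name, and auto.offset.reset=earliest mirrors the --from-beginning flag for a group with no committed offsets.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsoleLikeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // local broker from the setup above
        props.put("group.id", "console-demo-group");      // illustrative group id
        props.put("auto.offset.reset", "earliest");       // read from the start for a new group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                // Poll the broker and print every record received, like the console consumer does.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}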

Kafka Integration with Mule Applications

The same message publishing and consumption workflow can be implemented using MuleSoft Kafka integration through the Mule Kafka Connector.

Multi-Broker Kafka Cluster Setup

Previously, the setup included a single broker. Now, add another broker to form a cluster.

Copy the existing configuration file:

/usr/local/etc/kafka/server.properties

Create a new file named:

/usr/local/etc/kafka/server-one.properties

Update the following values:

broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=/usr/local/var/lib/kafka-server1-logs

Since both brokers run on the same machine, changing the port and log directory prevents conflicts.

Start the second broker:

kafka-server-start /usr/local/etc/kafka/server-one.properties

Cluster Topic Configuration

Create a new topic with multiple partitions and replication enabled:

kafka-topics --create --zookeeper localhost:2181 --replication-factor 2 --partitions 2 --topic cluster-topic

To verify broker assignments, run:

kafka-topics --describe --zookeeper localhost:2181 --topic cluster-topic

Testing Cluster Message Consumption

First, create a producer:

kafka-console-producer --broker-list localhost:9092 --topic cluster-topic

Next, start two consumers on different brokers:

kafka-console-consumer --bootstrap-server localhost:9092 --topic cluster-topic
kafka-console-consumer --bootstrap-server localhost:9093 --topic cluster-topic

Messages published by the producer appear in both consumer consoles, because each console consumer joins its own consumer group by default and therefore receives every message.

Kafka Connector for Mule Applications

The MuleSoft Kafka Connector provides ready-to-use components that enable Mule applications to interact with Kafka topics seamlessly.

Supported Operations

The connector supports publishing messages to Kafka topics, including transactional publishing. In addition, it allows applications to consume messages, listen to events, commit offsets, and reset offsets using seek operations.
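
As a rough illustration of what transactional publishing means at the Kafka client level, here is a minimal sketch with the plain Kafka Java producer (not the connector's own operation); the transactional.id value is an arbitrary example. Records sent inside the transaction become visible to consumers together, or not at all.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;

public class TransactionalPublish {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("transactional.id", "kafka-poc-tx-1"); // example id; must be unique per producer
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("test", "order-1", "created"));
                producer.send(new ProducerRecord<>("test", "order-1", "paid"));
                producer.commitTransaction();  // both records become visible together
            } catch (KafkaException e) {
                producer.abortTransaction();   // neither record is exposed to consumers
            }
        }
    }
}

Note that on a local one- or two-broker setup such as the one above, the broker defaults for the internal transaction state topic assume three replicas, so those settings may need to be lowered before transactions can initialize.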

Configuring Kafka Connectivity in Anypoint Studio

First, create a Mule project named kafka-poc. Next, import the Apache Kafka Connector from Exchange.

Then, configure the Kafka producer and consumer settings in the global configuration file. Specify the bootstrap servers that represent the Kafka cluster, for example localhost:9092,localhost:9093 for the two local brokers started earlier.

Mule Message Flows

Create two Mule flows.

The first flow publishes messages to a Kafka topic. For example, it can read data from a CSV file and publish records as events.
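
As a rough plain-Java illustration of what this first flow does, the sketch below reads an assumed local file named records.csv and publishes each line as an event to the cluster-topic created earlier, using both local brokers as bootstrap servers.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CsvPublisher {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092,localhost:9093"); // both local brokers
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Each CSV line becomes one event on the topic; records.csv is an assumed sample file.
        List<String> lines = Files.readAllLines(Paths.get("records.csv"));
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String line : lines) {
                producer.send(new ProducerRecord<>("cluster-topic", line));
            }
            producer.flush(); // ensure all records are delivered before exiting
        }
    }
}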

The second flow consumes messages from Kafka using a Message Listener and processes them in real time.

Reprocessing Kafka Events Using Offset Control

Kafka allows consumers to reprocess messages by resetting offsets.

First, check the current consumer offset:

kafka-consumer-groups --describe --group cluster-consumer-group --bootstrap-server localhost:9092

Next, update the consumer configuration by setting auto.offset.reset to earliest.

After deployment, invoke the Mule endpoint to re-consume messages from the selected offset.
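
For reference, the sketch below shows the same reprocessing idea with the plain Kafka Java consumer rather than the Mule connector. Since auto.offset.reset=earliest only applies when the group has no committed offset, the sketch also performs an explicit seekToBeginning to rewind an existing group; the group id matches the cluster-consumer-group inspected above.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReprocessFromStart {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "cluster-consumer-group");
        props.put("auto.offset.reset", "earliest"); // used only when no committed offset exists
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("cluster-topic"));
            consumer.poll(Duration.ofMillis(500));            // join the group so partitions get assigned
            consumer.seekToBeginning(consumer.assignment());  // rewind every assigned partition to offset 0
            for (ConsumerRecord<String, String> record :
                    consumer.poll(Duration.ofSeconds(2))) {
                System.out.printf("reprocessed offset=%d value=%s%n",
                        record.offset(), record.value());
            }
        }
    }
}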

Real-World Kafka Implementations

Kafka powers event streaming for many large organizations.

Adidas uses Kafka for real-time data streaming. Airbnb relies on Kafka for event tracking and exception handling. Tinder uses Kafka for recommendations and notifications. Uber processes driver matching and ETA calculations through Kafka. Netflix handles over one trillion messages per day for real-time monitoring. LinkedIn uses Kafka for activity streams and operational metrics. Foursquare applies Kafka for online messaging.

Conclusion

This walkthrough demonstrates how to implement MuleSoft Kafka integration by publishing and consuming events across single-broker and multi-broker Kafka clusters. By using the MuleSoft Kafka Connector, teams can build scalable, fault-tolerant, event-driven integrations that support modern streaming architectures and real-time data processing.

References

https://kafka.apache.org/documentation/
https://techbeacon.com/app-dev-testing/what-apache-kafka-why-it-so-popular-should-you-use-it
https://docs.mulesoft.com/kafka-connector/4.4/kafka-connector-examples
https://kafka.apache.org/powered-by