BTS.id

Real-Time Data Processing Architecture Part. 2

In the previous article, we talked about big data, real-time data processing, and microservice architecture for big data. In this article, we dig deeper into messaging systems and see how message queues make life easier in certain microservices scenarios.

Message Queuing with Message Brokers

In a more complex scheme, where your application not only takes orders from customers but also needs to verify inventory and ship the orders, a message queue or message broker is needed to facilitate communication between services. Message queuing allows applications to send messages to each other through an asynchronous communication protocol: the sender puts a message onto a queue and continues processing without waiting for an immediate response.

The basic architecture of a message queue is simple. Producers, which are client applications, create messages and deliver them to the message queue. Later, other applications, called consumers, connect to the queue and retrieve the messages for processing. Messages placed onto the queue are stored until a consumer retrieves them.
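
The producer, queue, and consumer roles described above can be sketched with Python's standard library. This is only an in-process illustration, not a real message broker: the names `producer`, `consumer`, and `run_demo`, as well as the order strings, are made up for the example.

```python
import queue
import threading


def producer(q, orders):
    """Create messages and deliver them to the queue."""
    for order in orders:
        q.put(order)
    q.put(None)  # sentinel: tells the consumer there is nothing more to read


def consumer(q, processed):
    """Retrieve messages from the queue and process them one by one."""
    while True:
        message = q.get()
        if message is None:
            break
        processed.append(f"processed {message}")


def run_demo():
    q = queue.Queue()
    processed = []
    # The producer finishes before the consumer even starts, so the
    # messages simply wait in the queue until they are retrieved.
    producer(q, ["order-1", "order-2", "order-3"])
    worker = threading.Thread(target=consumer, args=(q, processed))
    worker.start()
    worker.join()
    return processed


if __name__ == "__main__":
    print(run_demo())
```

Because the producer never waits for the consumer, either side can be slowed down, restarted, or scaled without the other noticing, which is the point of putting a queue between them.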

For example, imagine you are developing a ticket-booking application. When a user books a ticket, the booking service takes the request, processes the booking data, and returns feedback, usually a booking code with its details. This scheme is manageable with a monolithic architecture. But in a more complex scheme, where users book more than one ticket and expect additional features such as booking details sent by email or as push notifications, users have to wait until all of the processes are done, with no guarantee that they will succeed.

To solve the problem above, an additional service called a message broker is needed, which allows data to be exchanged in a simple and reliable manner. A message broker such as Apache Kafka is responsible for gathering, routing, and distributing messages from senders to the right receivers. Kafka also provides stronger ordering guarantees than a traditional message broker: messages within a partition are delivered to consumers in the order in which they were written.

According to Wikipedia, a message broker “translates a message from the formal messaging protocol of the sender to the formal messaging protocol of the receiver.”
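
For the ticket-booking scenario, the idea can be sketched as follows: the booking service publishes one message and returns the booking code right away, while email and push notifications are handled later by their own receivers. This is a toy in-memory stand-in, not Kafka's API. `MiniBroker`, `book_tickets`, the topic name `booking.created`, and the booking code format are all assumptions made up for the illustration.

```python
from collections import defaultdict, deque


class MiniBroker:
    """Toy in-memory broker: routes each message to every receiver bound to its topic."""

    def __init__(self):
        self.queues = defaultdict(dict)  # topic -> {receiver name: deque of messages}

    def bind(self, topic, receiver):
        self.queues[topic][receiver] = deque()

    def publish(self, topic, message):
        for receiver_queue in self.queues[topic].values():
            receiver_queue.append(message)

    def drain(self, topic, receiver):
        receiver_queue = self.queues[topic][receiver]
        while receiver_queue:
            yield receiver_queue.popleft()


def book_tickets(broker, user, seats):
    """Process the booking and return immediately; notifications happen later."""
    booking_code = f"BK-{user}-{len(seats)}"  # hypothetical code format
    broker.publish("booking.created",
                   {"code": booking_code, "user": user, "seats": seats})
    return booking_code


def run_demo():
    broker = MiniBroker()
    broker.bind("booking.created", "email")
    broker.bind("booking.created", "push")
    code = book_tickets(broker, "alice", ["A1", "A2"])
    # Each receiver works through its own queue at its own pace.
    emails = [f"email to {m['user']}: {m['code']}"
              for m in broker.drain("booking.created", "email")]
    pushes = [f"push to {m['user']}: {m['code']}"
              for m in broker.drain("booking.created", "push")]
    return code, emails, pushes


if __name__ == "__main__":
    print(run_demo())
```

The user gets the booking code as soon as `book_tickets` returns. A slow or failing email service only delays its own queue and no longer blocks the booking itself.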

Apache Kafka

Apache Kafka is a message broker that is often used in real-time streaming data architectures to collect big data and to provide real-time analytics. According to co-creator Jay Kreps, it is built to be fault-tolerant, high-throughput, and horizontally scalable, and it allows data streams and stream processing applications to be distributed geographically.

It is not only popular but also growing rapidly. It is used by many companies that handle large volumes of data, such as LinkedIn, Twitter, Square, Uber, PayPal, Cisco, eBay, Pinterest, and Netflix. Apache Kafka was developed around 2010 at LinkedIn to solve the problem of integrating huge amounts of data from its website and infrastructure into a Lambda architecture that harnessed Hadoop and real-time processing systems.

At that time, there were no solutions for this type of ingress for real-time applications, and traditional message brokers did not satisfy LinkedIn's needs: they were too heavy and slow, and they were not designed for real-time use cases.

The engineering team at LinkedIn then developed a scalable and fault-tolerant messaging system without any of the unnecessary parts, which quickly grew into the full-fledged event streaming platform that is Apache Kafka.

Why Is Apache Kafka So Popular?

Apache Kafka has gained immense popularity for its simplicity and compatibility. It is easy to set up and use. It is fast, scalable, durable, and fault-tolerant. It can be used with a wide range of systems. It also performs well with systems that have data streams to process, enabling those systems to aggregate, transform, and load data into other stores.

Unlike other message brokers, Kafka combines messaging, storage, and processing, which makes it a powerful event streaming platform capable of handling trillions of messages a day. Kafka can store and process both historical and real-time data, which makes it suitable for building streaming applications as well as streaming data pipelines.
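
Why storage matters can be shown with a small sketch of an append-only log. Unlike a classic queue, reading a record does not remove it, so one reader can follow new records while another replays history from the beginning. This is a simplified in-memory model, not Kafka's storage engine or client API: `EventLog`, `running_total`, and the sample amounts are hypothetical.

```python
class EventLog:
    """Toy append-only log: records stay in place after they are read."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=100):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


def running_total(log, start_offset=0):
    """Process the stored records from start_offset onward."""
    return sum(log.read(start_offset))


def run_demo():
    log = EventLog()
    for amount in (10, 20, 30):
        log.append(amount)
    latest_only = log.read(2)          # a real-time reader at the end of the log
    full_history = running_total(log)  # a new reader replaying from offset 0
    next_offset = log.append(40)       # new data keeps arriving
    return latest_only, full_history, next_offset, log.read(next_offset)


if __name__ == "__main__":
    print(run_demo())
```

In Kafka itself, how long records are kept is a matter of retention configuration, so replaying from the first offset only works for data that is still retained.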

How Apache Kafka Works

Storing 1,000 records from many data sources, in formats such as JSON and XML, in one database might be fine. But in today's world, where big data is considered important, a question often arises: what if one million records need to be stored? And what if each data source has ten million records?

How long will it take until everything is stored, without creating a data burst?

Apache Kafka works by combining queuing and publish-subscribe, the two main messaging patterns. Each has its own pros and cons. Queuing makes it easy to scale processing across consumers, but it is not multi-subscriber: each message goes to only one consumer. Publish-subscribe can broadcast data to multiple subscribers, but it is difficult to scale, because every subscriber receives every message. With Apache Kafka's consumer groups, the benefits of both patterns can be obtained: every group receives all of the data, while the members of a group share the work of processing it.
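
How the two patterns combine can be sketched with a small simulation. A topic is split into partitions. Every consumer group sees all partitions, which is the publish-subscribe side, while inside a group each partition is owned by exactly one member, which is the queuing side. The round-robin assignment below is a simplified stand-in for Kafka's configurable assignment strategies, and the group names, member names, and messages are made up.

```python
from collections import defaultdict


def assign_partitions(num_partitions, members):
    """Spread partition numbers over the members of one group, round-robin."""
    assignment = defaultdict(list)
    for partition in range(num_partitions):
        assignment[members[partition % len(members)]].append(partition)
    return dict(assignment)


def deliver(partitions, groups):
    """Return {group: {member: [messages]}} for the given topic partitions."""
    result = {}
    for group, members in groups.items():
        assignment = assign_partitions(len(partitions), members)
        result[group] = {
            member: [msg for p in owned for msg in partitions[p]]
            for member, owned in assignment.items()
        }
    return result


def run_demo():
    # One topic with four partitions; each inner list keeps its write order.
    partitions = [["a1", "a2"], ["b1"], ["c1", "c2"], ["d1"]]
    groups = {
        "billing": ["billing-1", "billing-2"],  # two members share the load
        "audit": ["audit-1"],                   # one member reads everything
    }
    return deliver(partitions, groups)


if __name__ == "__main__":
    for group, members in run_demo().items():
        print(group, members)
```

Adding a member to the "billing" group spreads the partitions over more workers without changing what the "audit" group receives. Since a partition has only one reader per group, messages within it are processed in the order they were written.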

Apache Kafka also works as a middle layer that decouples your real-time data pipelines. It acts as middleware that prevents the database from being bombarded directly. Messages are sent to the middle layer and pulled by a microservice as fast as possible before being inserted into the database.
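
The effect of the middle layer on the database can be sketched as follows. Producers write into a buffer, and a loader service pulls messages in batches and inserts each batch in a single transaction, so the database sees a few bulk writes instead of thousands of individual ones. Here `queue.Queue` stands in for Kafka and an in-memory SQLite database stands in for the real store. The table name `events`, the batch size, and the function names are assumptions for the illustration.

```python
import queue
import sqlite3


def drain_batch(buffer, max_batch):
    """Pull up to max_batch messages from the buffer without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(buffer.get_nowait())
        except queue.Empty:
            break
    return batch


def load_into_db(buffer, conn, max_batch=500):
    """Move everything from the buffer into the database, one transaction per batch."""
    batches = 0
    while True:
        batch = drain_batch(buffer, max_batch)
        if not batch:
            return batches
        with conn:  # commits on success, rolls back if the insert fails
            conn.executemany(
                "INSERT INTO events (source, payload) VALUES (?, ?)", batch
            )
        batches += 1


def run_demo(n_messages=2000):
    buffer = queue.Queue()
    # Many data sources write to the middle layer, not to the database.
    for i in range(n_messages):
        buffer.put((f"source-{i % 4}", f"payload-{i}"))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (source TEXT, payload TEXT)")
    batches = load_into_db(buffer, conn)
    (rows,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
    conn.close()
    return batches, rows


if __name__ == "__main__":
    print(run_demo())
```

With the default values, 2,000 messages reach the database in 4 transactions. If the sources produce faster than the loader can insert, the backlog grows in the buffer rather than overloading the database.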

As shown in the picture above, data from the source is sent to Redis. From there, some of it is directed to notifications and some to Kafka. Kafka then processes the data into a more organized form before storing it in the Hadoop Distributed File System (HDFS).

Kafka can be used to feed fast-lane systems, and it can feed events to CEP (complex event processing) systems and IoT/IFTTT-style automation systems. It is also used to stream data for batch data analysis: it feeds Hadoop and streams data into your big data platform for future analysis.

Build software for your business with best practices and the latest technology with BTS.id (Bridge Technology Services).

Contact us:
Phone : (+62 22) 6614726
Email : info@bts.id
