Future-proofing IoT architectures for fast data processing

Vladimir Schreiner, Product manager, Hazelcast

Technology market researchers forecast that by 2020 connected devices and things will exceed 20 billion. The data volume generated by this mass will dwarf the current big data produced primarily by social networks.

Therefore, organisations need to ensure that future data pipelines are powerful enough to handle and process this predicted increase in telemetry data, yet stay flexible enough to react to changing business needs. To this end, there are a number of important considerations…

Flexible architecture

The main goal of device-ready architecture is to provide a new layer which binds traditional enterprise systems with devices and positions data as a first-class citizen, says Vladimir Schreiner, Product manager at Hazelcast.

Using this architecture, data from devices comes in the form of a continuous stream of individual events. Events hold the telemetry information (measurements, status updates, alerts) and the data pipeline transports those events, ensures authorisation and filtering, processes the events and provides processed data to enterprise systems.

Volume in real time

One of the key challenges of event stream pipelines is to handle speed at scale. Solutions must be capable of continuously processing big data volumes, while ensuring low end-to-end latency.

Data flowing in Internet of Things (IoT) systems are generally latency-sensitive, meaning it loses value with time meaning it needs to be processed as soon as possible. Data analysts cannot report serious problems in hours, or even in minutes. Alerts needs to be immediate.

To achieve this near real-time operation, streaming data must be processed on-the-fly (prior to storage). On the stream ingestion level, this mostly equates to:

    • Device and event authorisation and filtering to ensure security
    • Device status updates
    • Message format translation/conversion/normalisation

Pre-processed events are routed for downstream processing which means the stream processing unit must perform on-the-fly analysis and anomaly detection, which typically means:

    • Search for complex events to detect anomalies and trigger alerts
    • Compute stats
    • Match streaming data against machine-trained model

To achieve all of this, a powerful solution is necessary. The stream processing engine must be able to handle large loads with flow peaks (the cluster has to be elastic to scale with current load) and guarantee low-latency.

Fast data access and latency

It is not just speed and latency of a processing unit which matters. Low latency must be provided end-to-end i.e. starting from the event emitting by the device to the response. The messaging and storage involved in the pipeline may become a bottleneck if not designed properly.

Therefore, when storage and processing meet, the former must be engineered carefully and have the following functionality:

    • New datasets entering the processing, such as device registry or enrichment data store
    • Operational storage for publishing and the updating of results
    • Messaging tool for passing data between units of data pipeline

To achieve all of this, an In-Memory Data Grid (IMDG) is a good fit as it is inherently quick and solves storage issues by forming storage clusters, in addition it can transmit reactive access patterns to notify analysts when values change. Therefore, it can be used as a cache for big datasets during processing, while forming in-memory data lakes for frequently used data.

Saving bandwidth to reduce costs

Another idea is to reduce the data flow between the field and data centre via edge processing i.e. processing close to the data source. Data streams can be aggregated and filtered prior to being sent to the data centre, thus reducing transmission costs. When you consider data streams of gigabytes per second, the gained flow reduction would be significant. Importantly, stream processing technology must be lightweight.

Introducing Hazelcast Jet – a potential solution

Simple to deploy, Hazelcast Jet is an in-memory streaming and fast batch processing engine and a building block for IoT data pipelines. It is used to process big data volumes with low latency. Flexible enough to conduct edge processing, server-side stream processing, and subsequent data-at-rest batch processing, it comes integrated with a scalable, in-memory storage layer – Hazelcast IMDG.

Hazelcast Jet is appropriate for applications that require a near real-time experience such as sensor updates in IoT architectures (house thermostats, lighting systems), in-store e-commerce systems and social media platforms.

The author of this blog is Vladimir Schreiner, Product manager at Hazelcast

Comment on this article below or via Twitter: @IoTNow OR @jcIoTnow

Recent Articles

What private blockchain means to telcos

Posted on: August 2, 2021

While the telecoms sector is yet to fully embrace blockchain technology, Jonas Lundqvist, CEO at Haidrun, says that the arrival of private blockchain may change that and unlock a massive potential for telecom service providers to create new businesses and revenue streams.

Read more

The Chinese market is expected to lead the development of V2X

Posted on: August 2, 2021

According to a new research report from the IoT analyst firm Berg Insight, there were about 0.7 million cars with V2X capabilities on the roads at the end of 2020. This number is expected to grow to 35.1 million by 2025. Communications between vehicles has been discussed for more than two decades, but with few

Read more