According to the “Cisco Global Cloud Index: Forecast and Methodology, 2015–2020,” the total amount of data created by devices, driven by the Internet of Things (IoT), will reach 600 ZB per year by 2020, up from 145 ZB per year in 2015.
However, for these new IoT applications to deliver the anticipated benefits, they must ingest massive amounts of data generated by multiple sources and simultaneously analyse it in real time, says Nikita Ivanov, CTO and co-founder of GridGain Systems. For example, consider some of the many digital transformation and omnichannel customer experience initiatives that enterprises and governments are launching:
- Predictive Maintenance – Myriad sensors on fleets of equipment, such as trucks, airplanes and industrial buildings, track every nuance of current operating conditions. By analysing this data, vendors or owners can act immediately on indicators of potential failure to eliminate or reduce downtime, reduce maintenance costs, and keep equipment in the field longer. This leads to increased ROI and, depending on the use case, increased customer satisfaction.
- Risk Mitigation – To minimise the spread of new loan scams, a bank must be able to access real-time feeds of loan applications and credit reports and continuously update its model of what indicates a possible loan fraud attempt, so it can stop a new scam before the transaction is completed.
- Smart Cities – A smart city application like traffic management requires the real-time analysis of data from sources such as vehicle-based routing applications, social media streams, traffic cams, weather stations, police reports, event calendars and more. The results of this analysis must be used immediately to update the existing traffic flow model, so recommendations of optimal routes can be sent to connected vehicles.
Developing these applications on top of disk-based databases is impossible because such architectures rely on extract, transform and load (ETL) processes to move data from an online transactional processing (OLTP) database into an online analytical processing (OLAP) database. This introduces a time delay that means the data is out of date before the analysis even begins. The answer is an IoT platform built on open source applications for cost-effectively collecting, analysing and managing all the data in real time.
The role of in-memory computing
A modern in-memory computing (IMC) platform brings together multiple IMC capabilities into a unified experience to simplify deployment and management. However, not all of the following IMC capabilities are available in every IMC platform, so it’s important to understand the application requirements when evaluating solutions.
- In-memory data grids and in-memory databases are deployed on a computing cluster in an on-premises, cloud or hybrid environment. In-memory data grids are easily inserted between the data and application layers of existing applications without ripping and replacing the existing database. In-memory databases are used for new applications or when re-architecting an existing application. In both cases, the IMC cluster is scaled out simply by adding nodes, making the entire memory and CPU power of the cluster available for processing.
- HTAP (hybrid transactional/analytical processing) or HOAP (hybrid operational/analytical processing) is the ability to use a single database for simultaneous transactions and analytics processing – even while running real-time machine or deep learning algorithms at scale. This architecture significantly reduces the cost and complexity of the system architecture for IoT use cases.
- A memory-centric architecture enables users to balance infrastructure costs and application performance by keeping the full operational data set on disk while keeping only a subset of user-defined data in memory. This architecture, often referred to as “persistent store,” can be part of a distributed ACID and ANSI-99 SQL-compliant disk store deployed on spinning disks, solid state drives (SSDs), Flash, 3D XPoint or other storage-class memory technologies. This architecture also enables immediate data processing following a reboot without waiting for all the data to reload into memory.
- A continuous learning framework can be deployed using integrated, fully distributed machine learning (ML) and deep learning (DL) libraries that have been optimised for massively parallel processing. This enables each ML or DL algorithm to run locally against the data residing in-memory on each node of the IMC cluster. This allows for the continuous updating of data without degrading performance, even at petabyte scale.
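The scale-out and colocated-processing ideas above can be sketched in a few lines. The following is a minimal, illustrative Python sketch (it is not any vendor's API, and the `Node`/`Grid` classes are hypothetical): keys are hashed to nodes the way a distributed in-memory data grid partitions its cache, and an aggregation runs locally against each node's in-memory partition before the partial results are combined.

```python
import hashlib
from statistics import mean

class Node:
    """One member of an in-memory cluster; holds a partition of the data."""
    def __init__(self, name):
        self.name = name
        self.store = {}          # in-memory key/value partition

    def local_aggregate(self, fn):
        # Compute runs where the data lives; nothing is shipped over the wire.
        return fn(self.store.values()) if self.store else None

class Grid:
    """Toy partitioned data grid: adding a node adds memory and CPU."""
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def _owner(self, key):
        # Hash-based partitioning decides which node owns each key.
        h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key, value):
        self._owner(key).store[key] = value

    def aggregate(self, fn, combine):
        # Map phase on every node, then a cheap reduce of the partial results.
        partials = [p for n in self.nodes
                    if (p := n.local_aggregate(fn)) is not None]
        return combine(partials)

grid = Grid(["node-a", "node-b", "node-c"])
for sensor_id in range(1000):
    grid.put(sensor_id, 20.0 + (sensor_id % 15))   # fake temperature readings

# Mean of per-node means: approximate unless partitions are equal-sized.
avg = grid.aggregate(mean, mean)
print(f"cluster-wide average temperature: {avg:.1f}")
```

Adding a fourth name to the `Grid` constructor is all it takes to "scale out" this toy cluster, which is the property the bullet list describes; a real data grid additionally rebalances existing partitions onto the new node.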
Building out the IoT platform with open source solutions
To make it easier to develop and roll out new IoT applications, vendors or open source projects often collaborate to ensure the applications work together seamlessly and are easier to deploy. For example, consider the following open source “IoT Stack”:
- Apache Ignite – An in-memory computing platform for processing data in real time at scale, complete with a persistent store feature and machine learning and deep learning libraries.
- Apache Kafka – A streaming platform for publishing and subscribing to streams of records, storing streams of records in a durable way, and processing streams of records as they occur.
- Apache Spark – A unified analytics engine for large-scale data processing.
- Kubernetes – A system for automating the deployment, scaling and management of containerised applications across a server cluster.
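To make the stack concrete, here is a minimal Python sketch of the pattern these components implement together. It uses only the standard library; `Topic` and `InMemoryStore` are hypothetical stand-ins, not the actual Kafka or Ignite APIs. A producer publishes sensor records to an append-only topic, a consumer ingests them into an in-memory store, and an analytical query runs against that same store with no ETL step in between.

```python
from collections import deque
from statistics import mean

class Topic:
    """Append-only log of records, in the spirit of a Kafka topic."""
    def __init__(self):
        self.log = []                  # ordered stream of records

    def publish(self, record):
        self.log.append(record)

    def consume_from(self, offset):
        # Consumers track their own offset and read forward from it.
        return self.log[offset:], len(self.log)

class InMemoryStore:
    """Keeps the hot working set in memory for immediate analytics."""
    def __init__(self, window=100):
        self.readings = deque(maxlen=window)   # bounded hot window

    def ingest(self, record):
        self.readings.append(record["value"])

    def rolling_average(self):
        return mean(self.readings) if self.readings else None

topic = Topic()
store = InMemoryStore(window=50)
offset = 0

# Producer side: devices publish readings as they are generated.
for i in range(200):
    topic.publish({"sensor": i % 10, "value": 20.0 + (i % 7)})

# Consumer side: ingest new records, then query the same store directly.
batch, offset = topic.consume_from(offset)
for record in batch:
    store.ingest(record)

print(f"rolling average over last {len(store.readings)} readings:",
      store.rolling_average())
```

In the real stack, Kafka plays the role of `Topic`, Ignite plays the role of `InMemoryStore` (with SQL and ML on top), Spark handles the heavier analytics, and Kubernetes deploys and scales all of them across the cluster.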
Vendors of solutions built on these open source projects as well as the projects themselves are working to ensure simple, native integration between them. Because these are mature solutions, they offer one of the most cost-effective paths to rolling out large-scale IoT applications.
Successful Internet of Things projects often depend on deploying a cost-effective platform for developing and rolling out applications that can simultaneously collect and analyse data from multiple streaming sources, at massive scale. For many organisations, an IoT platform powered by in-memory computing based on open source solutions is the answer. Today, it’s critical for IT decision makers to begin laying out an IoT infrastructure strategy in order to position their organisations for future success.
The author of this blog is Nikita Ivanov, CTO and co-founder of GridGain Systems
About the author
Nikita Ivanov, founder and CTO of GridGain Systems, has led GridGain in developing advanced and distributed in-memory data processing technologies. Nikita has more than 20 years of experience in software application development, building HPC and middleware platforms and contributing to the efforts of other startups and notable companies, including Adaptec, Visa and BEA Systems. In 1996, he was a pioneer in using Java technology for server-side middleware development while working for one of Europe’s largest system integrators. Nikita is an active member of the Java middleware community and a contributor to the Java specification. He is also a frequent international speaker with more than 50 talks at various developer conferences in the last 5 years.