Hyperscaling storage to meet the demands of AI

The complex debate on how to regulate the use of AI to benefit society and prevent its misuse is heating up as governments around the world grapple with its far-reaching implications. In the meantime, AI and ML tools are already becoming an integral part of our daily lives and are set to become even more pervasive, says Tim Sherbak, enterprise products and solutions marketing at Quantum.

Reassuringly, there are many examples of their capabilities being developed in positive ways, such as medical science harnessing the benefits to help in the detection of cancer, banks and credit card companies deploying solutions to prevent fraud and scams, and the construction industry evaluating how they could be used to deliver a quicker, more efficient construction design process.

Across industries, the use of AI is expanding rapidly to process and utilise data created in different formats, as well as to automate tasks, detect anomalies, and generate new content and ideas. However, delivering these outcomes successfully requires a huge amount of source data, and that’s where the storage problem starts. It doesn’t end there.

Why the AI storage problem is growing

To begin with, vast sets of raw data need to be gathered to build AI and ML applications. But the type of data being utilised poses further challenges, primarily because most of it is unstructured: documents, web pages, social media posts, emails, audio recordings, videos, and images. This form of data is much larger than structured data, the type commonly stored in databases and archives.

Next, the raw data is processed into a format that training algorithms can work with. Overall effectiveness depends on the quantity and quality of the original data, the algorithm design, and constant refinement from feedback and updated data. The final AI model encapsulates all the knowledge gained during training and can range from a simple rule-based system to a sophisticated neural network.
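As a minimal, illustrative sketch of that preprocessing step (not any specific vendor’s pipeline, and assuming a simple text corpus with hypothetical documents), unstructured data is typically cleaned, tokenised, and mapped to numeric IDs before a model can train on it:

```python
# Illustrative sketch only: turning raw, unstructured text into a numeric
# format a training algorithm can consume. Real pipelines use dedicated
# tokenisers and distributed data loaders at far larger scale.
from collections import Counter

raw_documents = [
    "Invoice #4821: payment overdue, please remit by Friday.",
    "Team meeting moved to 3pm - agenda attached.",
]

def tokenise(text: str) -> list[str]:
    """Lower-case the text, strip basic punctuation, and split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()

# Build a vocabulary from token frequencies across the corpus.
counts = Counter(token for doc in raw_documents for token in tokenise(doc))
vocab = {token: idx for idx, (token, _) in enumerate(counts.most_common())}

# Encode each document as a sequence of integer IDs ready for a model.
encoded = [[vocab[token] for token in tokenise(doc)] for doc in raw_documents]
print(encoded)
```

Even this toy example hints at why storage demands balloon: the raw documents, the intermediate cleaned forms, and the encoded training sets all tend to be retained.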

It all adds up to a massive volume of data, potentially petabytes, which is growing continuously as new data is collected. And it may need to be stored for decades, or longer, especially if the data set is required in the future to train completely new AI models.

The era of unstructured data

To put the problem in perspective, over 80% of the data created in the last two years is unstructured, and it is growing at a phenomenal rate. Analysts predict that twice as much data will be created in the next five years as in the past ten, and over 80% of it will continue to be unstructured. That adds up to zettabytes (trillions of gigabytes) of data to manage!
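A quick back-of-the-envelope calculation makes the scale concrete (the 20 TB drive capacity is an assumption for illustration only):

```python
# Back-of-the-envelope scale check with illustrative assumptions.
ZETTABYTE_BYTES = 10**21           # 1 ZB = 10^21 bytes, i.e. a trillion gigabytes
GIGABYTE_BYTES = 10**9
DRIVE_CAPACITY_TB = 20             # assumed capacity of one enterprise hard drive

gigabytes_per_zb = ZETTABYTE_BYTES / GIGABYTE_BYTES
drives_per_zb = ZETTABYTE_BYTES / (DRIVE_CAPACITY_TB * 10**12)

print(f"1 ZB = {gigabytes_per_zb:.0e} GB")            # 1e+12 GB
print(f"1 ZB = {drives_per_zb:,.0f} x 20 TB drives")  # 50,000,000 drives
```

Fifty million drives per zettabyte, before any replication or protection overhead, is the kind of arithmetic driving the search for denser, more efficient storage.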

Today, organisations are being forced to make important decisions about how much data to keep without knowing what they will need in the future. Consequently, many are taking the approach of trying to keep everything and are faced with the issue of finding a storage solution that is not only affordable but also makes it simple to retrieve the data.

Adding fuel to the fire, AI applications put huge demands on storage system performance. Processing these massive unstructured datasets requires extremely low latencies and high performance, which legacy storage systems were never built to manage, and certainly not with the strong consistency required for AI.

Planning for performance and accessibility

As it stands, most of the world’s data is stored on hard disk-based systems that were developed over twenty years ago, when the concept of storing exabytes of unstructured data for decades was not even contemplated. At that time, data was mostly in structured formats and was generally archived for compliance and legal purposes rather than for its intrinsic value. Historically, this kind of retained data required little additional processing and could be kept on long-term, slow-performing storage. Now, businesses across the board want to keep their data and make it easily accessible and searchable, with the expectation that it will need to be re-processed, re-trained on, or monetised in new ways.

Making retrieval tougher still, many enterprises have information spread across multiple systems, split between the cloud and on-premises. They often don’t know exactly what they have in their archives, or whether they are holding copies of the same data in numerous places. Additionally, new data may be generated outside their data centres by applications or devices, such as cameras, and moved elsewhere to be processed, so the management of data across its lifecycle as it moves from place to place has to be accommodated too. Storage solutions must therefore be extremely flexible to meet all of these requirements and operate in the cloud or on-premises.

It’s an impossible ask for legacy storage systems, which were never designed for this amount or type of data and weren’t built to scale to such a degree. Classic network-attached storage and object storage architectures will fall over if they try to hyperscale. Alternatives based on RAID and replication have similar issues, as they cannot provide sufficient protection against failure or adequate storage efficiency at that scale. Managing multiple storage devices and separate tiers of storage also eats up valuable administration time.

Scaling for an AI-driven world

The question for many organisations is how to keep ever-growing quantities of valuable data protected and available over the long term without busting their storage budget. The answer is that it’s time to hyperscale with affordable cloud-native solutions designed from the ground up to deliver high performance for an AI-driven society.

These solutions have been developed with massive scale-out architectures combining flash storage and RDMA (remote direct memory access) networking. Their very low latency and higher throughput in data-intensive workloads, such as AI and ML, result in super-fast application performance and responsiveness. This also enables efficient data sharing and synchronisation across multiple systems, ideal for distributed and hybrid cloud and on-premises environments.

Another game-changer is that hyperscaling isn’t just about capacity. The latest storage technologies also enable far quicker, more accurate searching and retrieval by automating data annotation and classification, as well as managing deduplication across different systems.
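One common technique behind cross-system deduplication is content-addressed hashing. The sketch below is illustrative only (the object names and data are hypothetical, and it does not describe any particular product): identical content stored in different locations produces the same fingerprint, so duplicates can be found without comparing or moving the full objects.

```python
# Illustrative content-hash deduplication: identical content in different
# systems yields the same digest, so duplicates are detected by comparing
# small fingerprints rather than whole objects.
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest that acts as a content address for the object."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical objects held in two different locations.
on_prem_objects = {"archive/report.docx": b"quarterly results ...",
                   "archive/raw_video_001": b"\x00\x01\x02..."}
cloud_objects = {"bucket/report-copy.docx": b"quarterly results ...",
                 "bucket/new_dataset.csv": b"id,label\n1,cat\n"}

seen: dict[str, str] = {}
duplicates = []
for path, data in {**on_prem_objects, **cloud_objects}.items():
    digest = fingerprint(data)
    if digest in seen:
        duplicates.append((path, seen[digest]))  # same content already stored elsewhere
    else:
        seen[digest] = path

print(duplicates)  # [('bucket/report-copy.docx', 'archive/report.docx')]
```

In practice, production systems fingerprint data in chunks rather than whole files, but the principle is the same: metadata, not bulk data, does the work of finding duplicates.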

Businesses are increasingly seeing untapped potential in the information and intelligence that they have created but are struggling to store it effectively. Modern storage technology will bring new levels of automation, performance, security, and flexibility that will unlock far greater value from AI and ML data sets, without the past constraints and burgeoning costs of outdated hardware.

The author is Tim Sherbak, enterprise products and solutions marketing at Quantum.
