Meta works with NVIDIA to build AI research supercomputer

January 24, 2022 – Meta Platforms gave a big thumbs up to NVIDIA, choosing its technologies for what Meta believes will be its most powerful research system to date.

The AI Research SuperCluster (RSC), announced today, is already training new models to advance AI. Once fully deployed, Meta’s RSC is expected to be one of the largest customer installations of NVIDIA DGX A100 systems.

“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they could seamlessly collaborate on a research project or play an AR game together,” the company said in a blog post.

Training AI’s models

When RSC is fully built out, later this year, Meta aims to use it to train AI models with more than a trillion parameters. That could advance fields such as natural-language processing for jobs like identifying harmful content in real time. In addition to performance at scale, Meta cited extreme reliability, security, privacy and the flexibility to handle “a wide range of AI models” as its key criteria for RSC.

Meta’s AI Research SuperCluster features hundreds of NVIDIA DGX systems linked on an NVIDIA Quantum InfiniBand network to accelerate the work of its AI research teams.

Under the hood

The new AI supercomputer currently uses 760 NVIDIA DGX A100 systems as its compute nodes. They pack a total of 6,080 NVIDIA A100 GPUs linked on an NVIDIA Quantum 200Gb/s InfiniBand network to deliver 1,895 petaflops of TF32 performance.
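The figures above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes eight A100 GPUs per DGX A100 system (which is what 6,080 GPUs across 760 systems implies) and derives the per-GPU TF32 throughput implied by the quoted total. The variable names are illustrative and do not come from Meta or NVIDIA.

```python
# Back-of-the-envelope check of the phase-one RSC figures quoted in the article.
DGX_SYSTEMS = 760
GPUS_PER_DGX = 8             # assumption: inferred from 6,080 GPUs / 760 systems
TOTAL_TF32_PETAFLOPS = 1895  # aggregate TF32 figure quoted in the article

total_gpus = DGX_SYSTEMS * GPUS_PER_DGX

# 1 petaflop = 1,000 teraflops
per_gpu_teraflops = TOTAL_TF32_PETAFLOPS * 1000 / total_gpus

print(f"Total GPUs: {total_gpus}")
print(f"Implied TF32 per GPU: {per_gpu_teraflops:.1f} teraflops")
```

The implied per-GPU figure is a peak number derived from the quoted total, so it should not be read as sustained training throughput.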

Despite challenges from COVID-19, RSC took just 18 months to go from an idea on paper to a working AI supercomputer, thanks in part to the NVIDIA DGX A100 technology at the foundation of Meta RSC.

Penguin Computing is the NVIDIA Partner Network delivery partner for RSC. In addition to the 760 DGX A100 systems and InfiniBand networking, Penguin provided managed services and AI-optimised infrastructure for Meta comprising 46 petabytes of cache storage with its Altus systems. Pure Storage FlashBlade and FlashArray//C provide the highly performant and scalable all-flash storage capabilities needed to power RSC.

20x performance gains

It’s the second time Meta has picked NVIDIA technologies as the base for its research infrastructure. In 2017, Meta built the first generation of this infrastructure for AI research with 22,000 NVIDIA V100 Tensor Core GPUs that handle 35,000 AI training jobs a day.

Meta’s early benchmarks showed RSC can train large NLP models 3x faster and run computer vision jobs 20x faster than the prior system.

In a second phase later this year, RSC will expand to 16,000 GPUs that Meta believes will deliver a whopping 5 exaflops of mixed-precision AI performance. Meta also aims to expand RSC’s storage system to serve up to an exabyte of data at 16 terabytes per second.
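To put the phase-two targets in perspective, the rough calculation below derives the per-GPU throughput implied by 5 exaflops across 16,000 GPUs, and how long one full pass over an exabyte would take at 16 terabytes per second. It assumes decimal units (1 exabyte = 10^18 bytes, 1 terabyte = 10^12 bytes) and sustained peak bandwidth, neither of which the article states. The variable names are illustrative.

```python
# Rough arithmetic on the planned phase-two RSC figures quoted in the article.
PHASE2_GPUS = 16_000
PHASE2_EXAFLOPS = 5                  # mixed-precision target quoted in the article
STORAGE_BYTES = 10**18               # assumption: 1 exabyte in decimal units
BANDWIDTH_BYTES_PER_S = 16 * 10**12  # assumption: 16 TB/s in decimal units

# 1 exaflop = 1,000,000 teraflops
per_gpu_teraflops = PHASE2_EXAFLOPS * 1_000_000 / PHASE2_GPUS

# Time for a single full pass over the storage at the quoted bandwidth
full_read_seconds = STORAGE_BYTES / BANDWIDTH_BYTES_PER_S
full_read_hours = full_read_seconds / 3600

print(f"Implied mixed-precision throughput per GPU: {per_gpu_teraflops:.1f} teraflops")
print(f"Full pass over 1 EB at 16 TB/s: {full_read_seconds:,.0f} s (about {full_read_hours:.1f} hours)")
```

Under these assumptions, a single pass over the full exabyte would take on the order of 17 hours, which gives a sense of why storage bandwidth is quoted alongside capacity.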

A scalable architecture

NVIDIA AI technologies are available to enterprises of any size. NVIDIA DGX, which includes a full stack of NVIDIA AI software, scales easily from a single system to a DGX SuperPOD running on-premises or at a colocation provider. Customers can also rent DGX systems through NVIDIA DGX Foundry.
