Meta works with NVIDIA to build AI research supercomputer

January 24, 2022 – Meta Platforms gave a big thumbs up to NVIDIA, choosing the technologies for what it believes will be its most powerful research system to date.

The newly announced AI Research SuperCluster (RSC) is already training new models to advance AI. Once fully deployed, Meta’s RSC is expected to be one of the largest customer installations of NVIDIA DGX A100 systems.

“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they could seamlessly collaborate on a research project or play an AR game together,” the company said in a blog post.

Training AI models

When RSC is fully built out, later this year, Meta aims to use it to train AI models with more than a trillion parameters. That could advance fields such as natural-language processing for jobs like identifying harmful content in real time. In addition to performance at scale, Meta cited extreme reliability, security, privacy and the flexibility to handle “a wide range of AI models” as its key criteria for RSC.

Meta’s AI Research SuperCluster features hundreds of NVIDIA DGX systems linked on an NVIDIA Quantum InfiniBand network to accelerate the work of its AI research teams.

Under the hood

The new AI supercomputer currently uses 760 NVIDIA DGX A100 systems as its compute nodes. They pack a total of 6,080 NVIDIA A100 GPUs linked on an NVIDIA Quantum 200Gb/s InfiniBand network to deliver 1,895 petaflops of TF32 performance.
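The figures above can be cross-checked with some quick arithmetic. The sketch below is illustrative only: it assumes eight A100 GPUs per DGX A100 system, a detail the article does not state, and the variable names are made up for this example. It derives the GPU count and the implied TF32 throughput per GPU from the quoted totals.

```python
# Figures quoted in the article
dgx_systems = 760
total_gpus_quoted = 6_080
tf32_petaflops_quoted = 1_895

# Assumption (not stated in the article): 8 A100 GPUs per DGX A100 system
GPUS_PER_DGX_A100 = 8

# GPU count implied by the number of systems
total_gpus = dgx_systems * GPUS_PER_DGX_A100

# Implied TF32 throughput per GPU, in teraflops (1 petaflop = 1,000 teraflops)
per_gpu_teraflops = tf32_petaflops_quoted * 1_000 / total_gpus_quoted

print(f"GPUs: {total_gpus}")
print(f"Implied TF32 per GPU: {per_gpu_teraflops:.1f} teraflops")
```

Under that assumption, the system count matches the quoted 6,080 GPUs, and the quoted total works out to a little under 312 teraflops of TF32 per GPU.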

Despite challenges from COVID-19, RSC took just 18 months to go from an idea on paper to a working AI supercomputer, thanks in part to the NVIDIA DGX A100 technology at the foundation of Meta’s RSC.

Penguin Computing is the NVIDIA Partner Network delivery partner for RSC. In addition to the 760 DGX A100 systems and InfiniBand networking, Penguin provided managed services and AI-optimised infrastructure for Meta, comprising 46 petabytes of cache storage with its Altus systems. Pure Storage FlashBlade and FlashArray//C provide the high-performance, scalable all-flash storage needed to power RSC.

20x performance gains

It’s the second time Meta has picked NVIDIA technologies as the base for its research infrastructure. In 2017, Meta built the first generation of this infrastructure for AI research with 22,000 NVIDIA V100 Tensor Core GPUs that handle 35,000 AI training jobs a day.

Meta’s early benchmarks showed RSC can train large NLP models 3x faster and run computer vision jobs 20x faster than the prior system.

In a second phase later this year, RSC will expand to 16,000 GPUs, which Meta believes will deliver a whopping 5 exaflops of mixed-precision AI performance. Meta also aims to expand RSC’s storage system to deliver up to an exabyte of data at 16 terabytes per second.
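To put the second-phase targets in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 exaflop = 1,000,000 teraflops, 1 exabyte = 1,000,000 terabytes) and, purely for illustration, treats the storage target as streaming a full exabyte once at the quoted rate. The variable names are hypothetical.

```python
# Figures quoted in the article
current_gpus = 6_080
planned_gpus = 16_000
planned_exaflops = 5
storage_exabytes = 1
bandwidth_tb_per_s = 16

# Growth in GPU count from the current deployment to the second phase
gpu_growth = planned_gpus / current_gpus

# Implied mixed-precision throughput per GPU, in teraflops
# (assumes 1 exaflop = 1,000,000 teraflops)
per_gpu_teraflops = planned_exaflops * 1_000_000 / planned_gpus

# Time to stream one exabyte once at the quoted bandwidth
# (assumes 1 exabyte = 1,000,000 terabytes)
seconds_to_stream = storage_exabytes * 1_000_000 / bandwidth_tb_per_s
hours_to_stream = seconds_to_stream / 3600

print(f"GPU growth: {gpu_growth:.2f}x")
print(f"Implied per-GPU throughput: {per_gpu_teraflops:.1f} teraflops")
print(f"Time to stream 1 exabyte: {hours_to_stream:.1f} hours")
```

On these assumptions, the expansion is roughly a 2.6x increase in GPUs, and moving a full exabyte at 16 terabytes per second would take a little over 17 hours.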

A scalable architecture

NVIDIA AI technologies are available to enterprises of any size. NVIDIA DGX, which includes a full stack of NVIDIA AI software, scales easily from a single system to a DGX SuperPOD running on-premises or at a colocation provider. Customers can also rent DGX systems through NVIDIA DGX Foundry.
