What is generative AI, and why does it need neural networks?
Generative AI is a cutting-edge field of machine learning in which computer systems produce novel content such as text, images, music, and code. To do this, they rely on huge neural networks trained on massive amounts of data. But how do they do it, and who are the leaders in this domain?
Neural networks are computational models that mimic the structure and function of biological neurons. They consist of layers of interconnected units that process and transmit information, and they learn from data by adjusting their weights and biases through a process called backpropagation.
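As a concrete illustration, here is a minimal sketch of a single training step for a tiny network, using PyTorch purely as an example framework; the network size, data, and learning rate are illustrative, not anything used in MLPerf.

```python
import torch
import torch.nn as nn

# A tiny two-layer network: 4 inputs -> 8 hidden units -> 1 output.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(32, 4)         # a batch of 32 random input examples
y = torch.randn(32, 1)         # matching (random) targets

prediction = model(x)
loss = loss_fn(prediction, y)  # how wrong the network currently is
loss.backward()                # backpropagation: gradients of the loss w.r.t. every weight and bias
optimizer.step()               # adjust weights and biases in the direction that reduces the loss
```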
Generative AI uses neural networks to model the probability distribution of the data, and then sample from it to generate new data. For example, a generative AI system can learn the patterns and rules of natural language from a large corpus of text, and then generate new sentences or paragraphs that follow the same style and logic.
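For language, "sampling from the distribution" usually means generating one token at a time, each drawn from the model's predicted next-token probabilities. Here is a minimal sketch of that loop, with a random stand-in in place of a trained model:

```python
import numpy as np

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def next_token_probs(context):
    """Stand-in for a trained language model: returns a probability
    distribution over the vocabulary given the tokens generated so far."""
    logits = np.random.randn(len(VOCAB))   # a real model would compute these from `context`
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                 # softmax -> probabilities that sum to 1

rng = np.random.default_rng(0)
tokens = ["the"]
for _ in range(5):
    probs = next_token_probs(tokens)
    tokens.append(rng.choice(VOCAB, p=probs))  # sample the next token from the distribution
print(" ".join(tokens))
```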
How MLPerf measures neural network performance for generative AI
MLPerf is the leading public benchmark for measuring the performance of computer systems in training machine learning models. It covers domains such as computer vision, natural language processing, recommendation systems, and reinforcement learning. Recently, it added two new benchmarks for generative AI: large language models (LLMs) and text-to-image generation.
Both benchmarks test the ability of systems to train large and complex neural networks that can generate realistic and diverse content from data.
LLM benchmark: Training GPT-3, a large language model
The LLM benchmark tests the ability of systems to train large language models, such as GPT-3, that can generate realistic and coherent text on any topic. GPT-3 is one of the most advanced and popular examples of generative AI, powering applications such as GitHub’s coding assistant Copilot and OpenAI’s ChatGPT.
GPT-3 is a neural network that uses a transformer architecture, consisting of many stacked layers of self-attention and feed-forward units. With 175 billion parameters, it is one of the largest neural networks ever trained.
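To show what those layers look like, here is a minimal sketch of a single transformer block in PyTorch, with toy dimensions far smaller than GPT-3's (which stacks 96 such layers at much greater width):

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One self-attention + feed-forward layer of the kind GPT-3 stacks many times."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention: every token attends to the other tokens in the sequence.
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + attn_out)
        # Position-wise feed-forward network applied to each token independently.
        x = self.norm2(x + self.ff(x))
        return x

x = torch.randn(2, 16, 64)            # batch of 2 sequences, 16 tokens, 64-dim embeddings
print(TransformerBlock()(x).shape)    # torch.Size([2, 16, 64])
```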
However, training GPT-3 is not easy: it requires enormous amounts of computational power and time. For that reason, the LLM benchmark does not require systems to train GPT-3 from scratch; instead, they train to a checkpoint that proves they could reach the desired accuracy given enough time.
In the latest round of MLPerf, three companies submitted results for the LLM benchmark: Nvidia, Google, and Microsoft. All three used massive systems with thousands of accelerator chips to tackle the challenge.
Nvidia and Microsoft tested systems with 10,752 GPUs each, the largest ever tested by MLPerf. Nvidia’s system, called Eos, used its H100 GPUs and Quantum-2 InfiniBand interconnects. Microsoft’s system, hosted on its Azure cloud computing platform, used the same GPUs and interconnects. Both systems completed the LLM benchmark in less than four minutes, with Nvidia ahead of Microsoft by a few seconds.
Google tested a system with 4,096 of its own TPU v4 chips, which are specialized for machine learning. Google’s system took about 11 minutes to complete the LLM benchmark, but it also achieved a higher accuracy than Nvidia and Microsoft.
According to Nvidia, its Eos system is capable of 42.6 exaflops of peak performance, and its interconnects can transfer 1.1 petabytes per second of data. Extrapolating from its LLM result, Nvidia estimates that Eos could train GPT-3 from scratch in eight days. A smaller system with 512 H100 GPUs would take four months.
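The extrapolation is a simple linear-scaling estimate; working backward from the figures above shows roughly what fraction of full GPT-3 training the benchmark represents:

```python
# Back-of-the-envelope arithmetic implied by the figures above (linear scaling assumed).
benchmark_minutes = 4.0    # Eos's time on the LLM benchmark (just under four minutes)
full_training_days = 8.0   # Nvidia's estimate for training GPT-3 from scratch on Eos
implied_fraction = benchmark_minutes / (full_training_days * 24 * 60)
print(f"Benchmark covers roughly {implied_fraction:.2%} of full GPT-3 training")  # ~0.03%
```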
Text-to-image benchmark: Training Stable Diffusion, a text-to-image generator
The text-to-image benchmark tests the ability of systems to train models that can generate realistic images from text descriptions, such as “a cat wearing a hat”. This is another example of generative AI, with applications such as art, design, and entertainment.
The text-to-image benchmark uses a model called Stable Diffusion, which is based on a technique called diffusion probabilistic models. This technique allows models to generate images by gradually refining them from noise, rather than generating them pixel by pixel.
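At its core, that technique is an iterative denoising loop: start from pure noise and, at each step, subtract a little of the noise a trained network predicts is present. Below is a heavily simplified DDPM-style sketch, with a stub in place of the real denoiser (which in Stable Diffusion is a text-conditioned U-Net operating on latent images):

```python
import numpy as np

T = 50                                 # number of denoising steps (real models use hundreds or more)
betas = np.linspace(1e-4, 0.02, T)     # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x, t, prompt):
    """Stub for the trained denoising network, which in Stable Diffusion
    is a U-Net conditioned on an embedding of the text prompt."""
    return np.zeros_like(x)            # a real model predicts the noise present in x at step t

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))      # start from pure noise
for t in reversed(range(T)):
    eps = predict_noise(x, t, "a cat wearing a hat")
    # Remove a little of the predicted noise at each step (simplified DDPM update).
    x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)  # re-inject a small amount of noise
# After the loop, x would be a generated image (here it stays noise, since the denoiser is a stub).
```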
Stable Diffusion is a neural network with a convolutional architecture: multiple layers of learned filters that extract features and patterns from images. With 1.3 billion parameters, it is a large and complex neural network, though far smaller than GPT-3.
Only Nvidia submitted results for the text-to-image benchmark, using its Eos system. It completed the benchmark in about 14 minutes, achieving a high-quality score for the generated images.
MLPerf results show rapid progress in neural network training for generative AI
The MLPerf results illustrate the rapid progress and innovation in generative AI, as well as the intense competition among the leading companies. They also underscore the need for massive and efficient systems to train neural networks for generative AI, which grow larger and more complex every year.
The MLPerf benchmarks are updated regularly to reflect the state-of-the-art in machine learning. The next round of MLPerf is expected to include new benchmarks for speech synthesis, video synthesis, and 3D reconstruction.
Generative AI is an exciting and challenging domain that promises to unlock new possibilities for creativity, communication, and understanding. As MLPerf shows, the race to train the best neural networks for generative AI is on.