MLCommons, an artificial intelligence benchmarking group, unveiled new tests and results evaluating how quickly cutting-edge hardware can run AI applications and respond to users. The benchmarks measure how fast AI chips and systems can run data-heavy AI models and how responsively they return answers to user queries, as in applications like ChatGPT.
Among the additions is a benchmark built on Llama 2, the large language model developed by Meta Platforms with 70 billion parameters, which measures how quickly AI systems can generate responses in question-and-answer scenarios. MLCommons also added a second benchmark to its MLPerf suite that evaluates text-to-image generation, based on Stability AI's Stable Diffusion XL model.
Server designs built around Nvidia's H100 chips, submitted by industry giants including Google and by Nvidia itself, dominated both benchmarks on raw performance. However, a server design using Qualcomm AI chips, submitted by Krai, offered a promising alternative with significantly lower power consumption. Balancing raw performance against energy efficiency remains a critical challenge for companies in the AI sector, which is why MLCommons maintains a separate benchmark category that measures power consumption alongside performance.