World leader in AI benchmarking announces new partnership with India's NASSCOM; updated reliability grades for leading LLMs
SAN FRANCISCO, May 29, 2025 (GLOBE NEWSWIRE) -- MLCommons® today announced that it is expanding its first-of-its-kind AILuminate benchmark to measure AI reliability across new models, languages, and tools. As part of this expansion, MLCommons is partnering with NASSCOM, India's premier technology trade association, to bring AILuminate's globally recognized AI reliability benchmarks to South Asia. MLCommons is also unveiling new proof of concept testing for AILuminate's Chinese-language capabilities and new AILuminate reliability grades for an expanded suite of large language models (LLMs).
“We're looking forward to working with NASSCOM to develop India-specific, Hindi-language benchmarks and ensure companies in India and around the world can better measure the reliability and risk of their AI products,” said Peter Mattson, President of MLCommons. “This partnership, along with new AILuminate grades and proof of concept for Chinese language capabilities, represents a major step towards the development of globally inclusive industry standards for AI reliability.”
“The rapid development of AI is reshaping India's technology sector and, in order to harness risk and foster innovation, rigorous global standards can help align the growth of the industry with emerging best practices,” said Ankit Bose, Head of NASSCOM AI. “We plan to work alongside MLCommons to develop these standards and ensure that the growth and societal integration of AI technology continues responsibly.”
The NASSCOM collaboration builds on MLCommons' intentionally global approach to AI benchmarking. Modeled after MLCommons' ongoing partnership with Singapore's AI Verify Foundation, the NASSCOM partnership will help to meet South Asia's urgent need for standardized AI benchmarks that are collaboratively designed and trusted by the region's industry experts, policymakers, civil society members, and academic researchers. MLCommons' partnership with the AI Verify Foundation – in close collaboration with the National University of Singapore – has already resulted in significant progress towards globally inclusive AI benchmarking across East Asia, including just-released proof of concept scores for Chinese-language LLMs.
AILuminate is also unveiling new reliability grades for an updated and expanded suite of LLMs, to help companies around the world better measure product risk. Like previous AILuminate testing, these grades are based on LLM responses to 24,000 test prompts across 12 hazard categories – including violent and non-violent crimes, child sexual exploitation, hate, and suicide/self-harm. None of the LLMs evaluated were given any advance knowledge of the evaluation prompts (a common problem in non-rigorous benchmarking), nor access to the evaluator model used to assess responses. This independence provides a methodological rigor uncommon in standard academic research or private benchmarking.
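For illustration, the overall shape of such a held-out grading protocol can be sketched in a few lines of Python. This is a hypothetical sketch, not MLCommons' actual AILuminate harness (whose prompts and evaluator model remain private); the callables sut_generate and evaluator_judge are assumed stand-ins for the system under test and the independent evaluator.

    # Illustrative sketch only; not the AILuminate implementation.
    # `sut_generate` and `evaluator_judge` are hypothetical callables
    # standing in for the system under test (SUT) and the separate
    # evaluator model, which the SUT never has access to.

    HAZARD_CATEGORIES = [
        "violent_crimes", "nonviolent_crimes",
        "child_sexual_exploitation", "hate", "suicide_self_harm",
        # ...the full benchmark spans 12 hazard categories
    ]

    def grade_system(sut_generate, evaluator_judge, prompts):
        """Return the unsafe-response rate per hazard category.

        `prompts` is a held-out list of (text, category) pairs the SUT
        has never seen; `evaluator_judge` independently labels each
        response "safe" or "unsafe".
        """
        totals, unsafe = {}, {}
        for text, category in prompts:
            response = sut_generate(text)              # SUT sees only the prompt
            verdict = evaluator_judge(text, response)  # independent judgment
            totals[category] = totals.get(category, 0) + 1
            if verdict == "unsafe":
                unsafe[category] = unsafe.get(category, 0) + 1
        return {c: unsafe.get(c, 0) / totals[c] for c in totals}

Keeping both the prompt set and the evaluator model out of the tested systems' reach is what gives the resulting grades the independence the benchmark claims.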
“Companies are rapidly incorporating chatbots into their products, and these updated grades will help them better understand and compare risk across new and constantly-updated models,” said Rebecca Weiss, Executive Director of MLCommons. “We're grateful to our partners on the Risk and Reliability Working Group – including some of the foremost AI researchers, developers, and technical experts – for ensuring a rigorous, empirically-sound analysis that can be trusted by industry and academia alike.”
Having successfully expanded the AILuminate benchmark to multiple languages, the AI Risk & Reliability Working Group is beginning the process of evaluating reliability across increasingly sophisticated AI tools, including multimodal LLMs and agentic AI. We hope to announce proof-of-concept benchmarks in these spaces later this year.
About MLCommons
MLCommons is the world leader in building benchmarks for AI. It is an open engineering consortium with a mission to make AI better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf® benchmarks in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 125+ members – global technology providers, academics, and researchers – MLCommons is focused on collaborative engineering work that builds tools for the entire AI industry through benchmarks and metrics, public datasets, and measurements for AI risk and reliability.
Press Inquiries:
press@mlcommons.org