Are AI Benchmarks Telling The Full Story? [SPONSORED] – benchmarking

The video discusses the limitations of current AI benchmarks and argues for human-centered evaluations that show how AI models perform in real-world scenarios. The speakers compare AI models to Formula 1 cars, which are engineering marvels but impractical for daily use, suggesting that models excelling in technical benchmarks like MMLU or Humanity’s Last […]

GPT-5.2 is dumb (I’m tired of benchmarks) – benchmarking

The video discusses the recent release of GPT-5.2, highlighting both its impressive benchmark performance and its notable shortcomings. The creator points out bizarre errors by the model, such as miscounting letters in words and making illogical financial comparisons. Despite these issues, the model excels in traditional benchmarks, especially in high-level research tasks […]