GPT-5.2 is dumb (I’m tired of benchmarks) – benchmarking

The video discusses the recent release of GPT-5.2, contrasting its impressive benchmark performance with its notable shortcomings. The creator points out some bizarre errors made by the model, such as miscounting the letters in words and making illogical financial comparisons. Despite these issues, the model excels on traditional benchmarks, especially in high-level research tasks […]