But what is grokking? – transformers

The video explains “grokking” as the sudden, delayed emergence of genuine generalization in AI models, demonstrated through a single-layer transformer that learns modular arithmetic by internally representing inputs with sine and cosine waves and performing addition via trigonometric identities. It also highlights advances in mechanistic interpretability, showing how complex AI behaviors can sometimes be traced to understandable […]
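The trigonometric mechanism is concrete enough to sketch in code. Below is a minimal NumPy illustration, not the model's learned weights: the modulus matches the grokking experiments, but the frequencies are arbitrary stand-ins. The angle-addition identities are the ones the trained network was found to implement.

```python
import numpy as np

p = 113                       # modulus used in the grokking experiments
ks = [3, 7, 12]               # arbitrary frequencies; a trained model picks its own
ws = [2 * np.pi * k / p for k in ks]

def mod_add_via_trig(a, b):
    """Compute (a + b) mod p using only sine/cosine embeddings."""
    logits = np.zeros(p)
    c = np.arange(p)          # every candidate answer
    for w in ws:
        # Angle-addition identities recover cos/sin of w*(a+b)
        # without ever forming a+b directly:
        #   cos(w(a+b)) = cos(wa)cos(wb) - sin(wa)sin(wb)
        #   sin(w(a+b)) = sin(wa)cos(wb) + cos(wa)sin(wb)
        cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
        sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
        # cos(w*(a+b-c)) is maximal (=1) exactly when c ≡ a+b (mod p)
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return int(np.argmax(logits))

assert mod_add_via_trig(100, 50) == (100 + 50) % p  # 37
print(mod_add_via_trig(100, 50))
```

Summing over several frequencies sharpens the peak at the correct residue, which is how the logits of the trained transformer behave after grokking.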
FunctionGemma – Function Calling at the Edge – transformers

The video introduces FunctionGemma, a new open model release from the Gemma team, designed to bring customizable function-calling capabilities to small language models that can run efficiently on edge devices like mobile phones. Unlike the more research-focused T5 Gemma 2, FunctionGemma is specialized for practical applications such as games or apps where […]
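The summary doesn't show FunctionGemma's actual output schema, but function calling in general follows a parse-and-dispatch loop owned by the host app: the model emits a structured call, and the app executes it. A hedged sketch with a hypothetical tool name and a made-up JSON format:

```python
import json

# Hypothetical tool for an on-device game or app; the name, schema,
# and output format below are illustrative, not FunctionGemma's.
def set_volume(level: int) -> str:
    return f"volume set to {level}%"

TOOLS = {"set_volume": set_volume}

# Suppose the model, asked "turn the sound down to 20%", emits:
model_output = '{"name": "set_volume", "arguments": {"level": 20}}'

call = json.loads(model_output)                   # parse the structured call
result = TOOLS[call["name"]](**call["arguments"]) # dispatch to the real function
print(result)                                     # volume set to 20%
```

The appeal of a small edge model is that this whole loop runs locally, so the app never ships user requests to a server.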
Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 9 – Recap & Current Trends – gpt-4

The final lecture of Stanford’s CME295 course provided a comprehensive recap of the entire quarter, tracing the evolution of transformers and large language models (LLMs) from their foundational concepts to current trends in 2025. The course began with an introduction to tokenization and embedding techniques, highlighting the limitations of early methods like Word2Vec and RNNs, […]
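As a quick reminder of the pipeline the course opens with, here is a toy tokenize-then-embed step; the vocabulary, dimensions, and random initialization are all illustrative:

```python
import numpy as np

# Map tokens to integer IDs, then look up one learned vector per ID.
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 4
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), d_model))  # embedding table (toy, random)

ids = [vocab[w] for w in "the cat sat".split()]
x = E[ids]          # (3, d_model): the sequence a transformer actually sees
print(ids, x.shape)
```

A Word2Vec-style table like this assigns each word a single vector regardless of context, one of the limitations the lecture cites that contextual transformer representations later addressed.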
