Frozen AI News archive

QwQ-32B claims to match DeepSeek R1-671B

**Alibaba Qwen** released their **QwQ-32B** model, a **32 billion parameter** reasoning model using a novel two-stage reinforcement learning approach: first scaling RL for math and coding tasks with accuracy verifiers and code execution servers, then applying RL for general capabilities like instruction following and alignment. Meanwhile, **OpenAI** rolled out **GPT-4.5** to Plus users, with mixed feedback on coding performance and noted inference cost improvements. The QwQ model aims to compete with larger MoE models like **DeepSeek-R1**. *"GPT-4.5 is unusable for coding"* was a notable user critique, while others praised its reasoning improvements due to scaling pretraining.


AI News for 3/5/2025-3/6/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (227 channels, and 3619 messages) for you. Estimated reading time saved (at 200wpm): 351 minutes. You can now tag @smol_ai for AINews discussions!

As previewed last November and again last month, the Alibaba Qwen team has finally released the final version of QwQ, their Qwen2.5-Plus + Thinking (QwQ) post-train. It posts benchmark numbers comparable to R1, an MoE as much as 20x larger.
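The "20x" compares total parameters. R1 is a mixture-of-experts model that activates roughly 37B of its 671B parameters per token, so the gap in per-token compute is much smaller than the headline ratio:

$$
\frac{671\ \text{B (R1, total)}}{32\ \text{B (QwQ, dense)}} \approx 21
\qquad
\frac{37\ \text{B (R1, active per token)}}{32\ \text{B (QwQ, dense)}} \approx 1.16
$$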


It's still early, so no independent checks are available yet. Still, the Qwen team has done the bare essentials to reassure us that they did not simply overfit to benchmarks to get this result. They still report decent non-math/coding benchmark numbers, and they gave us one paragraph on how they trained it:

  • In the initial stage, we scale RL specifically for math and coding tasks. Rather than relying on traditional reward models, we utilized an accuracy verifier for math problems to ensure the correctness of final solutions and a code execution server to assess whether the generated codes successfully pass predefined test cases. As training episodes progress, performance in both domains shows continuous improvement.
  • After the first stage, we add another stage of RL for general capabilities. It is trained with rewards from general reward model and some rule-based verifiers. We find that this stage of RL training with a small amount of steps can increase the performance of other general capabilities, such as instruction following, alignment with human preference, and agent performance, without significant performance drop in math and coding.
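The first stage replaces a learned reward model with programmatic checks. Below is a minimal sketch of what such rule-based rewards could look like. It assumes the model writes its final math answer in `\boxed{}` and that coding problems are stdin/stdout programs with (input, expected output) test cases. The function names are ours, and this is not Qwen's implementation:

```python
from __future__ import annotations

import re
import subprocess
import sys
import tempfile
from fractions import Fraction
from pathlib import Path

# Assumed convention: the final answer is the last \boxed{...} in the completion.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> str | None:
    """Return the contents of the last boxed answer, or None if there is none.

    Nested braces (e.g. a boxed fraction written with \\frac) are not handled.
    """
    matches = BOXED_RE.findall(completion)
    return matches[-1].strip() if matches else None


def math_accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the final answer equals the reference answer, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    try:
        # Compare as exact rationals so "0.5" and "1/2" count as the same answer.
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to exact string comparison.
        return 1.0 if answer == reference.strip() else 0.0


def code_execution_reward(program: str, tests: list[tuple[str, str]],
                          timeout: float = 5.0) -> float:
    """1.0 if the program passes every (stdin, expected_stdout) test case, else 0.0."""
    if not tests:
        return 0.0
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(program)
        for stdin, expected in tests:
            try:
                result = subprocess.run(
                    [sys.executable, str(path)],
                    input=stdin, capture_output=True, text=True, timeout=timeout,
                )
            except subprocess.TimeoutExpired:
                return 0.0  # treat a hang as a failure
            if result.returncode != 0 or result.stdout.strip() != expected.strip():
                return 0.0
    return 1.0
```

Both rewards are binary, mirroring the pass/fail framing of the quoted description. A real code execution server would also sandbox the process (no network, limited file system and memory), which a bare `subprocess.run` call does not.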

More information (a paper, sample data, sample code) would help us understand the recipe, but this is fair enough for a 2025 open model disclosure. It will take a while longer for QwQ-32B to rank on the Open LLM Leaderboard, but here is where things currently stand. It is a reminder that thinking post-trains aren't strictly better than their instruct predecessors.

[Image: current Open LLM Leaderboard standings]


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

AI Model Releases and Benchmarks

Open Source AI & Community

AI Applications & Use Cases

AI Infrastructure & Compute

AI Safety & Policy

Memes & Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Apple's Mac Studio with M3 Ultra for AI Inference and 512GB Unified Memory

Theme 2. Qwen/QwQ-32B Launch: Performance Comparisons and Benchmarks

Theme 3. llama.cpp's Versatility in Leveraging Local LLMs

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. TeaCache Enhancement Boosts WAN 2.1 Performance

Theme 2. Lightricks LTX-Video v0.9.5 Adds Keyframes and Extensions

Theme 3. Open-Source Development of Chroma Model Released

Theme 4. GPT-4.5 Rolls Out to Plus Users with Memory Capabilities


AI Discord Recap

A summary of Summaries of Summaries by o1-preview-2024-09-12

Theme 1: Alibaba's QwQ-32B Challenges the Titans

Theme 2: User Frustrations Boil Over AI Tool Shortcomings

Theme 3: AI Agents Aim High with Sky-High Price Tags

Theme 4: Reinforcement Learning Plays and Wins Big Time

Theme 5: Techies React to New Hardware Unveilings


PART 1: High level Discord summaries

Cursor IDE Discord


OpenAI Discord


Codeium (Windsurf) Discord


aider (Paul Gauthier) Discord


LM Studio Discord


Interconnects (Nathan Lambert) Discord


GPU MODE Discord


Perplexity AI Discord


HuggingFace Discord


MCP (Glama) Discord


Latent Space Discord


Stability.ai (Stable Diffusion) Discord


LlamaIndex Discord


Notebook LM Discord


Nous Research AI Discord


OpenRouter (Alex Atallah) Discord


Yannick Kilcher Discord


Cohere Discord


Modular (Mojo 🔥) Discord


DSPy Discord


tinygrad (George Hotz) Discord


Eleuther Discord


Torchtune Discord


LLM Agents (Berkeley MOOC) Discord


Gorilla LLM (Berkeley Function Calling) Discord


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor IDE ▷ #general (676 messages🔥🔥🔥):

Cursor 3.7 Debacles, Dumbness Meter, YOLO mode configuration, Windsurf versus Cursor, OpenAI new pricing

Links mentioned:


OpenAI ▷ #announcements (2 messages):

GPT-4.5, AGI Development, AI Safety, AI Alignment


OpenAI ▷ #ai-discussions (546 messages🔥🔥🔥):

Will O3 ever come out?, Custom LLM for graduation project, Grok vs ChatGPT Plus, Abacus AI Feedback, Claude secret update

Links mentioned:


OpenAI ▷ #gpt-4-discussions (10 messages🔥):

GPT-4.5 refusing prompts, GPT-4.5 message limits


OpenAI ▷ #prompt-engineering (12 messages🔥):

Prompt Engineering Survey, Ontology of Prompt Strategies, Sora and AI Videos, Character Consistency in Sora, Hyper-realistic Visuals


OpenAI ▷ #api-discussions (12 messages🔥):

Systematic Survey of Prompt Engineering in Large Language Models, Ontology of Prompt Strategies, Sora, Character Consistency in AI Videos, Hyper-realistic Visuals in Sora


Codeium (Windsurf) ▷ #announcements (1 message):

Windsurf Wave 4 Release, Cascade Previews, Tab-to-import, Linter integration, Claude 3.7 improvements

Links mentioned:


Codeium (Windsurf) ▷ #discussion (6 messages):

vscode commit message, flutterflow, uninstalling codium extension


Codeium (Windsurf) ▷ #windsurf (397 messages🔥🔥):

Windsurf performance degradation, Codeium login issues, Windsurf Wave 4, Credit usage, Feature requests

Links mentioned:


aider (Paul Gauthier) ▷ #general (201 messages🔥🔥):

Grok 3 Model Comparison, Aider Offline Installation, Qwen's New QwQ-32B Reasoning Model, OpenAI's o3 Mini Access, Parasail's R1 Performance on OpenRouter

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (53 messages🔥):

OWUI Integration, LM Studio R1, Aider Output, OpenRouter API, Commit Messages

Links mentioned:


LM Studio ▷ #general (84 messages🔥🔥):

VRAM overflow, LM Studio and Phi-4 audio modality support, KV cache impact on VRAM, New Mac Studio's RAM, Sesame AI's open-source TTS model
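Here is a rough way to reason about the KV-cache and Mac Studio RAM questions in this thread. Weights take about `params × bits / 8` bytes. The KV cache stores one key vector and one value vector per layer, per KV head, per token. The sketch below is a back-of-envelope estimator, not LM Studio's own accounting. The model shape in the example (64 layers, 8 KV heads, head dimension 128) is an illustrative assumption roughly matching a Qwen2.5-32B-class model, so check the actual `config.json` before relying on it:

```python
GIB = 1024 ** 3


def weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Memory for the weights alone; ignores quantization scales and runtime buffers."""
    return n_params * bits_per_weight // 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """KV-cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem * batch


if __name__ == "__main__":
    # Illustrative (assumed) shape: 64 layers, 8 KV heads (grouped-query attention),
    # head_dim 128, fp16 cache (2 bytes per element), 32K-token context.
    kv = kv_cache_bytes(64, 8, 128, context_len=32_768)
    weights = weight_bytes(32_000_000_000, 8)  # nominal 32B params at 8 bits
    print(f"KV cache: {kv / GIB:.1f} GiB, 8-bit weights: {weights / GIB:.1f} GiB")
```

With these assumed values, a 32K-token fp16 cache comes to 8 GiB on top of the weights. That is why long contexts can overflow VRAM even when the weights alone fit. Runtime buffers and quantization scales add more, so treat the result as a lower bound.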

Links mentioned:


LM Studio ▷ #hardware-discussion (134 messages🔥🔥):

M3 Ultra vs M4 Max, AMD RX 9070 XT GPU, DeepSeek R1, SGI machines, Local LLMs

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (114 messages🔥🔥):

Richard Sutton, OpenAI agents pricing, QwQ-32B model, Boston Dynamics vs Unitree, Adversarial machine learning

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (18 messages🔥):

LLMs playing Diplomacy, GPT-4.5 greentext autocompleter, Mafia game playing LLMs, Post-training-as-a-service startups

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rl (2 messages):

Schmidhuber Congratulates Sutton and Barto, Turing Award, Cult leader game

Link mentioned: Tweet from Jürgen Schmidhuber (@SchmidhuberAI): Congratulations to @RichardSSutton and Andy Barto on their Turing award!


Interconnects (Nathan Lambert) ▷ #reads (3 messages):

Reinforcement Learning beats Pokemon, DeepSeek MLA performance challenges, ThunderMLA fused megakernel

Links mentioned:


Interconnects (Nathan Lambert) ▷ #lectures-and-projects (10 messages🔥):

RLHF Book, Lecture Series


Interconnects (Nathan Lambert) ▷ #posts (9 messages🔥):

Stargate Project, Data protection, OpenAI coding agent


GPU MODE ▷ #general (48 messages🔥):

Touhou-trained model, Unified Memory Discussion, Thunderbolt 5 benefits, Raspberry Pi clusters

Link mentioned: Get Turing Pi 2, mini ITX cluster board: The Turing Pi 2.5 is a 4-node mini ITX cluster board with a built-in Ethernet switch that runs Turing RK1, Raspberry Pi CM4 or Nvidia Jetson compute modules


GPU MODE ▷ #triton (4 messages):

Triton gather operation, PagedAttention in Triton

Link mentioned: Cannot call tl.gather · Issue #5826 · triton-lang/triton: Describe the bug When I run the following code I get an exception: AttributeError: module 'triton.language' has no attribute 'gather' import triton.language as tl tl.gather I've in...


GPU MODE ▷ #cuda (13 messages🔥):

Compiler Optimization, CUDA OpenGL Interop, cudaGraphicsGLRegisterImage fails


GPU MODE ▷ #torch (4 messages):

Torch C++ Interface Library, Extending OffloadPolicy, use_reentrant in Checkpoint

Link mentioned: torch.utils.checkpoint — PyTorch 2.6 documentation


GPU MODE ▷ #off-topic (12 messages🔥):

SSH Pain Points, Nitrokey, SoloKey, Yubikey, PC under the sink


GPU MODE ▷ #irl-meetup (3 messages):

Tenstorrent, LlamaIndex, Koyeb, AI Infrastructure, Next-Gen Hardware

Link mentioned: Next-Gen AI Infra with Tenstorrent & Koyeb @LlamaIndex · Luma: Join us for a special evening as we kick off a groundbreaking collaboration between Tenstorrent and Koyeb with our friends from LlamaIndex.This meetup is a…


GPU MODE ▷ #triton-puzzles (1 message):

Reshaping vs Permuting, Matrix Transformations


GPU MODE ▷ #rocm (6 messages):

RGP on ROCm, ATT plugin


GPU MODE ▷ #tilelang (17 messages🔥):

Shared Memory Allocation in CUDA, Python Linting Workarounds, CUDA Compatibility Issues, TileLang CUDA 12.4/12.6 Bug, WeChat Group Invitation

Link mentioned: Mismatched elements when performing matmul on CUDA 12.4/12.6 · Issue #149 · tile-ai/tilelang: Describe the Bug I ran the simple matmul code below, and I got error AssertionError: Tensor-likes are not close! The code works fine on CUDA 12.1, but not on CUDA 12.4/12.6. The number of mismatche...


GPU MODE ▷ #metal (1 message):

M3 Ultra, Unified Memory


GPU MODE ▷ #reasoning-gym (12 messages🔥):

ARC AGI, Lossless Information Compression, QwQ-32B, RL Scaling

Links mentioned:


GPU MODE ▷ #gpu模式 (1 message):

leoneo221: Haven't been online in a while, and now there's a Chinese channel!


GPU MODE ▷ #submissions (11 messages🔥):

Modal Runners, Leaderboard Submissions, GPU usage


Perplexity AI ▷ #announcements (1 message):

AI model settings, Claude 3.7 Sonnet, Auto settings improvements


Perplexity AI ▷ #general (107 messages🔥🔥):

Perplexity Auto Model Selection, Image sources issue, Deepseek r2 release, Claude Sonnet 3.7 frustrations, Google Search AI Mode

Links mentioned:


Perplexity AI ▷ #sharing (6 messages):

Microsoft AI Health Assistant, Python Learning Roadmap, Mac M3, OpenAI Agent, SQLI Protection


Perplexity AI ▷ #pplx-api (4 messages):

API focus setting, Sonar Pro model issues, Search cost pricing


HuggingFace ▷ #general (47 messages🔥):

Local Model Usage, Llama 3.1, Mistral small instruct quantized, CoreWeave IPO, HF Inference Credits

Links mentioned:


HuggingFace ▷ #today-im-learning (6 messages):

Kornia Rust Library, Google Summer of Code 2025, Internship postings

Link mentioned: Google Summer of Code: Google Summer of Code is a global program focused on bringing more developers into open source software development.


HuggingFace ▷ #cool-finds (1 message):

Flash Attention, Triton, CUDA, GPU Mode

Link mentioned: Tweet from Umar Jamil (@hkproj): I'll be hosted March 8th by @GPU_MODE sharing my journey in learning Flash Attention, Triton and CUDA. It's going to be an intimate conversation with the audience about my own difficulties alo...


HuggingFace ▷ #i-made-this (3 messages):

VisionKit, Deepseek-r1, Model Context Protocol (MCP)

Link mentioned: Model Context Protocol- Custom MCP Server: In this article, we will focus on building a custom MCP server. If you need an introduction to MCP, please refer to my previous articles on…


HuggingFace ▷ #computer-vision (1 message):

DINOv2, fine-tuning, pose estimation, weakly labeled images


HuggingFace ▷ #smol-course (3 messages):

Reasoning Course, Smol Course Discovery


HuggingFace ▷ #agents-course (51 messages🔥):

Certificate location, Alfred Examples Opinion, 401 Error, Huggingface channels, Llama Index error

Links mentioned:


MCP (Glama) ▷ #general (73 messages🔥🔥):

Tool calling, MCP for Reddit, Composio MCP Support, WebMCP, fastmcp

Links mentioned:


MCP (Glama) ▷ #showcase (23 messages🔥):

MCP Server setup, MCP Token Generation, Blue Yeti Mic, Instagram Lead Scraper


Latent Space ▷ #ai-general-chat (60 messages🔥🔥):

Claude costs, M4 Macbook Air, Qwen models, React for LLM backends, Windsurf Cascade

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (47 messages🔥):

SDXL Hand Fixing, Photo Realistic Upscalers, Text-to-video for Free, Stable Diffusion v4, SD3.5 Large TurboX

Links mentioned:


LlamaIndex ▷ #blog (1 message):

Agentic Document Workflows, DeepLearningAI partnership


LlamaIndex ▷ #general (43 messages🔥):

ImageBlock and OpenAI integration issues, Query Fusion Retriever Citation Problems, Distributed AgentWorkflow Architecture, Profiling/Timing Agent Execution in LlamaIndex, Memory Consumption with Flask and Gunicorn

Links mentioned:


Notebook LM ▷ #use-cases (13 messages🔥):

Uploading textbooks, NotebookLM API, NotebookLM and PDFs, Strategy optimization in online game, NotebookLM Podcast Feature


Notebook LM ▷ #general (29 messages🔥):

Standalone Android App, NLM Response Length, Formula Rendering in NLM, File Upload Issues, Podcast Generator



Nous Research AI ▷ #general (33 messages🔥):

Gaslight Benchmark, GPT-4.5 vs Claude image generation, Video AI Prompt Engineering, Hermes Special Tokens, Post-training RL


Nous Research AI ▷ #ask-about-llms (1 message):

garry_plahilsin07: Opps


Nous Research AI ▷ #interesting-links (4 messages):

QwQ-32B, Reinforcement Learning, DeepSeek R1, Tool calling syntax, Hermes format

Link mentioned: QwQ-32B: Embracing the Power of Reinforcement Learning: QWEN CHAT Hugging Face ModelScope DEMO DISCORDScaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studi...
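The Hermes tool-calling format mentioned here wraps each call in `<tool_call>...</tool_call>` tags around a JSON object with `name` and `arguments` keys. Qwen2.5-family chat templates use the same convention. Below is a minimal parser sketch that assumes one JSON object per tag. `parse_tool_calls` is a hypothetical helper, not part of any Qwen or Nous library:

```python
import json
import re
from typing import Any, Dict, List

# Non-greedy match so several calls in one completion are captured separately;
# DOTALL lets the JSON body span multiple lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(completion: str) -> List[Dict[str, Any]]:
    """Extract {"name": ..., "arguments": ...} dicts from <tool_call> tags.

    Malformed JSON, or objects without a "name" key, are skipped rather than raised.
    """
    calls = []
    for body in TOOL_CALL_RE.findall(completion):
        try:
            call = json.loads(body)
        except json.JSONDecodeError:
            continue
        if isinstance(call, dict) and "name" in call:
            call.setdefault("arguments", {})  # tolerate calls with no arguments
            calls.append(call)
    return calls
```

Skipping malformed calls rather than raising keeps a single bad generation from crashing an agent loop. A stricter harness might instead feed the parse error back to the model.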


OpenRouter (Alex Atallah) ▷ #app-showcase (1 message):

Android Chat App, OpenRouter Integration, Speech-to-Text, Text-to-Image, Text-to-Speech

Link mentioned: Release Releasing v0.1.0-rc.0 · Ayuilos/Taiga: It's a pre-release version.And everything will have possibility to change.No more words to say, enjoy and let me know if there's bug or something!


OpenRouter (Alex Atallah) ▷ #general (32 messages🔥):

Prefill Usage in Text Completion, OpenRouter Documentation for Coding Agents, DeepSeek instruct format, LLMGuard Integration, Usage Based Charging App

Links mentioned:


Yannick Kilcher ▷ #general (13 messages🔥):

Bilevel Optimization, Sparsemax Generalization, DDP Garbled Weights, MPEC translation, AI method complexity


Yannick Kilcher ▷ #paper-discussion (5 messages):

Proactive T2I Agents, DeepMind's Papers

Links mentioned:


Yannick Kilcher ▷ #ml-news (2 messages):

QwQ-32B release, RL scaling, Qwen2.5-32B

Links mentioned:


Cohere ▷ #「💬」general (14 messages🔥):

Cohere, Enterprise, Support


Cohere ▷ #【📣】announcements (1 message):

Aya Vision, Multilingual Vision Model, AyaVisionBenchmark, Multimodal AI

Links mentioned:


Cohere ▷ #「🔌」api-discussions (1 message):

Cohere Reranker v3.5 latency


Cohere ▷ #「🤝」introductions (2 messages):

Introductions


Modular (Mojo 🔥) ▷ #general (10 messages🔥):

Mojo Stability, Virtual Event Recording, Triton vs Mojo, Mojo and Python Relationship


Modular (Mojo 🔥) ▷ #mojo (5 messages):

Mojo and Python performance benchmark, Mojo/Python project folder structure, Python.add_to_path alternatives, Symlink alternatives in tests folder, Modular Forum

Link mentioned: Mojo/Python project folder structure: I originally posted this on Discord (link), but @DarinSimmons felt it would make a good topic for this forum. I’m looking for guidance on folder organization for a significant Mojo/Python project. I’...


DSPy ▷ #show-and-tell (2 messages):

SynaLinks release, Keras vs Pytorch frameworks, Knowledge graph RAGs, Reinforcement learning, Cognitive architectures

Links mentioned:


DSPy ▷ #general (11 messages🔥):

Optimizing intent classification with DSPy, Comparing texts for contradictions, DSPy adapters system, Straggler threads in dspy.Evaluate and dspy.Parallel


tinygrad (George Hotz) ▷ #general (6 messages):

Lean proof for ShapeTracker merging, Taobao 4090, gfx10 trace issue, Rust CubeCL

Links mentioned:


Eleuther ▷ #general (2 messages):

Introduction of Suleiman, Introduction of Naveen, CVPR25 Paper


Eleuther ▷ #research (2 messages):

Observation 3.1, ARC Training, Lossless Compression, Intelligent Behavior

Link mentioned: ARC-AGI Without Pretraining


Eleuther ▷ #lm-thunderdome (2 messages):

arc_challenge.yaml, ARC-Challenge tasks, Few-shot Learning


Torchtune ▷ #general (5 messages):

Tokenizer customization, Checkpointer save method, special_tokens.json handling, Copy files logic

Links mentioned:


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (4 messages):

MOOC Lectures, Certificate Submission


Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (2 messages):

AST Metric, V1 Dataset



{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}