Model: "mai-code-1-flash"
Microsoft Build: MAI-Thinking-1 and MAI Family models, Surface RTX Spark Dev Box, and OpenClaw in Windows
mai-thinking-1 mai-code-1-flash holo-3.1 qwen-35b sonnet-4.6 claude-code codex microsoft openrouter fal baseten hcompany_ai teksedge nous-research teknim cognition windsurf perplexity-ai mixture-of-experts context-windows benchmarking reinforcement-learning prompt-optimization agentic-ai local-inference model-family-expansion model-reporting agent-native-devices software-development model-optimization hybrid-inference desktop-agents model-quantization mustafasuleyman eliebakouch hannahajishirzi asadovsky bj2rn lateinteraction lakshyaaagrawal theturingpost kimmonismus yusuf_i_mehdi pierceboggan lukehoban nielsrogge russelljkaplan
Microsoft introduced MAI-Thinking-1, a 35B-parameter MoE model with a 256K context window that scores 97% on AIME 2025 and outperforms Sonnet 4.6 in human preference tests. The broader 7-model MAI family spans reasoning, code, image, speech, and voice, with third-party availability on OpenRouter, fal, and Baseten. The 109-page technical report covers scaling, MFU, RL/post-training, and data curation, and highlights the absence of third-party distillation along with advanced prompt optimization techniques. Microsoft emphasized agent-native devices and local inference through Project Solara / Scout and the Surface RTX Spark Dev Box, alongside software such as the Copilot desktop app and MAI-Code-1-Flash integration. Meanwhile, local-first computer-use agents like Holo 3.1 (Qwen-based, 0.8B to 35B parameters) target laptops and small workstations, with optimized formats and strong benchmark results. Desktop shells for agents are proliferating, including Hermes Desktop, Devin Desktop, and agent-neutral approaches compatible with Devin, Claude Code, and Codex. Hybrid local/cloud execution is becoming the default architecture, as seen in Perplexity Computer's hybrid agentic inference.
not much happened today
mai-thinking-1 mai-image-2.5 mai-code-1-flash gemma-4-12b microsoft google vllm-project ollama llama-cpp model-training reinforcement-learning model-architecture multimodality model-deployment model-efficiency fine-tuning on-device-ai eliebakouch nrehiew_ mustafasuleyman minjiyoon90 lateinteraction harold_matmul googlegemma googleaidevs mtschannen armandjoulin osanseviero
Microsoft released the detailed technical report for MAI-Thinking-1, a generalist reasoning model trained without third-party distillation that scores 97% on AIME 2025 and outperforms Sonnet 4.6 in human preference tests. The report was praised for its transparency: it reports no synthetic data use, a unique scaling ladder recipe, and a detailed training data composition that includes 50% code and 17.5% STEM. Microsoft also introduced Frontier Tuning for workflow-specific model adaptation, claiming efficiency gains of up to 10× and GPT-5.4-level quality on Excel tasks, alongside new models such as MAI-Image-2.5 and MAI-Code-1-Flash. Meanwhile, Google launched Gemma 4 12B, an Apache 2.0 multimodal model designed for on-device use with 16GB VRAM. Its encoder-free architecture collapses the vision and audio encoders into the LLM backbone, and the release drew positive community feedback and immediate tooling support.