Person: "russelljkaplan"
Microsoft Build: MAI-Thinking-1 and MAI Family models, Surface RTX Spark Dev Box, and OpenClaw in Windows
mai-thinking-1 mai-code-1-flash holo-3.1 qwen-35b sonnet-4.6 claude-code codex microsoft openrouter fal baseten hcompany_ai teksedge nous-research teknim cognition windsurf perplexity-ai mixture-of-experts context-windows benchmarking reinforcement-learning prompt-optimization agentic-ai local-inference model-family-expansion model-reporting agent-native-devices software-development model-optimization hybrid-inference desktop-agents model-quantization mustafasuleyman eliebakouch hannahajishirzi asadovsky bj2rn lateinteraction lakshyaaagrawal theturingpost kimmonismus yusuf_i_mehdi pierceboggan lukehoban nielsrogge russelljkaplan
Microsoft introduced MAI-Thinking-1, a 35B-parameter MoE model with a 256K context window that scores 97% on AIME 2025 and outperforms Sonnet 4.6 in human preference tests. The broader 7-model MAI family spans reasoning, code, image, speech, and voice, with third-party availability on OpenRouter, fal, and Baseten. The 109-page technical report covers scaling, MFU, RL/post-training, and data curation, and highlights the absence of third-party distillation and the use of advanced prompt optimization techniques. Microsoft emphasized agent-native devices and local inference with projects like Project Solara / Scout and the Surface RTX Spark Dev Box, alongside software updates such as the Copilot desktop app and MAI-Code-1-Flash integration. Meanwhile, local-first computer-use agents like Holo 3.1 (Qwen-based, 0.8B to 35B parameters) support laptops and small workstations with optimized formats and strong benchmark results. Desktop shells for agents are proliferating, including Hermes Desktop, Devin Desktop, and agent-neutral approaches compatible with Devin, Claude Code, and Codex. Hybrid local/cloud execution is becoming the default architecture, as seen in Perplexity Computer's hybrid agentic inference.
not much happened today
claude-code codex composer-2.5 langchain cognition anthropic openai microsoft cursor agent-automation agent-observability ci-cd prompt-caching remote-execution verification decomposition feedback-loops coding-agents model-efficiency instruction-following krishdpi walden_yan russelljkaplan fchollet gabriberton palashshah shannholmberg
Agent infrastructure is advancing: LangSmith Engine provides CI/CD loops for agents, and SmithDB enables low-latency querying for observability. Cognition's Devin Auto-Triage offers persistent automation for bug triage with memory and subagent structures. Anthropic improved Claude Code for large codebases with prompt cache diagnostics and faster modes, while OpenAI enhanced Codex workflows with remote execution and plugins. Microsoft released remote control for GitHub Copilot CLI and VS Code. The community emphasizes verification, decomposition, and feedback loops over prompt cleverness for coding agents. Cursor's Composer 2.5 is highlighted as a strong new coding model, praised for efficiency and collaboration improvements, and Cursor plans a larger model trained with SpaceXAI using 10× more compute on Colossus 2 hardware.
Moondream 2025.1.9: Structured Text, Enhanced OCR, Gaze Detection in a 2B Model
o1 vdr-2b-multi-v1 llava-mini openai llamaindex langchainai qdrant genmoai vision model-efficiency structured-output gaze-detection reasoning model-distillation multimodality embedding-models gan diffusion-models self-attention training-optimizations development-frameworks api cross-language-deployment semantic-search agentic-document-processing developer-experience philschmid saranormous jxmnop reach_vb iscienceluvr multimodalart arohan adcock_brett awnihannun russelljkaplan ajayj_
Moondream has released a new version that improves VRAM efficiency and adds structured output and gaze detection, marking a new frontier in vision model practicality. Discussions on Twitter highlighted advancements in reasoning models like OpenAI's o1, model distillation techniques, and new multimodal models such as vdr-2b-multi-v1 and LLaVA-Mini, which significantly reduce computational costs. Research on GANs and decentralized diffusion models showed improved stability and performance. Development tools like MLX and vLLM received updates for better portability and developer experience, while tools like LangChain and Qdrant enable intelligent data workflows. Company updates include new roles and team expansions at GenmoAI. "Efficiency tricks are all you need."