Model: "gpt-5.4-mini"
not much happened today
gemini-3.1-flash voxtral-tts cohere-transcribe gpt-5.4-mini gpt-5.4-nano glm-5-turbo reka-edge reka-flash-3 google-deepmind mistral-ai cohere openai zai reka-ai voice vision function-calling context-windows multimodality text-to-speech low-latency human-preference automatic-speech-recognition model-benchmarking cost-efficiency hallucination-detection multi-agent-systems open-source git-worktrees logan_kilpatrick sundar_pichai guillaume_lample aidan_gomez jay_alammar giffmana andrew_curran
Google launched Gemini 3.1 Flash Live, a realtime voice and vision agent model with 2x longer conversation memory, support for 70 languages, and a 128k context window. Mistral AI released Voxtral TTS, a low-latency, open-weight text-to-speech model that supports 9 languages and is competitive with ElevenLabs. Cohere introduced Cohere Transcribe, an audio model with 14-language support that tops the English ASR leaderboard with a word error rate (WER) of 5.42. OpenAI released the smaller multimodal variants GPT-5.4 mini and GPT-5.4 nano, both with 400k context; they were noted for competitive cost but also for high verbosity and hallucination rates. Other releases include GLM-5-Turbo from Zai, Reka Edge and Reka Flash 3 on OpenRouter, and Cline Kanban, a new multi-agent UX tool for orchestrating CLI coding agents.
MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model
minimax-m2.7 sonnet-4.6 glm-5 mimo-v2-pro mamba-3 qwen-3.5 kimi-k2.5 gpt-5.4-mini minimax xiaomi artificial-analysis ollama trae yupp openrouter vercel zo opencode kilocode cartesia self-evolving-agents reasoning cost-efficiency token-efficiency hybrid-architecture harness-engineering agent-harnesses skills memory-optimization architecture feedback-loops api inference execution-environment
MiniMax M2.7 is the headline model release, described as a "self-evolving agent" that scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 2, reaching parity with Sonnet 4.6. It is said to feature recursive self-improvement across skills, memory, and architecture. Artificial Analysis places M2.7 on the cost/performance frontier with an Intelligence Index score of 50, matching GLM-5 (Reasoning) at a fraction of the cost. The model is distributed via platforms such as Ollama cloud and OpenRouter. Xiaomi’s MiMo-V2-Pro is noted as a serious Chinese API-only reasoning model, scoring 49 on the Intelligence Index with favorable token efficiency. Cartesia’s Mamba-3 is highlighted as an SSM optimized for inference-heavy use, and early reactions focus on its fit within hybrid transformer architectures like Qwen3.5 and Kimi Linear. The report emphasizes a shift from prompting to harness engineering, in which the execution environment and agent harnesses, including skills and MCP, are becoming key differentiators in AI system design. Related discussions cover tools, repo legibility, constraints, and feedback loops, and DSPy and GPT-5.4 mini are mentioned as important components of this approach.
not much happened today
gpt-5.4-mini gpt-5.4-nano gpt-5.4 codex openai langchain stripe ramp coinbase nous-research hermes-agent coding multimodality subagents context-window model-performance pricing behavior-tuning secure-execution plugin-architecture attention-mechanisms agent-infrastructure hwchase17 michpokrass
OpenAI released GPT-5.4 mini and GPT-5.4 nano, its most capable small models, optimized for coding, multimodal understanding, and subagents. Both feature a 400k context window and run at more than 2x the speed of GPT-5 mini. The mini model approaches the performance of the larger GPT-5.4 while using only 30% of the Codex quota, and it is becoming the default for many coding workflows. Pricing concerns and truthfulness tradeoffs were noted, and third-party evaluations were mixed on reasoning and resistance to false premises. OpenAI also addressed behavior tuning issues in a recent update. Meanwhile, agent infrastructure is evolving with secure code execution and orchestration tools such as LangChain's LangSmith Sandboxes and Open SWE, which are inspired by internal systems at Stripe, Ramp, and Coinbase. Subagents and secure execution are now key product features, and releases like Hermes Agent v0.3.0 showcase plugin architectures, live Chrome control, and voice mode. Research on attention mechanisms, including Attention Residuals and vertical attention, is gaining traction.