Person: "mntruell"
not much happened today
Tags: claude-code, composer-2, cursor, openai, anthropic, langchain, cognition, reinforcement-learning, developer-tooling, agent-systems, agent-runtimes, security, credential-management, multi-agent-systems, model-training, benchmarking, software-engineering, enterprise-ai
People: kimmonismus, mntruell, theo, ellev3n11, amanrsanger, charliermarsh, gdb, yuchenj_uw, neilhtennek, simonw, yuvalinthedeep, lvwerra, hrishioa
Cursor launched Composer 2, a frontier-class coding model with major cost reductions and strong benchmark scores, including 61.3 on CursorBench and 73.7 on SWE-bench Multilingual. The model was improved through Cursor's first continued-pretraining run, which fed into reinforcement learning, and was trained across 3–4 clusters worldwide by a roughly 40-person team. OpenAI acquired Astral, the team behind the Python tools uv, ruff, and ty, strengthening its developer platform. Anthropic expanded Claude Code with messaging-app channels for persistent developer workflows. The focus in AI agents is shifting from single agents to managed fleets and runtimes, with LangChain launching LangSmith Fleet for enterprise agent management, emphasizing agent identity, credential management, and auditability. Other developments include Cognition's teams of Devins, lvwerra's AgentUI, and discussions of agent runtimes with features like checkpointing and rollback. Security and permissions are emerging as critical constraints in agent system design.
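The checkpointing and rollback features mentioned for agent runtimes can be illustrated with a minimal sketch. This is a hypothetical toy, not any vendor's actual API: the runtime snapshots its full state before risky steps so a bad tool call or edit can be undone.

```python
import copy

class AgentRuntime:
    """Minimal sketch of an agent runtime with checkpointing and rollback.
    Hypothetical illustration only; real runtimes persist snapshots durably."""

    def __init__(self):
        self.state = {"history": [], "step": 0}
        self._checkpoints = []

    def checkpoint(self) -> int:
        # Snapshot the full state so a failed step can be undone later.
        self._checkpoints.append(copy.deepcopy(self.state))
        return len(self._checkpoints) - 1

    def step(self, action: str):
        # Record an agent action (tool call, file edit, message, ...).
        self.state["history"].append(action)
        self.state["step"] += 1

    def rollback(self, checkpoint_id: int):
        # Restore the snapshot and discard checkpoints taken after it.
        self.state = self._checkpoints[checkpoint_id]
        self._checkpoints = self._checkpoints[:checkpoint_id]

rt = AgentRuntime()
cp = rt.checkpoint()
rt.step("call_tool")
rt.step("bad_edit")
rt.rollback(cp)          # state returns to the snapshot
print(rt.state["step"])  # → 0
```

Real runtimes add durability and permission checks on top of this pattern, which is where the security and credential-management concerns above come in.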
not much happened today
Tags: gpt-5.2-codex, glm-4.7, openai, cursor, github, cerebras, modal, artificial-analysis, vllm, long-running-tasks, autonomous-agents, code-generation, inference-speed, latency, batch-inference, gpu-scaling, model-evaluation, agent-systems, operational-scaling
People: swyx, kevinweil, pierceboggan, mntruell, scaling01
OpenAI launched the GPT-5.2-Codex API, touted as its strongest coding model for long-running tasks and cybersecurity. Cursor integrated GPT-5.2-Codex to autonomously run a browser for a week, producing over 3 million lines of Rust code. GitHub incorporated it into its code tools, easing enterprise adoption. Discussions highlight the importance of review loops in agent systems and debate evaluation metrics for coding models. OpenAI partnered with Cerebras to improve inference speed and latency, with Cerebras serving GLM-4.7 at 1,445 tokens/sec at low latency. Provider benchmarking reveals tradeoffs among throughput, latency, and context window size. Modal shared operational scaling insights for self-hosted inference fleets of 20k GPUs, focusing on batch-inference optimization with vLLM and the FlashInfer backend. Together, these reflect a focus on inference infrastructure, long-horizon autonomous agents, and coding-model evaluation.
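The throughput figures above translate directly into wall-clock generation time. A rough sketch, assuming a simple streaming model (time-to-first-token plus tokens divided by decode throughput; real serving varies with batching and load):

```python
def time_to_generate(num_tokens: int, tokens_per_sec: float,
                     ttft_sec: float = 0.0) -> float:
    """Estimate wall-clock seconds to stream num_tokens at a given decode
    throughput, plus time-to-first-token. Simplified model for illustration."""
    return ttft_sec + num_tokens / tokens_per_sec

# Using the ~1,445 tokens/sec reported for Cerebras serving GLM-4.7,
# a 10,000-token response streams in roughly 7 seconds:
print(round(time_to_generate(10_000, 1445), 2))  # ≈ 6.92
```

Provider comparisons trade this decode throughput against latency (TTFT) and context window size, which is why a single tokens/sec number never tells the whole story.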