Person: "hrishioa"
not much happened today
claude-code composer-2 cursor openai anthropic langchain cognition reinforcement-learning developer-tooling agent-systems agent-runtimes security credential-management multi-agent-systems model-training benchmarking software-engineering enterprise-ai kimmonismus mntruell theo ellev3n11 amanrsanger charliermarsh gdb yuchenj_uw neilhtennek simonw yuvalinthedeep lvwerra hrishioa
Cursor launched Composer 2, a frontier-class coding model that pairs major cost reductions with strong benchmark scores, including 61.3 on CursorBench and 73.7 on SWE-bench Multilingual. The model was built with a first continued-pretraining run feeding into reinforcement learning, trained across 3–4 clusters worldwide by a team of roughly 40 people. OpenAI acquired Astral, the team behind the Python tools uv, ruff, and ty, strengthening its developer platform. Anthropic expanded Claude Code with messaging-app channels for persistent developer workflows. The focus in AI agents is shifting from single agents to managed fleets and runtimes: LangChain launched LangSmith Fleet for enterprise agent management, emphasizing agent identity, credential management, and auditability. Other launches include Cognition's teams of Devins, lvwerra's AgentUI, and discussions of agent runtimes with features like checkpointing and rollback. Security and permissions are emerging as critical constraints in agent-system design.
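The checkpointing-and-rollback idea mentioned above can be sketched in a few lines. Everything here is hypothetical: the `AgentRuntime` class, its method names, and the example actions are illustrative, not the API of any shipping runtime.

```python
import copy


class AgentRuntime:
    """Toy agent runtime with snapshot-based checkpointing and rollback."""

    def __init__(self):
        self.state = {"messages": [], "tool_results": []}
        self._checkpoints = []

    def checkpoint(self):
        """Snapshot the current state; returns a checkpoint id."""
        self._checkpoints.append(copy.deepcopy(self.state))
        return len(self._checkpoints) - 1

    def rollback(self, checkpoint_id):
        """Restore a previous snapshot and discard any later checkpoints."""
        self.state = copy.deepcopy(self._checkpoints[checkpoint_id])
        del self._checkpoints[checkpoint_id + 1:]

    def step(self, message, result):
        """Record one agent action and its tool result."""
        self.state["messages"].append(message)
        self.state["tool_results"].append(result)


runtime = AgentRuntime()
runtime.step("list files", ["a.py", "b.py"])
cp = runtime.checkpoint()                      # save known-good state
runtime.step("delete b.py", "error: denied")   # a step that went wrong
runtime.rollback(cp)                           # undo the failed step's effects
```

The design choice worth noting: deep-copying state makes rollback trivially correct but expensive; production runtimes would more likely persist incremental deltas or use copy-on-write storage.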
AI Engineer Code Summit
gemini-3-pro-image gemini-3 gpt-5 claude-3.7-sonnet google-deepmind togethercompute image-generation fine-tuning benchmarking agentic-ai physics model-performance instruction-following model-comparison time-horizon user-preference demishassabis omarsar0 lintool hrishioa teknium artificialanlys minyangtian1 ofirpress metr_evals scaling01
The recent AIE Code Summit showcased key developments, including Google DeepMind's Gemini 3 Pro Image model, also known as Nano Banana Pro, which features enhanced text rendering, 4K visuals, and fine-grained editing. Community feedback highlights strong performance on design and visualization tasks, with high user-preference scores. On the benchmarking front, the new CritPt physics frontier benchmark shows Gemini 3 Pro outperforming GPT-5, though models still lag on complex, unseen research problems. Agentic task evaluations show varied time horizons and persistent performance gaps between open-weight and closed frontier models, underscoring ongoing challenges in AI research and deployment. "Instruction following remains jagged for some users," and model fit varies by use case, with Gemini 3 excelling at UI and code tasks but showing regressions in transcription and writing fidelity.
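One way to make "time horizon" concrete is the 50%-success horizon popularized by METR: the task length at which an agent's success rate drops to 50%. The sketch below is a toy version of that idea, fitting a logistic curve to made-up (task length, success) pairs; the data, function name, and fitting details are all illustrative assumptions, not METR's actual methodology.

```python
import math


def fit_time_horizon(tasks):
    """Fit P(success) = sigmoid(a + b * log(minutes)) by gradient ascent,
    then return the task length (minutes) where predicted success is 50%.
    """
    a, b = 0.0, 0.0
    lr = 0.1
    for _ in range(5000):
        grad_a = grad_b = 0.0
        for minutes, success in tasks:
            x = math.log(minutes)
            p = 1 / (1 + math.exp(-(a + b * x)))
            grad_a += success - p          # log-likelihood gradient w.r.t. a
            grad_b += (success - p) * x    # ... and w.r.t. b
        a += lr * grad_a / len(tasks)
        b += lr * grad_b / len(tasks)
    # The sigmoid crosses 0.5 where a + b * log(m) = 0, i.e. m = exp(-a/b).
    return math.exp(-a / b)


# Hypothetical eval results: (task length in minutes, 1 = agent succeeded).
data = [(1, 1), (2, 1), (4, 1), (8, 1), (15, 0), (15, 1),
        (30, 0), (30, 1), (60, 0), (120, 0), (240, 0)]
horizon = fit_time_horizon(data)
print(f"50% time horizon is roughly {horizon:.0f} minutes")
```

With this toy data the agent succeeds reliably on short tasks and fails on long ones, so the fitted horizon lands between those regimes; the single number summarizes a whole success-vs-length curve, which is why comparisons across models should also report the underlying data.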