Person: "johnschulman2"
not much happened today
claude-opus-4.8 gpt-5.5 qwen kimi deepseek anthropic huggingface langchain vllm_project reinforcement-learning tokenization agentic-ai api model-optimization long-context rust performance-optimization multi-agent-systems prompt-engineering jeremyphoward leo_linsky clementdelangue johnschulman2 omarsar0 hwchase17 ofirpress scaling01
Anthropic rolled out Claude Opus 4.8, an incremental update with mixed benchmark results: cooperation and coding behavior improved, while document parsing regressed in some cases. Platform updates include mid-conversation system instructions, which help with long agent sessions, though API pricing remains a concern. A Hugging Face analysis identified a critical bug in multi-turn reinforcement learning training loops caused by tokenization mismatches, along with a proposed "Token-In, Token-Out" fix. Agent harness design is emerging as a key optimization area: LangChain's Deep Agents v0.6 achieved strong performance at much lower cost, and vLLM released native weight syncing APIs and a Rust BPE tokenizer to improve tokenization efficiency. Debate continues on the value of multi-agent systems, with some seeing them mainly as speedups and others expecting capability breakthroughs.
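To make the tokenization mismatch concrete, here is a minimal sketch of one way it can arise. It assumes the training loop decodes sampled tokens to text and re-encodes that text for the next turn; the summary above does not spell out the exact mechanism. The vocabulary, `encode`, and `decode` are hypothetical toy stand-ins, not the API of any real tokenizer or RL library.

```python
# Toy illustration only: a greedy longest-match tokenizer over a tiny,
# made-up vocabulary. It is not any real tokenizer or training library.
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TEXT = {token_id: text for text, token_id in VOCAB.items()}


def encode(text: str) -> list[int]:
    """Greedy longest-match encoding, a stand-in for a BPE-style tokenizer."""
    ids, pos = [], 0
    while pos < len(text):
        # Try the longest remaining piece first, then shrink.
        for length in range(len(text) - pos, 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot encode {text[pos:]!r}")
    return ids


def decode(ids: list[int]) -> str:
    """Concatenate the text of each token id."""
    return "".join(ID_TO_TEXT[token_id] for token_id in ids)


# Suppose the policy sampled "a" and then "b" as two separate tokens.
sampled_ids = [0, 1]

# Text round trip: decode the turn, then re-encode it for the training batch.
retokenized_ids = encode(decode(sampled_ids))

# Token-in, token-out: carry the sampled ids forward unchanged.
tito_ids = list(sampled_ids)

print(retokenized_ids)  # [2]
print(tito_ids)         # [0, 1]
```

Both id sequences decode to the same string, yet only the second matches what the policy actually sampled. In this toy setup, a loss computed over the re-encoded ids would score tokens the model never produced.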
not much happened today
gpt-5.5 codex thinking-machines openai anthropic multimodality real-time-interaction visual-proactivity deployment cybersecurity threat-modeling automation continuous-audio-video-text-processing security-models field-engineering enterprise-ai johnschulman2 soumithchintala chillee liliyu_lili rown kimmonismus giffmana swyx eliebakouch gdb sama therundownai lukolejnik matvelloso
Thinking Machines previewed its new native interaction models, designed for full-duplex multimodal interaction: the models listen, speak, watch, think, search, and react concurrently in real time, marking a shift beyond turn-based AI. The approach emphasizes continuous audio, video, and text processing, with innovations such as visual proactivity and background tool use, and is implemented using SGLang. Meanwhile, OpenAI announced the OpenAI Deployment Company, a new unit with 150 Forward Deployed Engineers and a $4B initial investment to help enterprises deploy frontier models, signaling a move into the deployment layer of the AI economy. OpenAI also launched Daybreak, a security-focused initiative that integrates GPT-5.5 and Codex for cyber defense, threat modeling, and automated patching, with differentiated access tiers including GPT-5.5-Cyber. This contrasts with Anthropic's more restrictive approach to cyber capabilities, highlighting tensions among AI security strategies.
not much happened today
claude-4 claude-4-opus claude-4-sonnet gemini-2.5-pro gemma-3n imagen-4-ultra anthropic google-deepmind openai codebase-understanding coding agentic-performance multimodality text-to-speech video-generation model-integration benchmarking memory-optimization cline amanrsanger ryanpgreenblatt johnschulman2 alexalbert__ nearcyan mickeyxfriedman jeremyphoward gneubig teortaxesTex scaling01 artificialanlys philschmid
Anthropic's Claude 4 models (Opus 4, Sonnet 4) demonstrate strong coding abilities, with Sonnet 4 achieving 72.7% on SWE-bench and Opus 4 reaching 72.5%. Claude Sonnet 4 excels in codebase understanding and is considered SOTA on large codebases. Criticism arose over Anthropic's handling of ASL-3 security requirements. Demand for Claude 4 is high, with integration into IDEs and support from Cherry Studio and FastHTML. Google DeepMind introduced Gemini 2.5 Pro Deep Think and Gemma 3n, a mobile multimodal model that reduces RAM usage by nearly 3x. Google's Imagen 4 Ultra ranks third in the Artificial Analysis Image Arena and is available on Vertex AI Studio. Google also promoted Google Beam, an AI-first video communication platform for immersive 3D experiences, and new text-to-speech models with multi-speaker support. On the GAIA benchmark, Claude 4 Opus and Sonnet lead in agentic performance.