Topic: "agentic-engineering"
Anthropic accuses DeepSeek, Moonshot, and MiniMax of "industrial-scale distillation attacks".
claude claude-3 codex claude-code anthropic deepseek moonshot-ai minimax openai ollama api-abuse-resistance model-security agentic-engineering coding-agents model-distillation workflow-automation sandboxing realtime-communication simon_willison
Anthropic alleges that DeepSeek, Moonshot AI, and MiniMax ran industrial-scale distillation attacks against Claude, using ~24,000 fraudulent accounts and >16M Claude exchanges to extract capabilities, raising concerns about competitive risk and safety. The community debates the difference between scraping and API-output extraction, highlighting a shift toward protecting models through API abuse resistance techniques. Meanwhile, coding agents like Codex and Claude Code are seeing both real adoption and real failures, with emerging "agentic engineering" best practices led by Simon Willison. The OpenClaw ecosystem is expanding with alternatives like NanoClaw and integrations such as Ollama 0.17 simplifying open model usage.
not much happened today
claude-4.6 claude-opus-4.6 claude-sonnet-4.6 qwen-3.5 qwen3.5-397b-a17b glm-5 gemini-3.1-pro minimax-m2.5 anthropic alibaba scaling01 arena artificial-analysis benchmarking token-efficiency ai-agent-autonomy reinforcement-learning asynchronous-learning model-performance open-weights reasoning software-engineering agentic-engineering eshear theo omarsar0 grad62304977 scaling01
Anthropic released Claude Opus 4.6 and Sonnet 4.6, which show a significant jump on the intelligence index but with increased token usage and cost. Anthropic also shared insights on AI agent autonomy, highlighting the prevalence of human-in-the-loop operation and of software engineering tool calls. Alibaba launched Qwen 3.5, prompting discussion of reasoning efficiency and token bloat, and open-sourced FP8 weights for Qwen3.5-397B-A17B. The GLM-5 technical report introduced asynchronous agent reinforcement learning and compute-efficient techniques. Rumors about Gemini 3.1 Pro suggest longer reasoning capabilities, while MiniMax M2.5 appeared on community leaderboards. The community continues to debate benchmark reliability and the nuances of model performance.