Topic: "agent-engineering"

eagle-3.1 unigram-tokenizer qwen-3.5 deepseek-v4-pro mimo deep-agents-v0.6 397b-parameter-model eaglecorp vllm_project perplexity_ai alibaba lightseek nvidia mooncake flashattention kimmonismus deepseek xiaomi langchain baseten trajectory clay harvey decagon mercor rogo rlm inference-optimization long-context speculative-decoding tokenization attention-mechanisms kv-cache cache-hierarchy agent-engineering model-harness-memory-fit continual-learning quantization autoscaling memory-centric-agents evaluation-automation kimmonismus _luofuli vtrivedy10

Inference optimization is increasingly architectural. EAGLE 3.1, a collaboration with vLLM and TorchSpec, improves speculative decoding and long-context handling. Perplexity open-sourced a rebuilt Unigram tokenizer that cuts CPU use by 5–6× and reaches 63 µs at 514 tokens. Qwen3.5 hits 580 tokens/s through joint work by Alibaba, LightSeek, NVIDIA, Mooncake, and FlashAttention-4 contributors. API price cuts from Chinese labs are sustainable because they rest on structural KV-cache and attention improvements, as shown by DeepSeek V4-Pro and Xiaomi MiMo significantly reducing caching costs. Agent engineering is shifting its focus from model quality to model-harness-memory fit: LangChain released Deep Agents v0.6, and tools such as LangSmith Engine automate evaluation loops. Trajectory launched a continual learning platform with $15M in funding and partners such as Clay and Harvey; it supports large models, including a 397B-parameter model deployed on autoscaled H100 infrastructure. Open-source memory-centric agents and minimal training harnesses also gained attention.

Jan 28

not much happened today

gpt-5.2 claude-opus-4.5 kimi-k2.5 openai anthropic deeplearningai langchain apple agentic-ai multimodality coding self-verification agent-engineering model-benchmarking model-optimization workflow-automation

AI News for 1/27/2026-1/28/2026 covers a quiet day, with deep dives into a frontier-model "personality split": GPT-5.2 excels at exploration and Claude Opus 4.5 at exploitation, suggesting that OpenAI suits research workflows and Anthropic suits commercial reliability. The rise of agentic coding loops is exposing new failure modes, and self-verification workflows are gaining traction. The open model Kimi K2.5 emerges as a flashpoint: it offers enhanced agent execution, multimodality, and coding polish, runs on Apple silicon M3 Ultra Mac Studios with Thunderbolt 5 (RDMA), and challenges Claude Opus 4.5 on benchmarks and pricing. Licensing issues threaten its enterprise adoption despite the model's quality. The "clawdbot" meme reflects how quickly agent branding is proliferating. Agent engineering advances with shared "skills" interfaces promoted by DeepLearning.AI, Anthropic, and LangChain.

