Topic: "continual-learning"
not much happened today
eagle-3.1 unigram-tokenizer qwen-3.5 deepseek-v4-pro mimo deep-agents-v0.6 397b-parameter-model eaglecorp vllm_project perplexity_ai alibaba lightseek nvidia mooncake flashattention kimmonismus deepseek xiaomi langchain baseten trajectory clay harvey decagon mercor rogo rlm inference-optimization long-context speculative-decoding tokenization attention-mechanisms kv-cache cache-hierarchy agent-engineering model-harness-memory-fit continual-learning quantization autoscaling memory-centric-agents evaluation-automation kimmonismus _luofuli vtrivedy10
Inference optimization is increasingly architectural: EAGLE 3.1, a collaboration with vLLM and TorchSpec, improves speculative decoding and long-context handling. Perplexity open-sourced a rebuilt Unigram tokenizer that cuts CPU use by 5–6× and reaches 63 µs at 514 tokens. Qwen3.5 hits 580 tokens/s through joint efforts from Alibaba, LightSeek, NVIDIA, Mooncake, and FlashAttention-4 contributors. API price cuts from Chinese labs are sustainable because they rest on structural KV-cache and attention improvements, as shown by DeepSeek V4-Pro and Xiaomi MiMo significantly reducing caching costs.
Agent engineering is shifting its focus from model quality to model-harness-memory fit: LangChain released Deep Agents v0.6, and tools like LangSmith Engine automate evaluation loops. Trajectory launched a continual learning platform with $15M in funding and partners like Clay and Harvey; it supports large models, including a 397B-parameter model deployed on autoscaled H100 infrastructure. Open-source memory-centric agents and minimal training harnesses also gained attention.
not much happened today
codex chatgpt openai github microsoft nous-research moonshot-ai langchain prime-intellect agent-infrastructure agent-first-ux remote-ssh programmatic-access-tokens sandboxing continual-learning agent-trace-data multi-agent-workflows ide-integration browser-extensions hwchase17 caspar_br bentannyhill jakebroekhuizen willccbb
OpenAI expanded Codex integration with the ChatGPT mobile app, enabling remote task management, and introduced Remote SSH, hooks, and programmatic tokens for enterprise automation. The IDE ecosystem is shifting to "agent-first" UX, with a GitHub Copilot App preview and VS Code launching a multi-agent workflow window. Open-source agents like Nous/Hermes integrated the Codex runtime, and Kimi released a web bridge extension that supports multiple coding agents. LangChain released significant agent infrastructure, including SmithDB for agent trace data and LangSmith Engine for trace analysis and continual learning, and launched LangChain Labs to improve agents through production trace feedback loops.
not much happened today
claude-3 codex gemini gpt-5.2-pro anthropic openai google sakana-ai cursor baseten epoch-ai-research deepmind benchmarking reasoning continual-learning reinforcement-learning model-performance agentic-ai security model-training sama fchollet shane_legg demishassabis
Anthropic launches "Claude in Excel Pro" with enhanced features. OpenAI reveals its upcoming Codex agent loop and cybersecurity measures. Google boosts Gemini App quotas and partners with Sakana AI on advanced AI Scientist projects in Japan. Cursor introduces Agent Skills for dynamic context focus. GPT-5.2 Pro achieves 31% on FrontierMath Tier 4, a significant step in benchmark progress. Baseten raises $300M at a $5B valuation, targeting high-performance inference. Discussions highlight math benchmarks as indicators of AI capability, uneven AGI progress, and the importance of reasoning and continual learning as future frontiers. Notable figures include Sam Altman, François Chollet, Shane Legg, and Demis Hassabis.