Topic: "recursive-self-improvement"
not much happened today
nemotron-3-ultra nemotron-3.5-asr claude-opus-4 mythos-preview nvidia anthropic togethercompute baseten modal vllm_project fireworksai_hq ollama wandb cline primeintellect nousresearch mixture-of-experts long-context model-quantization agentic-ai streaming-speech asr low-precision-training benchmarking recursive-self-improvement code-generation model-speedup piotrz_zelasko
NVIDIA released Nemotron 3 Ultra, a fully open 550B-parameter MoE model with 55B active parameters and a 1M-token context window. It is optimized for long-running agent tasks, with up to 5x speedup and 30% cost reduction. It features a hybrid Mamba/attention architecture, LatentMoE, and native multi-token prediction (MTP), and was pretrained on 20T tokens in the NVFP4 low-precision format. Benchmarks show strong performance, with a 47.7 Intelligence Index score and 400+ output tokens/sec. The model is supported across major serving platforms. Additionally, Nemotron 3.5 ASR is an open 0.6B-parameter streaming ASR model designed for voice agents, supporting 40 language-locale combinations with sub-100ms latency.
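NVFP4 stores each value as a 4-bit E2M1 float and shares one scale across every block of 16 consecutive values. The sketch below illustrates that block-scaling idea with round-to-nearest; it is not NVIDIA's pretraining recipe. As simplifying assumptions, it keeps block scales in full precision (NVFP4 stores them as FP8 E4M3), omits the per-tensor scale, and uses hypothetical helper names `quantize_block`, `quantize`, and `dequantize`.

```python
# Magnitudes representable by an FP4 E2M1 element (the sign is stored separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 consecutive values


def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Quantize one block to (scale, signed E2M1 codes)."""
    amax = max(abs(v) for v in block)
    # Map the block's largest magnitude onto 6.0, the largest E2M1 value.
    # Assumption: the scale stays a full-precision float instead of FP8 E4M3.
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    codes = []
    for v in block:
        mag = abs(v) / scale
        # Round to the nearest representable magnitude, then restore the sign.
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(nearest if v >= 0 else -nearest)
    return scale, codes


def quantize(values: list[float]) -> list[tuple[float, list[float]]]:
    """Split values into 16-element blocks and quantize each one."""
    if len(values) % BLOCK_SIZE:
        raise ValueError("length must be a multiple of BLOCK_SIZE")
    return [
        quantize_block(values[i:i + BLOCK_SIZE])
        for i in range(0, len(values), BLOCK_SIZE)
    ]


def dequantize(blocks: list[tuple[float, list[float]]]) -> list[float]:
    """Multiply every code by its block's scale to recover approximate values."""
    return [scale * c for scale, codes in blocks for c in codes]


original = [float(i) for i in range(16)]
restored = dequantize(quantize(original))
print(max(abs(a - b) for a, b in zip(original, restored)))
```

Because each block gets its own scale, one outlier only coarsens the 15 values that share its block, rather than the whole tensor.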
Anthropic highlighted early signs of recursive self-improvement (RSI) in AI, with Claude models authoring 80%+ of merged code and engineers shipping 8x more code. Claude Opus 4 achieved a 3x speedup on training scripts, while Mythos Preview reached a ~52x speedup and gave better research suggestions than humans 64% of the time.
not much happened today
codex openai microsoft cursor_ai langchain-ai agentic-harness-engineering agent-loop-systems-engineering performance-optimization semantic-indexing prompt-evaluation software-engineering sdk-development model-tuning recursive-self-improvement omarsar0 samhogan kimmonismus reach_vb pierceboggan
OpenAI is expanding Codex from a coding tool into a general work surface with persistent context, tools, integrations, and team rollout, including Codex-only seats with a $0 seat fee for Business/Enterprise customers through June. Performance work centers on agent-loop systems engineering, with WebSocket mode on the Responses API making agentic workflows up to 40% faster. VS Code improves the coding-agent UX with semantic indexing, cross-repo search, chat session insights, and prompt/agent evaluation extensions. Cursor launches the Cursor SDK for programmable agent infrastructure in CI/CD, automations, and embedded agents, signaling a shift toward headless agent runtimes and usage-based economics. On the research side, Agentic Harness Engineering improves Terminal-Bench 2 pass@1 from 69.7% to 77.0%, surpassing human-designed baselines while reducing token use by 12%. Related work on HALO shows recursively self-improving agents with significant AppWorld score improvements. LangChain's Deep Agents introduces Harness Profiles for model-specific harness tuning and deployability.
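Terminal-Bench 2's exact scoring harness is not described here, but pass@k is commonly computed with the unbiased estimator introduced alongside OpenAI's HumanEval benchmark: with n attempts on a task and c of them correct, pass@k = 1 - C(n-c, k) / C(n, k). A minimal sketch, assuming per-task `(n, c)` counts are available; the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n sampled attempts, c of them correct."""
    if n - c < k:
        # Every size-k subset of attempts contains at least one correct one.
        return 1.0
    # 1 minus the probability that k attempts drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-task pass@k over a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


print(benchmark_pass_at_k([(10, 3), (10, 10), (10, 0)], k=1))
```

With k = 1 the estimator reduces to c/n, the fraction of correct attempts, so the reported move from 69.7% to 77.0% is a 7.3-point gain, roughly 10.5% in relative terms.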
not much happened today
rstar-math o1-preview qwen2.5-plus qwen2.5-coder-32b-instruct phi-4 claude-3.5-sonnet openai anthropic alibaba microsoft cohere langchain weights-biases deepseek rakuten rbc amd johns-hopkins math process-reward-model mcts vision reasoning synthetic-data pretraining rag automation private-deployment multi-step-workflow open-source-dataset text-embeddings image-segmentation chain-of-thought multimodal-reasoning finetuning recursive-self-improvement collaborative-platforms ai-development partnerships cuda triton ai-efficiency ai-assisted-coding reach_vb rasbt akshaykagrawal arankomatsuzaki teortaxestex aidangomez andrewyng
rStar-Math surpasses OpenAI's o1-preview in math reasoning, reaching 90.0% accuracy on the MATH benchmark with a 7B LLM by combining MCTS with a Process Reward Model. Alibaba launches Qwen Chat, featuring the Qwen2.5-Plus and Qwen2.5-Coder-32B-Instruct models with enhanced vision-language and reasoning capabilities. Microsoft releases Phi-4, trained on 40% synthetic data with improved pretraining. Cohere introduces North, a secure AI workspace integrating LLMs, RAG, and automation for private deployments. LangChain showcases a company research agent with multi-step workflows and open-source datasets. Transformers.js demos are released for text embeddings and image segmentation in JavaScript. Research highlights include Meta-CoT for enhanced chain-of-thought reasoning, DeepSeek V3 with recursive self-improvement, and collaborative AI development platforms. Industry partnerships include Rakuten with LangChain, North with RBC (supporting 90,000 employees), and Agent Laboratory with AMD and Johns Hopkins. Technical discussions emphasize CUDA and Triton for AI efficiency, and Andrew Ng discusses evolving AI-assisted coding stacks.
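The MCTS-plus-PRM idea can be illustrated with a toy UCT search in which a process reward model scores partial solutions. This is a minimal sketch under stated assumptions, not rStar-Math's actual method: steps are integers rather than generated text, `propose` and `toy_prm` are hypothetical stand-ins for the policy model and the PRM, and the PRM score of each newly expanded node stands in for a full rollout.

```python
import math
from typing import Callable, Optional, Sequence

Steps = tuple[int, ...]


class Node:
    """One partial solution (a tuple of reasoning steps) in the search tree."""

    def __init__(self, steps: Steps, parent: Optional["Node"] = None):
        self.steps = steps
        self.parent = parent
        self.children: list["Node"] = []
        self.visits = 0
        self.value_sum = 0.0

    def uct(self, c: float) -> float:
        # Unvisited children score infinity, so each one is tried once first.
        if self.visits == 0:
            return math.inf
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts(
    propose: Callable[[Steps], Sequence[int]],
    prm_score: Callable[[Steps], float],
    is_terminal: Callable[[Steps], bool],
    iterations: int = 200,
    c: float = 1.4,
) -> Steps:
    root = Node(steps=())
    for _ in range(iterations):
        node = root
        # 1. Selection: follow the highest-UCT child down to a leaf.
        while node.children:
            node = max(node.children, key=lambda ch: ch.uct(c))
        # 2. Expansion: add one child per proposed next step; evaluate the first.
        if not is_terminal(node.steps):
            node.children = [Node(node.steps + (s,), node) for s in propose(node.steps)]
            node = node.children[0]
        # 3. Evaluation: the PRM scores the partial solution (no rollout here).
        reward = prm_score(node.steps)
        # 4. Backpropagation: credit every node on the path back to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    # Read out an answer by greedily following the most-visited children.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.steps


# Toy usage: a hypothetical PRM that rewards the correctly matched prefix
# of a known two-step answer.
TARGET = (2, 0)


def toy_prm(steps: Steps) -> float:
    matched = 0
    for s, t in zip(steps, TARGET):
        if s != t:
            break
        matched += 1
    return matched / len(TARGET)


best = mcts(
    propose=lambda steps: [0, 1, 2],
    prm_score=toy_prm,
    is_terminal=lambda steps: len(steps) == len(TARGET),
)
print(best)
```

Scoring intermediate steps lets the search credit a correct first step even when a later step goes wrong; the constant `c` trades off revisiting high-scoring branches against exploring the others.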