Topic: "hybrid-architecture"
MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model
minimax-m2.7 sonnet-4.6 glm-5 mimo-v2-pro mamba-3 qwen-3.5 kimi-k2.5 gpt-5.4-mini minimax xiaomi artificial-analysis ollama trae yupp openrouter vercel zo opencode kilocode cartesia self-evolving-agents reasoning cost-efficiency token-efficiency hybrid-architecture harness-engineering agent-harnesses skills memory-optimization architecture feedback-loops api inference execution-environment
MiniMax M2.7 is the headline model release. It is described as a "self-evolving agent" that recursively improves its own skills, memory, and architecture, and it reports 56.22% on SWE-Pro and 57.0% on Terminal Bench 2, which is claimed to put it on par with Sonnet 4.6. Artificial Analysis places M2.7 on the cost/performance frontier with an Intelligence Index score of 50, matching GLM-5 (Reasoning) at roughly a third of the cost. The model is distributed through platforms such as Ollama cloud and OpenRouter. Xiaomi’s MiMo-V2-Pro is noted as a serious Chinese API-only reasoning model, scoring 49 on the Intelligence Index with favorable token efficiency. Cartesia’s Mamba-3 is highlighted as an SSM optimized for inference-heavy use, and early reactions place it alongside hybrid transformer architectures such as Qwen3.5 and Kimi Linear. The report also emphasizes a shift from prompting to harness engineering: the execution environment and the agent harness, including skills and MCP, are becoming key differentiators in AI system design. Related discussion covers tools, repo legibility, constraints, and feedback loops, with DSPy and GPT-5.4 mini mentioned as important components of this landscape.
NVIDIA Nemotron 3: hybrid Mamba-Transformer completely open source models from 30B to 500B
nemotron-3-nano qwen3-30b-a3b-base nvidia huggingface togethercompute baseten vllm llamaindex hybrid-architecture mixture-of-experts reinforcement-learning long-context model-release open-source-models model-training model-optimization benchmarking agent-training ctnzr andrew_n_carr awnihannun
NVIDIA has released Nemotron 3 Nano, a fully open-source hybrid Mamba-Transformer Mixture-of-Experts (MoE) model with 30B parameters and a 1-million-token context window. The release includes open weights, training recipes, datasets, and an RL environment suite called NeMo Gym, and it permits commercial use under the NVIDIA Open Model License. The model is reported to achieve state-of-the-art results on benchmarks such as SWE-Bench and the Artificial Analysis Intelligence Index, outperforming Qwen3-30B-A3B. Ecosystem support is immediate, with integrations into inference stacks such as vLLM, llama.cpp, and Baseten. The upcoming larger models, Nemotron Super and Ultra, will use NVFP4 pretraining and LatentMoE routing to optimize compute. The release marks a significant milestone for open-source American AI, combining comprehensive open assets with an advanced hybrid architecture.
Qwen3-Next-80B-A3B-Base: Towards Ultimate Training & Inference Efficiency
qwen3-next qwen3 mixtral-8x7b gemini-2.5-pro alibaba mistral-ai deepseek snowflake hugging-face baseten nvidia mixture-of-experts model-sparsity gated-attention hybrid-architecture rmsnorm model-stability model-training inference-optimization multi-token-prediction model-deployment justinlin610 teortaxestex yuchenj_uw
Mixture-of-Experts (MoE) designs have become essential in frontier AI models, and Qwen3-Next pushes sparsity further by activating only about 3.7% of its parameters (3B out of 80B) in a hybrid architecture that combines Gated DeltaNet and Gated Attention. The design has 512 total experts, of which 10 routed experts plus 1 shared expert are active per token, and it adds Zero-Centered RMSNorm for stability and improved MoE router initialization. The reported result is roughly 10× cheaper training and 10× faster inference compared to previous models. Alibaba's Qwen3-Next reportedly outperforms Gemini-2.5-Flash-Thinking and approaches the performance of the flagship 235B model, with deployments on Hugging Face and Baseten and native vLLM support for efficient inference.
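The sparsity figures quoted in this summary can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `active_fraction` is hypothetical, and leaving the always-on shared expert out of the routed-expert ratio is an assumption made here, not something the summary specifies.

```python
def active_fraction(active: float, total: float) -> float:
    """Return the share of `total` that is active, as a fraction in [0, 1]."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


# Parameter counts quoted in the summary, in billions: 3B active out of 80B.
param_share = active_fraction(3, 80)

# Routed experts chosen per token out of 512 total experts.
# Assumption of this sketch: the shared expert is always on, so it is
# excluded from the routed ratio.
routed_share = active_fraction(10, 512)

print(f"active parameters: {param_share:.2%}")
print(f"routed experts per token: {routed_share:.2%}")
```

The parameter ratio works out to 3.75%, which the summary reports as 3.7%, and about 2% of the experts are routed to for any given token.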
Mixtral 8x22B Instruct sparks efficiency memes
mixtral-8x22b llama-2-7b olmo-7b mistral-ai hugging-face google microsoft intel softbank nvidia multilinguality math code-generation context-window model-performance model-release retrieval-augmented-generation deepfake ai-investment ai-chip hybrid-architecture training-data guillaume-lample osanseviero _philschmid svpino
Mistral released an instruct-tuned version of its Mixtral 8x22B model, notable for using only 39B active parameters during inference while outperforming larger models. It supports 5 languages, has a 64k context window, and offers math and code capabilities. The model is available on Hugging Face under an Apache 2.0 license for local use. Google plans to invest over $100 billion in AI, and other giants such as Microsoft, Intel, and SoftBank are also making large investments. The UK criminalized non-consensual deepfake porn, prompting debate over enforcement. A former Nvidia employee claims that Nvidia's AI chip lead is unmatchable this decade. AI companions could become a $1 billion market. AI has surpassed humans on several basic tasks but still lags on complex ones. Zyphra introduced Zamba, a novel 7B-parameter hybrid model that outperforms LLaMA-2 7B and OLMo-7B with less training data and was trained on 128 H100 GPUs over 30 days. The GroundX API advances retrieval-augmented generation accuracy.