Company: "teknim"
Microsoft Build: MAI-Thinking-1 and MAI Family models, Surface RTX Spark Dev Box, and OpenClaw in Windows
mai-thinking-1 mai-code-1-flash holo-3.1 qwen-35b sonnet-4.6 claude-code codex microsoft openrouter fal baseten hcompany_ai teksedge nous-research teknim cognition windsurf perplexity-ai mixture-of-experts context-windows benchmarking reinforcement-learning prompt-optimization agentic-ai local-inference model-family-expansion model-reporting agent-native-devices software-development model-optimization hybrid-inference desktop-agents model-quantization mustafasuleyman eliebakouch hannahajishirzi asadovsky bj2rn lateinteraction lakshyaaagrawal theturingpost kimmonismus yusuf_i_mehdi pierceboggan lukehoban nielsrogge russelljkaplan
Microsoft introduced MAI-Thinking-1, a 35B parameter MoE model with 256K context, achieving 97% on AIME 2025 and outperforming Sonnet 4.6 in human preference tests. The broader 7-model MAI family spans reasoning, code, image, speech, and voice, with third-party availability on OpenRouter, fal, and Baseten. The 109-page technical report covers scaling, MFU, RL/post-training, and data curation, highlighting the absence of third-party distillation and the use of advanced prompt optimization techniques. Microsoft emphasized agent-native devices and local inference with projects like Project Solara / Scout and the Surface RTX Spark Dev Box, alongside software innovations such as the Copilot desktop app and MAI-Code-1-Flash integration. Meanwhile, local-first computer-use agents like Holo 3.1 (Qwen-based, 0.8B to 35B parameters) support laptops and small workstations with optimized formats and strong benchmark results. Desktop shells for agents are proliferating, including Hermes Desktop, Devin Desktop, and agent-neutral approaches compatible with Devin, Claude Code, and Codex. Hybrid local/cloud execution is becoming the default architecture, as seen in Perplexity Computer's hybrid agentic inference.
Meta Superintelligence Labs acquires Manus AI for over $2B, at $100M ARR, 9 months after launch
glm-4.7 minimax-m2.1 vllm manus benchmark meta-ai-fair vllm amd sglang weaviate teknim baseten alphaxiv minimax performance-optimization inference-frameworks model-benchmarking model-deployment open-source-models multimodality api code-generation community-building alex_wang nat_friedman
Manus achieved a rapid growth trajectory in 2025, raising $500M from Benchmark and reaching $100M ARR before being acquired by Meta for an estimated $4B. The vLLM team launched a dedicated community site with new resources, while performance issues with AMD MI300X FP8 were noted in vLLM and sglang benchmarks. Weaviate released operational features including Object TTL, Java v6 client GA, and multimodal document embeddings. Teknium raised concerns about API fragmentation and advocated for unified SDK wrappers. In open-weight models, GLM-4.7 gained recognition as a reliable coding model with faster throughput on Baseten, and MiniMax-M2.1 rose as a leading open agentic coding model, topping WebDev leaderboards.
12/25/2023: Nous Hermes 2 Yi 34B for Christmas
nous-hermes-2 yi-34b nucleusx yayi-2 ferret teknim nous-research apple mixtral deepseek qwen huggingface wenge-technology quantization model-optimization throughput-metrics batch-processing parallel-decoding tensor-parallelization multimodality language-model-pretraining model-benchmarking teknium carsonpoole casper_ai pradeep1148 osanseviero metaldragon01
Teknium released Nous Hermes 2 on Yi 34B, positioning it as a top open model compared to Mixtral, DeepSeek, and Qwen. Apple introduced Ferret, a new open-source multimodal LLM. Discussions in the Nous Research AI Discord focused on AI model optimization and quantization techniques like AWQ, GPTQ, and AutoAWQ, with insights on proprietary optimization and throughput metrics. Additional highlights include the addition of NucleusX Model to transformers, a 30B model with 80 MMLU, and the YAYI 2 language model by Wenge Technology trained on 2.65 trillion tokens. "AutoAWQ outperforms vLLM up to batch size 8" was noted, and proprietary parallel decoding and tensor parallelization across GPUs were discussed for speed improvements.
12/20/2023: Project Obsidian - Multimodal Mistral 7B from Nous
gpt-4 gpt-3.5 dall-e-3 nous-research teknim openai multimodality image-detection security-api bias facial-recognition healthcare-ai gpu-optimization prompt-engineering vision
Project Obsidian is a multimodal model being trained publicly, tracked by Teknium on the Nous Discord. Discussions include 4M: Massively Multimodal Masked Modeling and Reason.dev, a TypeScript framework for LLM applications. The OpenAI Discord community discussed hardware specs for running TensorFlow JS for image detection, security API ideas for filtering inappropriate images, and concerns about racial and cultural bias in AI, especially in facial recognition and healthcare. Challenges with GPT-3.5 and GPT-4 in word puzzle games were noted, along with GPU recommendations prioritizing VRAM for AI inference. Users also debated GPT-4's vision capabilities, limitations of DALL·E 3, platform access issues, and prompting strategies for better outputs.