Model: "llama-3.3-70b"
DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level
deepseek-r1 deepseek-v3 qwen-2.5 llama-3.1 llama-3.3-70b deepseek ollama qwen llama reinforcement-learning fine-tuning model-distillation model-optimization reasoning reward-models multi-response-sampling model-training
DeepSeek released DeepSeek R1, a significant upgrade over DeepSeek V3 from just three weeks prior. The release comprises 8 models: the full-size 671B MoE models and multiple distillations built on Qwen 2.5 and Llama 3.1/3.3. The models are MIT licensed, allowing fine-tuning and distillation. Pricing is 27x-50x cheaper than o1. Training used GRPO (rewarding correctness and style outcomes) without relying on PRM, MCTS, or reward models, focusing on improving reasoning through reinforcement learning. The distilled models can run on Ollama and show strong capabilities such as writing Manim code. The release highlights advances in reinforcement learning, fine-tuning, and model distillation, using the GRPO RL framework introduced in DeepSeekMath.
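To make the GRPO idea concrete, here is a minimal sketch of its group-relative scoring step: several responses are sampled for one prompt, each is scored by a simple rule, and each score is normalized against the group's mean and standard deviation, so no learned reward model is needed. The function names, the 0.1 format bonus, and the `<think>` tag check are illustrative assumptions, not DeepSeek's actual reward code.

```python
import statistics


def rule_based_reward(response: str, reference: str) -> float:
    """Hypothetical rule-based reward (assumed values, not DeepSeek's):
    1.0 if the text after the reasoning block equals the reference answer,
    plus a 0.1 bonus if the response wraps its reasoning in <think> tags."""
    has_format = "<think>" in response and "</think>" in response
    final_answer = response.split("</think>")[-1].strip()
    correct = final_answer == reference.strip()
    return (1.0 if correct else 0.0) + (0.1 if has_format else 0.0)


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each reward against its own sampling group:
    advantage_i = (r_i - mean(group)) / std(group)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # Every sample scored the same, so there is no learning signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


# Four sampled responses to the same prompt, scored against reference "4".
samples = [
    "<think>2 + 2</think> 4",
    "<think>2 + 2</think> 5",
    "4",
    "<think>guessing</think> 22",
]
rewards = [rule_based_reward(s, "4") for s in samples]
advantages = group_relative_advantages(rewards)
```

Responses that beat their group's average get a positive advantage and are reinforced; the rest are pushed down. The full algorithm also applies a clipped policy-gradient update and a KL penalty, which this sketch omits.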
Meta Llama 3.3: 405B/Nova Pro performance at 70B price
llama-3-70b llama-3.3-70b gpt-4o gemini-exp-1206 meta-ai-fair openai google-deepmind hugging-face llamacloud reinforcement-learning fine-tuning model-performance document-processing pricing-models alignment online-rl sama steven-heidel aidan_mclau lmarena_ai oriolvinyalsml jerryjliu0
Meta AI released Llama 3.3 70B, which matches the performance of the 405B model with improved efficiency, crediting "a new alignment process and progress in online RL techniques". OpenAI announced Reinforcement Fine-Tuning (RFT) for building expert models with limited data, offering alpha access to researchers and enterprises. Google DeepMind's Gemini-Exp-1206 leads benchmarks, tying with GPT-4o in coding performance. LlamaCloud enhanced its document processing with table extraction and analytics. Community discussion of OpenAI's pricing plans continues.