Jiayi-Pan / TinyZero
Minimal reproduction of DeepSeek R1-Zero
☆11,700 Updated 2 weeks ago
Alternatives and similar repositories for TinyZero:
Users who are interested in TinyZero are comparing it to the libraries listed below.
- verl: Volcano Engine Reinforcement Learning for LLMs ☆7,626 Updated this week
- Simple RL training for reasoning ☆3,519 Updated 3 weeks ago
- Fully open reproduction of DeepSeek-R1 ☆24,269 Updated this week
- s1: Simple test-time scaling ☆6,332 Updated last month
- SGLang is a fast serving framework for large language models and vision language models. ☆13,976 Updated this week
- Democratizing Reinforcement Learning for LLMs ☆3,182 Updated 3 weeks ago
- Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥 ☆38,242 Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆46,848 Updated this week
- Train transformer language models with reinforcement learning. ☆13,624 Updated this week
- An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & LoRA & vLLM & RFT) ☆6,595 Updated this week
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation ☆7,747 Updated 3 weeks ago
- Official inference framework for 1-bit LLMs ☆18,044 Updated last week
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆12,076 Updated this week
- A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques. ☆6,715 Updated this week
- NanoGPT (124M) in 3 minutes ☆2,520 Updated last week
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆17,204 Updated 3 months ago
- Witness the aha moment of VLM with less than $3. ☆3,622 Updated 2 months ago
- Fast and memory-efficient exact attention ☆17,259 Updated last week
- DSPy: The framework for programming—not prompting—language models ☆24,061 Updated this week
- Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆10,262 Updated this week
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention ☆2,586 Updated 3 weeks ago
- A live stream development of RL tuning for LLM agents ☆2,617 Updated this week
- ☆3,325 Updated 2 months ago
- Sky-T1: Train your own O1 preview model within $450 ☆3,232 Updated 2 weeks ago
- Video+code lecture on building nanoGPT from scratch ☆4,077 Updated 8 months ago
- AllenAI's post-training codebase ☆2,942 Updated this week
- 🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation ☆16,131 Updated this week
- No fortress, purely open ground. OpenManus is Coming. ☆45,114 Updated last week
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence ☆5,711 Updated 7 months ago
- Everything about the SmolLM2 and SmolVLM family of models ☆2,273 Updated last month