liangyuwang / zo2
ZO2 (Zeroth-Order Offloading): Full Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory
☆116Updated last month
Alternatives and similar repositories for zo2
Users that are interested in zo2 are comparing it to the libraries listed below
- Pretrain, decay, and SFT a CodeLLM from scratch 🧙‍♂️☆36Updated last year
- [ACL 2025] Official PyTorch implementation of the paper: Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement☆29Updated 3 weeks ago
- ☆81Updated last week
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆130Updated 2 months ago
- Repository for "What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models"☆44Updated last week
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models☆134Updated 2 weeks ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆133Updated last year
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆122Updated 7 months ago
- ☆190Updated 2 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆189Updated 2 weeks ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆72Updated 4 months ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems☆84Updated 3 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. Accepted at COLM 2024☆32Updated last year
- ☆36Updated 9 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆75Updated 2 weeks ago
- A generalized framework for subspace tuning methods in parameter efficient fine-tuning.☆142Updated 4 months ago
- Efficient Mixture of Experts for LLM Paper List☆72Updated 6 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…☆122Updated this week
- Repo for "Z1: Efficient Test-time Scaling with Code"☆60Updated 2 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆104Updated 3 weeks ago
- ☆105Updated last year
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation☆103Updated last month
- ☆53Updated this week
- ☆43Updated 3 months ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.☆60Updated last month
- 🚀ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, multimodal rei…☆110Updated last week
- ☆201Updated 8 months ago
- ☆202Updated 4 months ago
- qwen-nsa☆67Updated 2 months ago
- MMR1: Advancing the Frontiers of Multimodal Reasoning☆159Updated 3 months ago