step-law / steplaw
☆183 Updated 3 weeks ago
Alternatives and similar repositories for steplaw:
Users interested in steplaw are comparing it to the libraries listed below.
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆120 Updated last month
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆175 Updated last month
- ☆192 Updated 2 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆234 Updated 3 weeks ago
- ☆194 Updated 6 months ago
- A visualization tool to enable deeper understanding and easier debugging of RLHF training. ☆188 Updated 2 months ago
- ☆149 Updated last week
- Related works and background techniques about OpenAI o1 ☆221 Updated 4 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆163 Updated this week
- VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework ☆306 Updated last month
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation ☆291 Updated 2 weeks ago
- ☆138 Updated last week
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models ☆123 Updated this week
- A Comprehensive Survey on Long Context Language Modeling ☆138 Updated last month
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆132 Updated 10 months ago
- A flexible and efficient training framework for large-scale alignment tasks ☆345 Updated 2 months ago
- ☆153 Updated last month
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆112 Updated 2 weeks ago
- A generalized framework for subspace tuning methods in parameter efficient fine-tuning. ☆139 Updated 3 months ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆75 Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆195 Updated last month
- Reproducing R1 for Code with Reliable Rewards ☆188 Updated 2 weeks ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆136 Updated last month
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ☆71 Updated last week
- qwen-nsa ☆57 Updated 3 weeks ago
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment ☆327 Updated last year
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models. ☆120 Updated last month
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆364 Updated 3 months ago
- ☆319 Updated 9 months ago
- ☆106 Updated last year