step-law / steplaw
☆151 · Updated this week
Alternatives and similar repositories for steplaw:
Users interested in steplaw are comparing it to the libraries listed below.
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆169 · Updated 3 weeks ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆102 · Updated last week
- VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework ☆285 · Updated last week
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation ☆277 · Updated 3 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆222 · Updated 2 weeks ago
- ☆184 · Updated last month
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆158 · Updated last week
- ☆191 · Updated 5 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models ☆110 · Updated last week
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models. ☆109 · Updated last week
- ☆137 · Updated last month
- A visualization tool for deeper understanding and easier debugging of RLHF training. ☆186 · Updated last month
- ☆278 · Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆180 · Updated last month
- A Comprehensive Survey on Long Context Language Modeling ☆129 · Updated 3 weeks ago
- ☆62 · Updated 4 months ago
- Related works and background techniques for OpenAI o1 ☆219 · Updated 3 months ago
- Reproducing R1 for Code with Reliable Rewards ☆167 · Updated last week
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆131 · Updated 10 months ago
- ☆662 · Updated this week
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond ☆160 · Updated this week
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆97 · Updated last week
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ☆473 · Updated this week
- SOTA RL fine-tuning solution for advanced math reasoning of LLM ☆103 · Updated last week
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆175 · Updated last month
- qwen-nsa ☆49 · Updated last week
- A generalized framework for subspace tuning methods in parameter efficient fine-tuning. ☆136 · Updated 2 months ago
- ☆74 · Updated 3 weeks ago
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆359 · Updated 2 months ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆71 · Updated 3 weeks ago