step-law / steplaw

☆196

Alternatives and similar repositories for steplaw

Users who are interested in steplaw are comparing it to the repositories listed below.

InternLM / OREAL
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆188Updated 4 months ago
HarderThenHarder / RLLoggingBoard
A visualization tool for deeper understanding and easier debugging of RLHF training.
☆238Updated 5 months ago
GAIR-NLP / MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
☆138Updated 3 months ago
sail-sg / oat-zero
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
☆245Updated 3 months ago
SuperGPQA / SuperGPQA
☆157Updated 3 months ago
LCLM-Horizon / A-Comprehensive-Survey-For-Long-Context-Language-Modeling
A Comprehensive Survey on Long Context Language Modeling
☆166Updated 3 weeks ago
openpsi-project / ReaLHF
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
☆305Updated 3 months ago
lzhxmu / CPPO
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
☆145Updated last month
GAIR-NLP / LIMR
☆205Updated 5 months ago
QwenLM / ParScale
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
☆417Updated 2 months ago
MiniMax-AI / One-RL-to-See-Them-All
The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning
☆303Updated 2 months ago
InternLM / POLAR
Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
☆140Updated 3 weeks ago
boson-ai / RPBench-Auto
An automated pipeline for evaluating LLMs for role-playing.
☆192Updated 10 months ago
a-m-team / a-m-models
a-m-team's exploration in large language modeling
☆178Updated 2 months ago
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆181Updated last month
RUC-GSAI / YuLan-Mini
A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.
☆200Updated last week
ByteDance-Seed / VeOmni
VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework
☆395Updated last week
modelscope / Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…
☆149Updated this week
Outsider565 / LoRA-GA
☆204Updated 9 months ago
eddycmu / demystify-long-cot
☆306Updated 2 months ago
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆125Updated 3 months ago
GAIR-NLP / ToRL
☆258Updated 2 months ago
ElliottYan / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆268Updated 2 weeks ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆232Updated 2 months ago
SkyworkAI / skywork-o1-prm-inference
☆64Updated 8 months ago
NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆320Updated this week
TemporaryLoRA / Temp-LoRA
☆107Updated last year
ByteDance-Seed / Seed-Thinking-v1.5
☆800Updated last month
OpenRLHF / OpenRLHF-M
An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.
☆138Updated 3 months ago
wjn1996 / Awesome-LLM-Reasoning-Openai-o1-Survey
Related work and background techniques for OpenAI o1
☆224Updated 6 months ago