sunblaze-ucb / math_ood

☆40

Alternatives and similar repositories for math_ood

Users who are interested in math_ood are comparing it to the repositories listed below.

LAMDASZ-ML / Self-Backtracking
☆48 Updated 7 months ago
Gen-Verse / CURE
[NeurIPS 2025 Spotlight] ReasonFlux-Coder: Open-Source LLM Coders with Co-Evolving Reinforcement Learning
☆122 Updated 2 weeks ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆34 Updated last month
google-deepmind / bbeh
☆94 Updated 4 months ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆105 Updated 7 months ago
test-time-interaction / TTI
☆62 Updated 3 months ago
ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆119 Updated 6 months ago
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆85 Updated 4 months ago
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆47 Updated 7 months ago
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44 Updated 5 months ago
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆24 Updated 2 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 Updated last year
sunblaze-ucb / reasoning_ladder
☆35 Updated 4 months ago
facebookresearch / dualformer
implementation of dualformer
☆20 Updated 7 months ago
complex-reasoning / RPG
Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
☆40 Updated this week
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 Updated last year
sotopia-lab / sotopia-rl
Sotopia-RL: Reward Design for Social Intelligence
☆39 Updated last month
shenao-zhang / SELM
The official implementation of Self-Exploring Language Models (SELM)
☆64 Updated last year
sail-sg / VeriFree
Reinforcing General Reasoning without Verifiers
☆87 Updated 3 months ago
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆106 Updated 2 months ago
mandyyyyii / east
☆20 Updated 2 months ago
allenai / fluid-benchmarking
Fluid Language Model Benchmarking
☆17 Updated 3 weeks ago
shangshang-wang / Resa
Resa: Transparent Reasoning Models via SAEs
☆41 Updated 2 weeks ago
sail-sg / feedback-conditional-policy
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
☆33 Updated last week
open-compass / GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
☆30 Updated 2 months ago
Asap7772 / understanding-rlhf
Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…
☆32 Updated last year
katiekang1998 / reasoning_generalization
☆33 Updated 8 months ago
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆59 Updated last year
spiral-rl / spiral
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
☆151 Updated 2 weeks ago
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆57 Updated last year