koalazf99 / nanoverlLinks

Collections of RLxLM experiments using minimal codes

☆14

Alternatives and similar repositories for nanoverl

Users that are interested in nanoverl are comparing it to the libraries listed below

Sorting:

GAIR-NLP / self-improvement-reversal
☆13Updated last year
GAIR-NLP / weak-to-strong-reasoning
☆58Updated last year
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆88Updated 9 months ago
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆73Updated last week
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆23Updated 9 months ago
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆34Updated last year
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆114Updated 5 months ago
PremiLab-Math / MathCheck
[ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
☆33Updated 11 months ago
ChengpengLi1003 / DotaMath
☆30Updated 9 months ago
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆63Updated last year
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆115Updated 10 months ago
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75Updated 4 months ago
zzli2022 / TLDR
Code for Research Project TLDR
☆23Updated 2 months ago
rookie-joe / AutoPSV
☆50Updated 11 months ago
RUCAIBox / JiuZhang3.0
The code and data for the paper JiuZhang3.0
☆49Updated last year
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆174Updated 4 months ago
KbsdJames / MATH-Minos
The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee…
☆37Updated last year
GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆36Updated last year
inclusionAI / PromptCoT
A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect…
☆114Updated 3 weeks ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆105Updated 7 months ago
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆82Updated 9 months ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆177Updated 2 months ago
xlang-ai / AgentTrek
[ICLR2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
☆42Updated 7 months ago
test-time-interaction / TTI
☆62Updated 4 months ago
TingchenFu / MathIF
instruction-following benchmark for large reasoning models
☆44Updated 2 months ago
hanningzhang / prm
☆17Updated 11 months ago
ars22 / scaling-LLM-math-synthetic-data
Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"
☆30Updated last year
FreedomIntelligence / OVM
☆69Updated last year
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆69Updated 3 months ago
hkust-nlp / Laser
Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆54Updated 4 months ago