GAIR-NLP / LIMR
☆166 Updated last month
Alternatives and similar repositories for LIMR:
Users that are interested in LIMR are comparing it to the libraries listed below
- ☆260 Updated last week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆148 Updated last week
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆189 Updated 10 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆212 Updated this week
- Repo of paper "Free Process Rewards without Process Labels" ☆138 Updated last week
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆165 Updated 2 months ago
- ☆143 Updated 3 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆118 Updated 8 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆166 Updated 2 weeks ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆74 Updated last week
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆101 Updated this week
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆130 Updated last month
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆250 Updated 6 months ago
- On Memorization of Large Language Models in Logical Reasoning ☆56 Updated 4 months ago
- ☆128 Updated this week
- The official repository of the Omni-MATH benchmark. ☆77 Updated 3 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆297 Updated 7 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆98 Updated 3 months ago
- SOTA RL fine-tuning solution for advanced math reasoning of LLM ☆91 Updated this week
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models) ☆178 Updated last year
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆131 Updated last month
- ☆61 Updated 4 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆51 Updated 3 months ago
- ☆263 Updated 8 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆158 Updated this week
- A Survey on Efficient Reasoning for LLMs ☆116 Updated this week
- ☆48 Updated last month
- Reformatted Alignment ☆115 Updated 6 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆60 Updated 4 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆95 Updated 2 months ago