thu-wyz / inference_scaling
☆63 · Updated 5 months ago
Alternatives and similar repositories for inference_scaling:
Users who are interested in inference_scaling are comparing it to the repositories listed below:
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 · Updated 7 months ago
- GenRM-CoT: Data release for verification rationales ☆56 · Updated 6 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆101 · Updated 4 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆143 · Updated last month
- ☆125 · Updated 3 weeks ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆60 · Updated 4 months ago
- ☆65 · Updated last year
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆175 · Updated last month
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆122 · Updated 9 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 · Updated 5 months ago
- ☆60 · Updated this week
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆115 · Updated last month
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆95 · Updated last month
- ☆54 · Updated last week
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 · Updated 4 months ago
- A curated list of awesome resources dedicated to Scaling Laws for LLMs ☆71 · Updated 2 years ago
- ☆187 · Updated 2 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆101 · Updated 3 months ago
- ☆149 · Updated 4 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 · Updated 6 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆94 · Updated last week
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆53 · Updated 3 weeks ago
- ☆157 · Updated 3 weeks ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆136 · Updated 2 months ago
- This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP… ☆25 · Updated 4 months ago
- ☆93 · Updated last month
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆68 · Updated last month
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆29 · Updated 7 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆91 · Updated 3 weeks ago
- ☆74 · Updated this week