CyberAgentAILab / regularized-bon
Code for "Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment" (2025).
☆14 · Updated 3 weeks ago
Alternatives and similar repositories for regularized-bon:
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" (☆47 · Updated last year)
- Repo for "Z1: Efficient Test-time Scaling with Code" (☆53 · Updated 2 weeks ago)
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization (☆34 · Updated last month)
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models (☆47 · Updated 2 months ago)
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs (☆53 · Updated last year)
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering (☆57 · Updated 4 months ago)
- Codebase for Instruction Following without Instruction Tuning (☆34 · Updated 7 months ago)
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) (☆62 · Updated 11 months ago)
- This is the implementation of LeCo (☆32 · Updated 3 months ago)
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling (☆101 · Updated 3 months ago)
- Code for the preprint "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)" (☆36 · Updated last month)
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment (☆55 · Updated 7 months ago)
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction (☆68 · Updated last month)
- NeurIPS 2024 tutorial on LLM Inference (☆41 · Updated 4 months ago)
- Code for the paper "Autonomous Evaluation and Refinement of Digital Agents" [COLM 2024] (☆134 · Updated 5 months ago)
- [ICLR 2025] Benchmarking Agentic Workflow Generation (☆79 · Updated 2 months ago)
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples (☆84 · Updated last month)
- Scalable Meta-Evaluation of LLMs as Evaluators (☆42 · Updated last year)
- [ACL'24] Code and data for the paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" (☆54 · Updated last year)
- Official repository for the paper "Weak-to-Strong Extrapolation Expedites Alignment" (☆74 · Updated 10 months ago)
- Exploring Model Kinship for Merging Large Language Models (☆23 · Updated last week)
- Code for the paper "Teaching Language Models to Critique via Reinforcement Learning" (☆94 · Updated last week)
- Plug-and-play PyTorch implementation of the paper "Evolutionary Optimization of Model Merging Recipes" by Sakana AI (☆30 · Updated 5 months ago)