shenao-zhang / BARL
Bayes-Adaptive RL for LLM Reasoning
☆41Updated 6 months ago
Alternatives and similar repositories for BARL
Users who are interested in BARL are comparing it to the repositories listed below
- ☆52Updated 7 months ago
- ☆42Updated 5 months ago
- Natural Language Reinforcement Learning☆100Updated 4 months ago
- Sotopia-RL: Reward Design for Social Intelligence☆44Updated 3 months ago
- ☆51Updated 10 months ago
- PreAct: Prediction Enhances Agent's Planning Ability (Coling2025)☆30Updated last year
- ☆65Updated 9 months ago
- ☆28Updated 9 months ago
- ☆53Updated 10 months ago
- Verlog: A Multi-turn RL framework for LLM agents☆66Updated 3 weeks ago
- The official implementation of Self-Exploring Language Models (SELM)☆63Updated last year
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning☆139Updated 2 months ago
- ☆19Updated 9 months ago
- Reflect-RL: Two-Player Online RL Fine-Tuning for LMs☆17Updated 4 months ago
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF☆24Updated last year
- Dataset Reset Policy Optimization☆31Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆45Updated 7 months ago
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)☆54Updated last month
- Resa: Transparent Reasoning Models via SAEs☆45Updated 2 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control☆62Updated 11 months ago
- Reinforcing General Reasoning without Verifiers☆92Updated 5 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers☆23Updated 9 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling☆40Updated last month
- WONDERBREAD benchmark + dataset for BPM tasks☆31Updated 4 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆147Updated last year
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…☆32Updated last year
- [NeurIPS'24 LanGame workshop] On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability☆41Updated 5 months ago
- ☆16Updated last year
- Implementation of Dualformer☆24Updated 9 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories☆40Updated 4 months ago