google / werewolf_arena
☆41 · Updated last year
Alternatives and similar repositories for werewolf_arena
Users interested in werewolf_arena are comparing it to the repositories listed below.
- Natural Language Reinforcement Learning ☆100 · Updated 4 months ago
- ☆144 · Updated last year
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆198 · Updated 8 months ago
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. … ☆144 · Updated last year
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) ☆73 · Updated last year
- Official implementation of paper "Process Reward Model with Q-value Rankings" ☆65 · Updated 10 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆112 · Updated 4 months ago
- ☆108 · Updated last year
- ☆46 · Updated 6 months ago
- ☆160 · Updated last year
- ☆116 · Updated 11 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆32 · Updated last year
- Code and example data for the paper: Rule Based Rewards for Language Model Safety ☆203 · Updated last year
- ☆49 · Updated 10 months ago
- Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆70 · Updated 8 months ago
- This repository contains an LLM benchmark for the social deduction game 'Resistance Avalon' ☆130 · Updated 6 months ago
- On Memorization of Large Language Models in Logical Reasoning ☆72 · Updated 8 months ago
- ☆65 · Updated 9 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆112 · Updated 10 months ago
- ☆74 · Updated last month
- Reinforced Multi-LLM Agents training ☆60 · Updated 6 months ago
- ☆68 · Updated last year
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆254 · Updated 7 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆115 · Updated 4 months ago
- Code for the ACL 2024 paper: Adversarial Preference Optimization (APO). ☆56 · Updated last year
- ☆86 · Updated 5 months ago
- This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP… ☆32 · Updated last year
- ☆33 · Updated last year
- MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings) ☆72 · Updated 4 months ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆123 · Updated 8 months ago