Joshuaclymer / GameBench
☆18 Updated 10 months ago
Alternatives and similar repositories for GameBench
Users who are interested in GameBench are comparing it to the repositories listed below
- official implementation of paper "Process Reward Model with Q-value Rankings" ☆57 Updated 3 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆105 Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆81 Updated last month
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) ☆19 Updated 3 months ago
- ☆15 Updated 6 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆34 Updated this week
- ☆30 Updated 2 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 Updated 8 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆15 Updated last month
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆85 Updated last month
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 Updated 6 months ago
- ☆31 Updated 4 months ago
- Training and Benchmarking LLMs for Code Preference. ☆33 Updated 6 months ago
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner ☆22 Updated 10 months ago
- ☆26 Updated 4 months ago
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents" ☆73 Updated last month
- ☆110 Updated 3 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆69 Updated last month
- ☆36 Updated 2 months ago
- A testbed for agents and environments that can automatically improve models through data generation. ☆24 Updated 2 months ago
- ☆45 Updated 3 months ago
- ☆25 Updated 2 months ago
- ☆58 Updated 2 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆78 Updated 6 months ago
- implementation of dualformer ☆17 Updated 2 months ago
- ☆15 Updated 2 months ago
- ☆73 Updated 6 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆99 Updated last year
- Evaluate the Quality of Critique ☆35 Updated 11 months ago