abdulhaim / LMRL-Gym
☆75 Updated 6 months ago
Alternatives and similar repositories for LMRL-Gym:
Users who are interested in LMRL-Gym are comparing it to the repositories listed below.
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆124 Updated 9 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆111 Updated 2 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆111 Updated 4 months ago
- ☆93 Updated 6 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆118 Updated 5 months ago
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity ☆39 Updated 11 months ago
- ☆168 Updated last year
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. … ☆126 Updated 9 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆106 Updated last month
- ☆140 Updated 8 months ago
- ☆43 Updated 2 weeks ago
- ☆125 Updated last month
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆100 Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆95 Updated 9 months ago
- 🌾 OAT: Online AlignmenT for LLMs ☆81 Updated 3 weeks ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆98 Updated last year
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆26 Updated 4 months ago
- Reasoning with Language Model is Planning with World Model ☆154 Updated last year
- Natural Language Reinforcement Learning ☆67 Updated 3 weeks ago
- ☆25 Updated 8 months ago
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" ☆203 Updated last year
- Rewarded soups official implementation ☆54 Updated last year
- ☆34 Updated 11 months ago
- ☆89 Updated this week
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆49 Updated 7 months ago
- A benchmark for evaluating learning agents based on just language feedback ☆61 Updated 3 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆53 Updated 10 months ago
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. ☆234 Updated 3 months ago
- The official implementation of Self-Exploring Language Models (SELM) ☆59 Updated 7 months ago
- ☆28 Updated last month