vmicheli / lm-butlers
☆12 Updated 3 years ago
Alternatives and similar repositories for lm-butlers
Users that are interested in lm-butlers are comparing it to the libraries listed below
- Implements the Messenger environment and EMMA model. ☆23 Updated last year
- ☆93 Updated 11 months ago
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity ☆43 Updated last year
- ☆34 Updated last year
- ☆34 Updated 2 months ago
- Super fast implementations of common benchmark text world games ☆47 Updated 2 months ago
- ☆31 Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆159 Updated last week
- ☆54 Updated 2 weeks ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 Updated 8 months ago
- Official Repo of LangSuitE ☆84 Updated 9 months ago
- Rewarded soups official implementation ☆58 Updated last year
- Code and data for "Inferring Rewards from Language in Context" [ACL 2022]. ☆15 Updated 3 years ago
- A reinforcement learning environment for the IGLU 2022 at NeurIPS ☆33 Updated 2 years ago
- ☆20 Updated 3 years ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆177 Updated last month
- Directional Preference Alignment ☆57 Updated 8 months ago
- ☆84 Updated 10 months ago
- Pre-Trained Language Models for Interactive Decision-Making [NeurIPS 2022] ☆126 Updated 2 years ago
- ☆97 Updated last year
- Codebase for "Uni[MASK]: Unified Inference in Sequential Decision Problems" ☆55 Updated 11 months ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents ☆34 Updated last year
- ☆129 Updated 10 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆54 Updated last year
- Official code for "Can Wikipedia Help Offline Reinforcement Learning?" by Machel Reid, Yutaro Yamada and Shixiang Shane Gu ☆105 Updated 2 years ago
- ☆29 Updated last year
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆26 Updated 8 months ago
- Paper collections of methods that using language to interact with environment, including interact with real world, simulated world or WWW… ☆127 Updated last year
- The source code of the paper "Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Pla… ☆95 Updated 9 months ago
- ☆38 Updated 10 months ago