vmicheli / lm-butlers
☆12 Updated 3 years ago
Alternatives and similar repositories for lm-butlers
Users interested in lm-butlers are comparing it to the libraries listed below
- Code and data for "Inferring Rewards from Language in Context" [ACL 2022]. ☆15 Updated 3 years ago
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity ☆43 Updated last year
- ☆32 Updated last year
- ☆34 Updated last year
- Super fast implementations of common benchmark text world games ☆48 Updated 3 months ago
- ☆95 Updated 11 months ago
- Implements the Messenger environment and EMMA model. ☆23 Updated 2 years ago
- ☆20 Updated 3 years ago
- ☆54 Updated 2 years ago
- ☆38 Updated 11 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆54 Updated last year
- Source codes for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023). ☆16 Updated 5 months ago
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data) ☆28 Updated last year
- Rewarded soups official implementation ☆58 Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆121 Updated 9 months ago
- ☆55 Updated last month
- Domain-specific preference (DSP) data and customized RM fine-tuning. ☆25 Updated last year
- Official code for our EMNLP2021 Outstanding Paper MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks ☆22 Updated 2 years ago
- The multi-modal sequence to sequence baseline neural models used in the Grounded SCAN paper. ☆16 Updated 4 years ago
- [ICLR 2022 Spotlight] Multi-Stage Episodic Control for Strategic Exploration in Text Games ☆14 Updated 3 years ago
- A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. ☆44 Updated 5 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆179 Updated 2 months ago
- ☆16 Updated 7 months ago
- ☆44 Updated 2 years ago
- ☆84 Updated 10 months ago
- Pre-Trained Language Models for Interactive Decision-Making [NeurIPS 2022] ☆127 Updated 3 years ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆144 Updated 7 months ago
- ☆86 Updated last year
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents ☆34 Updated last year
- ☆34 Updated 3 months ago