vmicheli / lm-butlers
★12 Updated 3 years ago
Related projects
Alternatives and complementary repositories for lm-butlers
- ★26 Updated last year
- 🌾 OAT: Online AlignmenT for LLMs ★32 Updated last week
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity ★38 Updated 10 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ★49 Updated 5 months ago
- Super fast implementations of common benchmark text world games ★43 Updated 2 weeks ago
- Code and data for "Inferring Rewards from Language in Context" [ACL 2022]. ★15 Updated 2 years ago
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data) ★23 Updated 11 months ago
- Rewarded soups official implementation ★51 Updated last year
- ★33 Updated 9 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ★108 Updated 7 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ★80 Updated last week
- ★28 Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ★97 Updated 2 months ago
- ★73 Updated 4 months ago
- Source codes for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023). ★14 Updated last year
- ★35 Updated 4 months ago
- ★19 Updated 2 years ago
- Self-Supervised Alignment with Mutual Information ★14 Updated 5 months ago
- Domain-specific preference (DSP) data and customized RM fine-tuning. ★24 Updated 8 months ago
- [ICML 2024] Language Models Represent Beliefs of Self and Others ★26 Updated last month
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ★99 Updated 3 weeks ago
- Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning" ★22 Updated 3 weeks ago
- Code for LaMPP: Language Models as Probabilistic Priors for Perception and Action ★35 Updated last year
- The source code of the paper "Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Pla… ★77 Updated 3 months ago
- A collection of papers on methods that use language to interact with environments, including interaction with the real world, simulated worlds, or the WWW… ★123 Updated last year
- GenRM-CoT: Data release for verification rationales ★23 Updated last month
- ToMBench: Benchmarking Theory of Mind in Large Language Models, ACL 2024. ★31 Updated 4 months ago
- Dataset Reset Policy Optimization ★28 Updated 7 months ago
- ★46 Updated 10 months ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ★26 Updated 2 months ago