elated-sawyer / WALL-E
Official code for the paper: WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents
☆18 · Updated last month
Related projects
Alternatives and complementary repositories for WALL-E
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents ☆46 · Updated 2 weeks ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆96 · Updated last year
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆99 · Updated 3 weeks ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆74 · Updated 9 months ago
- ☆16 · Updated last month
- code for paper Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning ☆33 · Updated 8 months ago
- ☆21 · Updated 5 months ago
- Paper collections of the continuous effort start from World Models. ☆140 · Updated 4 months ago
- ☆114 · Updated 4 months ago
- The source code of the paper "Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Pla… ☆77 · Updated 3 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆97 · Updated last month
- Offline RLHF codebase implementation for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human … ☆31 · Updated 7 months ago
- ☆31 · Updated 3 weeks ago
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models ☆30 · Updated 8 months ago
- ☆89 · Updated 3 months ago
- Uni-RLHF platform for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback" (ICLR2024… ☆30 · Updated this week
- [ICLR 2024] Code for the paper "Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning" ☆129 · Updated 3 weeks ago
- ☆40 · Updated 11 months ago
- Domain-specific preference (DSP) data and customized RM fine-tuning. ☆24 · Updated 8 months ago
- [NIPS24W] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated… ☆73 · Updated 4 months ago
- Benchmarking LLMs' Gaming Ability in Multi-Agent Environments ☆39 · Updated last month
- The code for "Can Large Language Model Agents Simulate Human Trust Behaviors?" ☆41 · Updated 2 weeks ago
- ☆29 · Updated this week
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A… ☆40 · Updated 9 months ago
- ☆28 · Updated 7 months ago
- ☆22 · Updated 2 months ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆120 · Updated last week
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆31 · Updated 3 weeks ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆55 · Updated 3 months ago
- Official repository for Decentralized Arena via Collective LLM Intelligence ☆8 · Updated last month