elated-sawyer / WALL-E
Official code for the paper: WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
☆27 Updated 2 weeks ago
Alternatives and similar repositories for WALL-E:
Users who are interested in WALL-E are comparing it to the repositories listed below.
- ☆25 Updated 10 months ago
- ☆47 Updated last week
- [ICML 2024] Language Models Represent Beliefs of Self and Others ☆31 Updated 4 months ago
- ☆124 Updated 7 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆66 Updated 6 months ago
- code for paper Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning ☆36 Updated 11 months ago
- Natural Language Reinforcement Learning ☆72 Updated 2 months ago
- ☆28 Updated 3 months ago
- ☆16 Updated 3 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆116 Updated 3 months ago
- Directional Preference Alignment ☆56 Updated 4 months ago
- ☆22 Updated 8 months ago
- Uni-RLHF platform for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback" (ICLR2024… ☆33 Updated 3 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆115 Updated 5 months ago
- Offline RLHF codebase implementation for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human … ☆34 Updated 10 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆90 Updated last year
- ☆14 Updated 10 months ago
- Implementation of ICML 2023 paper: Future-conditioned Unsupervised Pretraining for Decision Transformer ☆27 Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆125 Updated 2 months ago
- ☆13 Updated 3 months ago
- [ACL 2024] Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models ☆16 Updated 7 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆51 Updated 8 months ago
- ☆32 Updated last month
- official implementation of paper "Process Reward Model with Q-value Rankings" ☆48 Updated 2 weeks ago
- ☆44 Updated last year
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity ☆40 Updated last year
- ☆34 Updated last month
- Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023) ☆41 Updated 7 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆55 Updated last month
- [NeurIPSw'24] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simu… ☆81 Updated 3 weeks ago