jhejna / cpl
Code for Contrastive Preference Learning (CPL)
☆147 · Updated 6 months ago
Related projects:
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆84 · Updated 5 months ago
- A continually updated collection of papers building on World Models. ☆127 · Updated 2 months ago
- MTM: Masked Trajectory Models for Prediction, Representation, and Control. ☆145 · Updated last year
- A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF) ☆116 · Updated last week
- Pre-Trained Language Models for Interactive Decision-Making [NeurIPS 2022] ☆116 · Updated 2 years ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆89 · Updated 2 months ago
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" ☆197 · Updated last year
- We perform functional grounding of LLMs' knowledge in BabyAI-Text ☆213 · Updated 3 weeks ago
- Official implementation of the DECKARD Agent from the paper "Do Embodied Agents Dream of Pixelated Sheep?" ☆84 · Updated last year
- [NeurIPS 2023] We use large language models as commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling bett… ☆141 · Updated 3 months ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆82 · Updated last year
- Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity" ☆35 · Updated 8 months ago
- Preference Transformer: Modeling Human Preferences using Transformers for RL (ICLR 2023) ☆147 · Updated 11 months ago
- SmartPlay is a benchmark for Large Language Models (LLMs) that uses a variety of games to test important LLM capabilities as agents. … ☆115 · Updated 5 months ago
- Official Repo of LangSuitE ☆74 · Updated last month
- [ICLR 2024] Code for the paper "Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning" ☆113 · Updated 8 months ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents ☆31 · Updated 4 months ago
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. ☆206 · Updated last week
- Code for the paper "Autonomous Evaluation and Refinement of Digital Agents" ☆81 · Updated last week
- A simple and scalable agent for training adaptive policies with sequence-based RL ☆79 · Updated this week
- RLHF implementation details of OAI's 2019 codebase ☆144 · Updated 8 months ago
- The source code of the paper "Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Pla… ☆69 · Updated last month
- Efficient World Models with Context-Aware Tokenization (ICML 2024) ☆73 · Updated 2 months ago
- Official Repo for "Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning" ☆176 · Updated this week
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆24 · Updated last week