OpenDFM / Rememberer
[NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents
☆35 · Updated 11 months ago
Alternatives and similar repositories for Rememberer:
Users interested in Rememberer are comparing it to the libraries listed below:
- Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning" ☆39 · Updated last year
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆132 · Updated 5 months ago
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆156 · Updated last year
- Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" ☆179 · Updated last year
- Direct preference optimization with f-divergences. ☆13 · Updated 4 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆95 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆130 · Updated last month
- An index of algorithms for reinforcement learning from human feedback (RLHF) ☆93 · Updated 11 months ago
- Code for the ACL 2024 paper "Adversarial Preference Optimization" (APO) ☆52 · Updated 10 months ago
- ☆24 · Updated last year
- [ICLR 2024 Spotlight] Code for the paper "Text2Reward: Reward Shaping with Language Models for Reinforcement Learning" ☆155 · Updated 3 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆119 · Updated 6 months ago
- Implementation of "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agen… ☆271 · Updated last year
- ☆129 · Updated 3 months ago
- Natural Language Reinforcement Learning ☆84 · Updated 3 months ago
- Source code for Self-Evaluation Guided MCTS for online DPO. ☆301 · Updated 7 months ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆167 · Updated 2 months ago
- ☆125 · Updated 8 months ago
- Implementation of TWOSOME ☆69 · Updated 2 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆72 · Updated 7 months ago
- A comprehensive list of papers, codebases, and datasets on decision making with foundation models, including LLMs and VLMs. ☆362 · Updated 11 months ago
- Paper collection for the continuing line of work starting from World Models. ☆169 · Updated 8 months ago
- SOTA RL fine-tuning solution for advanced math reasoning of LLMs ☆92 · Updated this week
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" ☆28 · Updated last year
- GenRM-CoT: Data release for verification rationales ☆53 · Updated 5 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision" ☆52 · Updated 4 months ago
- Reasoning with Language Model is Planning with World Model ☆162 · Updated last year
- Benchmarking LLMs' Gaming Ability in Multi-Agent Environments ☆71 · Updated last month
- Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity" ☆41 · Updated last year
- Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆37 · Updated 3 weeks ago