OpenDFM / Rememberer

[NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents

☆38

Alternatives and similar repositories for Rememberer

Users who are interested in Rememberer are comparing it to the libraries listed below.

YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆197 Updated 7 months ago
floodsung / LLM-with-RL-papers
A collection of LLM with RL papers
☆278 Updated last year
liziniu / ReMax
Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models"
☆196 Updated last year
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆153 Updated last year
CUHK-ARISE / GAMABench
Benchmarking LLMs' Gaming Ability in Multi-Agent Environments
☆88 Updated 6 months ago
holarissun / Prompt-OIRL
Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"
☆42 Updated last year
haotiansun14 / AdaPlanner
AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback
☆122 Updated 7 months ago
CraftJarvis / MC-Planner
Implementation of "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agen…
☆290 Updated 2 years ago
WeihaoTan / TWOSOME
Implementation of TWOSOME
☆82 Updated 10 months ago
123penny123 / Awesome-LLM-RL
A comprehensive list of PAPERS, CODEBASES, and DATASETS on Decision Making using Foundation Models including LLMs and VLMs.
☆381 Updated last year
Linear95 / APO
Code for ACL2024 paper - Adversarial Preference Optimization (APO).
☆57 Updated last year
Xuekai-Zhu / key-configuration-of-llms
☆23 Updated last year
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization (TDPO)
☆149 Updated 9 months ago
louieworth / awesome-rlhf
An index of algorithms for reinforcement learning from human feedback (RLHF)
☆92 Updated last year
YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆327 Updated last year
LeapLabTHU / ExpeL
☆180 Updated 11 months ago
ziyuwan / ReMA-public
Reinforced Multi-LLM Agents training
☆58 Updated 5 months ago
CJReinforce / PURE
Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆141 Updated 3 weeks ago
WooooDyy / LLM-Reverse-Curriculum-RL
Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…
☆112 Updated last year
BAAI-Agents / GPA-LM
This repo is a live list of papers on game playing and large multimodality model - "A Survey on Game Playing Agents and Large Models: Met…
☆158 Updated last year
sanowl / Self-Correcting-LLM--Reinforcement-Learning-
This is my attempt to create a Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g…
☆37 Updated 4 months ago
1989Ryan / llm-mcts
[NeurIPS 2023] We use large language models as a commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling bett…
☆290 Updated last year
YangRui2015 / Generalizable-Reward-Model
Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"
☆42 Updated 9 months ago
NJU-RL / GLIDER
[ICML 2025] Official Implementation of GLIDER
☆66 Updated last month
minghchen / automanual
Code for NeurIPS 2024 paper "AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning"
☆49 Updated last year
szxiangjn / world-model-for-language-model
☆132 Updated last year
alecwangcq / f-divergence-dpo
Direct preference optimization with f-divergences.
☆15 Updated last year
microsoft / SmartPlay
SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. …
☆143 Updated last year
PKU-Alignment / AlignmentSurvey
AI Alignment: A Comprehensive Survey
☆136 Updated 2 years ago
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆176 Updated 2 years ago