Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆211 · Jul 31, 2023 · Updated 2 years ago
Alternatives and similar repositories for Implicit-Language-Q-Learning
Users interested in Implicit-Language-Q-Learning are comparing it to the repositories listed below.
- Official code for the paper "Context-Aware Language Modeling for Goal-Oriented Dialogue Systems" · ☆34 · Dec 9, 2022 · Updated 3 years ago
- A modular RL library to fine-tune language models to human preferences · ☆2,378 · Mar 1, 2024 · Updated 2 years ago
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) · ☆4,741 · Jan 8, 2024 · Updated 2 years ago
- ☆26 · May 30, 2023 · Updated 2 years ago
- Official Implementation of NeurIPS'23 Paper "Cross-Episodic Curriculum for Transformer Agents" · ☆31 · Oct 12, 2023 · Updated 2 years ago
- Official code for "Can Wikipedia Help Offline Reinforcement Learning?" by Machel Reid, Yutaro Yamada and Shixiang Shane Gu · ☆106 · Jul 18, 2022 · Updated 3 years ago
- Domain-specific preference (DSP) data and customized RM fine-tuning. · ☆25 · Mar 7, 2024 · Updated last year
- Generalised UDRL · ☆37 · May 12, 2022 · Updated 3 years ago
- Official code repo for paper: Hybrid RL: Using both offline and online data can make RL efficient. · ☆25 · Feb 16, 2023 · Updated 3 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" · ☆1,818 · Jun 17, 2025 · Updated 8 months ago
- Code for the paper Fine-Tuning Language Models from Human Preferences · ☆1,378 · Jul 25, 2023 · Updated 2 years ago
- Few-shot Learning with Auxiliary Data · ☆31 · Dec 8, 2023 · Updated 2 years ago
- Extreme Q-Learning: Max Entropy RL without Entropy · ☆87 · Feb 14, 2023 · Updated 3 years ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 · ☆241 · May 5, 2024 · Updated last year
- Scaling Data-Constrained Language Models · ☆342 · Jun 28, 2025 · Updated 8 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" · ☆107 · Sep 23, 2023 · Updated 2 years ago
- RewardBench: the first evaluation tool for reward models. · ☆697 · Feb 16, 2026 · Updated 2 weeks ago
- ☆105 · Oct 30, 2023 · Updated 2 years ago
- contrastive decoding · ☆207 · Nov 14, 2022 · Updated 3 years ago
- A web based platform for collecting human actions in reinforcement learning environments · ☆31 · Sep 10, 2025 · Updated 5 months ago
- This is the official code for the paper CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (Neur… · ☆558 · Jan 21, 2025 · Updated last year
- Constrained Decoding Project · ☆20 · Nov 10, 2023 · Updated 2 years ago
- DSIR large-scale data selection framework for language model training · ☆270 · Apr 7, 2024 · Updated last year
- Used for adaptive human in the loop evaluation of language and embedding models. · ☆308 · Mar 1, 2023 · Updated 3 years ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" · ☆186 · May 25, 2025 · Updated 9 months ago
- A (somewhat) minimal library for finetuning language models with PPO on human feedback. · ☆90 · Nov 23, 2022 · Updated 3 years ago
- Generalized Decision Transformer for Offline Hindsight Information Matching (ICLR2022) · ☆70 · Aug 8, 2022 · Updated 3 years ago
- ☆385 · Feb 13, 2023 · Updated 3 years ago
- Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling. · ☆2,773 · Apr 29, 2024 · Updated last year
- Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs). · ☆244 · Dec 11, 2025 · Updated 2 months ago
- This is the pytorch implementation of the UAI2023 paper "A Trajectory is Worth Three Sentences: Multimodal Transformer for Offline Reinf… · ☆11 · Oct 9, 2023 · Updated 2 years ago
- ☆15 · Oct 4, 2024 · Updated last year
- Code accompanying our papers on the "Generative Distributional Control" framework · ☆118 · Dec 7, 2022 · Updated 3 years ago
- ☆16 · Jul 16, 2024 · Updated last year
- [NIPS2023] RRHF & Wombat · ☆809 · Sep 22, 2023 · Updated 2 years ago
- A python module designed for agile RL algorithm developing. · ☆26 · Jul 11, 2024 · Updated last year
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format · ☆27 · Jul 12, 2023 · Updated 2 years ago
- Manage scalable open LLM inference endpoints in Slurm clusters · ☆282 · Jul 11, 2024 · Updated last year
- Tasks for describing differences between text distributions. · ☆17 · Aug 9, 2024 · Updated last year