Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆211 · Jul 31, 2023 · Updated 2 years ago
Alternatives and similar repositories for Implicit-Language-Q-Learning
Users that are interested in Implicit-Language-Q-Learning are comparing it to the libraries listed below.
- A modular RL library to fine-tune language models to human preferences ☆2,390 · Mar 1, 2024 · Updated 2 years ago
- Official code for the paper "Context-Aware Language Modeling for Goal-Oriented Dialogue Systems" ☆34 · Dec 9, 2022 · Updated 3 years ago
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆4,751 · Jan 8, 2024 · Updated 2 years ago
- ☆26 · May 30, 2023 · Updated 3 years ago
- Official code repo for paper: Hybrid RL: Using both offline and online data can make RL efficient. ☆24 · Feb 16, 2023 · Updated 3 years ago
- Domain-specific preference (DSP) data and customized RM fine-tuning. ☆25 · Mar 7, 2024 · Updated 2 years ago
- Official Implementation of NeurIPS'23 Paper "Cross-Episodic Curriculum for Transformer Agents" ☆32 · Oct 12, 2023 · Updated 2 years ago
- Constrained Decoding Project ☆20 · Nov 10, 2023 · Updated 2 years ago
- Code for the paper Fine-Tuning Language Models from Human Preferences ☆1,394 · Jul 25, 2023 · Updated 2 years ago
- Few-shot Learning with Auxiliary Data ☆31 · Dec 8, 2023 · Updated 2 years ago
- Official code for "Can Wikipedia Help Offline Reinforcement Learning?" by Machel Reid, Yutaro Yamada and Shixiang Shane Gu ☆105 · Jul 18, 2022 · Updated 3 years ago
- Generalised UDRL ☆37 · May 12, 2022 · Updated 4 years ago
- A Dual-RL method DVL: Dual-V Learning for offline and online reinforcement learning ☆16 · Oct 22, 2023 · Updated 2 years ago
- Code accompanying the paper Pretraining Language Models with Human Preferences ☆182 · Feb 13, 2024 · Updated 2 years ago
- ☆106 · Oct 30, 2023 · Updated 2 years ago
- Generalized Decision Transformer for Offline Hindsight Information Matching (ICLR2022) ☆70 · Aug 8, 2022 · Updated 3 years ago
- Extreme Q-Learning: Max Entropy RL without Entropy ☆88 · Feb 14, 2023 · Updated 3 years ago
- Contrastive decoding ☆207 · Nov 14, 2022 · Updated 3 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"