Sea-Snell / Implicit-Language-Q-Learning
Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆204 · Updated last year
Alternatives and similar repositories for Implicit-Language-Q-Learning:
Users who are interested in Implicit-Language-Q-Learning are comparing it to the libraries listed below.
- ☆79 · Updated 8 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆126 · Updated 3 months ago
- ☆160 · Updated last year
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆136 · Updated 11 months ago
- RLHF implementation details of OAI's 2019 codebase ☆181 · Updated last year
- A (somewhat) minimal library for finetuning language models with PPO on human feedback. ☆86 · Updated 2 years ago
- Simple next-token-prediction for RLHF ☆222 · Updated last year
- Code accompanying the paper Pretraining Language Models with Human Preferences ☆181 · Updated last year
- We perform functional grounding of LLMs' knowledge in BabyAI-Text ☆245 · Updated 6 months ago
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. ☆244 · Updated 4 months ago
- Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs). ☆216 · Updated 3 months ago
- Code for Contrastive Preference Learning (CPL) ☆160 · Updated 3 months ago
- ☆134 · Updated 3 months ago
- Super fast implementations of common benchmark text world games ☆45 · Updated 2 months ago
- Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" paper, NAACL'22 ☆64 · Updated 2 years ago
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. … ☆131 · Updated 10 months ago
- ☆202 · Updated last year
- [NeurIPS 2023] Learning Transformer Programs ☆158 · Updated 9 months ago
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆116 · Updated this week
- Intrinsic Motivation from Artificial Intelligence Feedback ☆128 · Updated last year
- ☆214 · Updated last year
- A repository for transformer critique learning and generation ☆88 · Updated last year
- ☆171 · Updated last year
- CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL ☆109 · Updated 6 months ago
- For experiments involving instruct gpt. Currently used for documenting open research questions. ☆71 · Updated 2 years ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆103 · Updated last year
- ☆129 · Updated 4 months ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆130 · Updated 10 months ago
- Interpreting how transformers simulate agents performing RL tasks ☆77 · Updated last year
- ☆95 · Updated 8 months ago