haoliuhl / chain-of-hindsight
Simple next-token-prediction for RLHF
☆222 Updated last year
Alternatives and similar repositories for chain-of-hindsight:
Users who are interested in chain-of-hindsight are comparing it to the libraries listed below.
- ☆160 Updated last year
- Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467 ☆279 Updated last month
- Self-Alignment with Principle-Following Reward Models ☆156 Updated last year
- ☆172 Updated last year
- ☆271 Updated 2 months ago
- ☆231 Updated 2 years ago
- DSIR large-scale data selection framework for language model training ☆242 Updated 11 months ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆157 Updated 10 months ago
- Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks ☆208 Updated last year
- Scaling Data-Constrained Language Models ☆334 Updated 5 months ago
- Code accompanying the paper Pretraining Language Models with Human Preferences ☆181 Updated last year
- RewardBench: the first evaluation tool for reward models. ☆521 Updated 2 weeks ago
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation ☆218 Updated last year
- A repository for transformer critique learning and generation ☆88 Updated last year
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆150 Updated last year
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆233 Updated last year
- Reasoning with Language Model is Planning with World Model ☆160 Updated last year
- Data and code for the ICLR 2023 paper "Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning". ☆149 Updated last year
- ☆120 Updated 4 months ago
- ☆178 Updated 2 years ago
- Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023 ☆241 Updated last year
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback ☆205 Updated last year
- ☆268 Updated last year
- All available datasets for Instruction Tuning of Large Language Models ☆247 Updated last year
- A large-scale, fine-grained, diverse preference dataset (and models). ☆331 Updated last year
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆115 Updated 9 months ago
- Accepted by Transactions on Machine Learning Research (TMLR) ☆126 Updated 5 months ago
- [EMNLP 2023] Adapting Language Models to Compress Long Contexts ☆294 Updated 6 months ago
- ☆96 Updated last year
- RLHF implementation details of OAI's 2019 codebase ☆183 Updated last year