haoliuhl / chain-of-hindsightLinks

Simple next-token-prediction for RLHF

☆226

Alternatives and similar repositories for chain-of-hindsight

Users that are interested in chain-of-hindsight are comparing it to the libraries listed below

Sorting:

tianjunz / HIR
☆159Updated 2 years ago
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆168Updated last month
jayelm / gisting
Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467
☆296Updated 8 months ago
anthropics / ConstitutionalHarmlessnessPaper
☆242Updated 2 years ago
tomekkorbak / pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
☆180Updated last year
facebookresearch / Shepherd
This is the repo for the paper Shepherd -- A Critic for Language Model Generation
☆217Updated 2 years ago
CarperAI / autocrit
A repository for transformer critique learning and generation
☆88Updated last year
FranxYao / GPT-Bargaining
Code for Arxiv 2023: Improving Language Model Negociation with Self-Play and In-Context Learning from AI Feedback
☆207Updated 2 years ago
huggingface / datablations
Scaling Data-Constrained Language Models
☆342Updated 3 months ago
bhargaviparanjape / language-programmes
☆173Updated 2 years ago
veronica320 / Faithful-COT
Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning".
☆163Updated last year
allenai / FineGrainedRLHF
☆280Updated 9 months ago
Dahoas / reward-modeling
☆98Updated 2 years ago
p-lambda / dsir
DSIR large-scale data selection framework for language model training
☆261Updated last year
orhonovich / unnatural-instructions
☆179Updated 2 years ago
debjitpaul / refiner
About The corresponding code from our paper " REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t…
☆70Updated last year
kaistAI / CoT-Collection
[EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
☆246Updated last year
Zyq-scut / RLTF
Accepted by Transactions on Machine Learning Research (TMLR)
☆132Updated last year
sambanova / toolbench
ToolBench, an evaluation suite for LLM tool manipulation capabilities.
☆163Updated last year
QingruZhang / PASTA
PASTA: Post-hoc Attention Steering for LLMs
☆123Updated 10 months ago
orhonovich / instruction-induction
☆67Updated 3 years ago
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆175Updated 2 years ago
raunak-agarwal / instruction-datasets
Datasets for Instruction Tuning of Large Language Models
☆257Updated last year
lupantech / PromptPG
Data and code for the ICLR 2023 paper "Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning".
☆158Updated last year
LAION-AI / Open-Instruction-Generalist
Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks
☆209Updated last year
evandez / REMEDI
Inspecting and Editing Knowledge Representations in Language Models
☆117Updated 2 years ago
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆130Updated last year
agi-templar / Stable-Alignment
Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Langu…
☆351Updated 2 years ago
yizhongw / Tk-Instruct
Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions.
☆181Updated 2 years ago
SALT-NLP / demonstrated-feedback
☆128Updated last year