anthropics / ConstitutionalHarmlessnessPaper

☆248

Alternatives and similar repositories for ConstitutionalHarmlessnessPaper

Users who are interested in ConstitutionalHarmlessnessPaper are comparing it to the repositories listed below

haoliuhl / chain-of-hindsight
Simple next-token-prediction for RLHF
☆227Updated 2 years ago
jayelm / gisting
Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467
☆300Updated 9 months ago
tianjunz / HIR
☆159Updated 2 years ago
tomekkorbak / pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
☆180Updated last year
allenai / FineGrainedRLHF
☆281Updated 11 months ago
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆169Updated 2 months ago
glgh / awesome-llm-human-preference-datasets
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
☆384Updated 2 years ago
raunak-agarwal / instruction-datasets
Datasets for Instruction Tuning of Large Language Models
☆259Updated 2 years ago
agi-templar / Stable-Alignment
Multi-agent Social Simulation + Efficient, Effective, and Stable alternative to RLHF. Code for the paper "Training Socially Aligned Langu…
☆354Updated 2 years ago
facebookresearch / Shepherd
This is the repo for the paper Shepherd -- A Critic for Language Model Generation
☆219Updated 2 years ago
OpenBMB / UltraFeedback
A large-scale, fine-grained, diverse preference dataset (and models).
☆356Updated last year
evandez / REMEDI
Inspecting and Editing Knowledge Representations in Language Models
☆119Updated 2 years ago
veronica320 / Faithful-COT
Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning".
☆165Updated last year
FranxYao / GPT-Bargaining
Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
☆208Updated 2 years ago
yizhongw / Tk-Instruct
Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions.
☆182Updated 3 years ago
LAION-AI / Open-Instruction-Generalist
Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks
☆209Updated last year
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆134Updated last year
lukasberglund / reversal_curse
☆297Updated 2 years ago
orhonovich / unnatural-instructions
☆180Updated 2 years ago
likenneth / honest_llama
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
☆560Updated 10 months ago
mingkaid / rl-prompt
Accompanying repo for the RLPrompt paper
☆358Updated last year
sambanova / toolbench
ToolBench, an evaluation suite for LLM tool manipulation capabilities.
☆165Updated last year
kaistAI / CoT-Collection
[EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
☆251Updated 2 years ago
meg-tong / sycophancy-eval
Datasets from the paper "Towards Understanding Sycophancy in Language Models"
☆97Updated 2 years ago
CarperAI / autocrit
A repository for transformer critique learning and generation
☆89Updated 2 years ago
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆246Updated last year
allenai / reward-bench
RewardBench: the first evaluation tool for reward models.
☆663Updated 5 months ago
suzgunmirac / BIG-Bench-Hard
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
☆533Updated last year
p-lambda / dsir
DSIR large-scale data selection framework for language model training
☆266Updated last year
huggingface / datablations
Scaling Data-Constrained Language Models
☆342Updated 5 months ago