google/sycophancy-intervention

Scripts for generating synthetic finetuning data for reducing sycophancy.

☆125

Alternatives and similar repositories for sycophancy-intervention

Users that are interested in sycophancy-intervention are comparing it to the libraries listed below.

UKPLab / on-emergence
Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning
☆33 · Jan 9, 2025 · Updated last year
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆35 · Aug 15, 2024 · Updated last year
tml-epfl / icl-alignment
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
☆33 · Jan 23, 2025 · Updated last year
Nanami18 / Snowballed_Hallucination
☆43 · Sep 3, 2024 · Updated last year
vipulgupta1011 / CALM
☆11 · Oct 2, 2023 · Updated 2 years ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆106 · Mar 6, 2025 · Updated last year
facebookresearch / Shepherd
This is the repo for the paper Shepherd -- A Critic for Language Model Generation
☆224 · Aug 10, 2023 · Updated 2 years ago
limenlp / safer-instruct
This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17 · Feb 22, 2024 · Updated 2 years ago
GAIR-NLP / auto-j
Generative Judge for Evaluating Alignment
☆251 · Jan 18, 2024 · Updated 2 years ago
GAIR-NLP / ReAlign
Reformatted Alignment
☆111 · Sep 23, 2024 · Updated last year
likenneth / honest_llama
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
☆581 · Jan 28, 2025 · Updated last year
likenneth / q_probe
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
☆40 · Jun 10, 2024 · Updated 2 years ago
facebookresearch / NeuralMemory
A Data Source for Reasoning Embodied Agents
☆20 · Sep 18, 2023 · Updated 2 years ago
ThrunGroup / maptree
☆41 · Sep 25, 2023 · Updated 2 years ago
EleutherAI / pile_dedupe
Pile Deduplication Code
☆18 · May 15, 2023 · Updated 3 years ago
kaistAI / FLASK
[ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
☆218 · Dec 24, 2023 · Updated 2 years ago
nlpxucan / evol-instruct
☆287 · Apr 25, 2023 · Updated 3 years ago
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆116 · Dec 12, 2024 · Updated last year
sail-sg / lorahub
[COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
☆671 · Jul 22, 2024 · Updated 2 years ago
maszhongming / ParaKnowTransfer
Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"
☆33 · May 9, 2024 · Updated 2 years ago
pppa2019 / swie_overmiss_llm4mt
Code for "Improving Translation Faithfulness of Large Language Models via Augmenting Instructions"
☆12 · Aug 26, 2023 · Updated 2 years ago
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆74 · Jun 19, 2024 · Updated 2 years ago
bigcode-project / octopack
🐙 OctoPack: Instruction Tuning Code Large Language Models
☆479 · Feb 5, 2025 · Updated last year
eujhwang / personalized-llms
personalized-llms with allen institute
☆13 · Jun 22, 2023 · Updated 3 years ago
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆170 · Sep 18, 2025 · Updated 10 months ago
yale-nlp / refdpo
☆16 · Jul 23, 2024 · Updated 2 years ago
wwxu21 / CUT
Source code of "Reasons to Reject? Aligning Language Models with Judgments"
☆58 · Feb 29, 2024 · Updated 2 years ago
NL2Code / CodeM
☆44 · Jun 2, 2024 · Updated 2 years ago
Agora-Lab-AI / The-Distiller
Generate High Quality textual or multi-modal datasets with Agents
☆18 · Jun 7, 2023 · Updated 3 years ago
genglinliu / UnknownBench
Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge
☆14 · Feb 20, 2024 · Updated 2 years ago
GAIR-NLP / alignment-for-honesty
☆78 · May 22, 2024 · Updated 2 years ago
FranxYao / GPT-Bargaining
Code for Arxiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
☆207 · May 24, 2023 · Updated 3 years ago
scottlogic-alex / prm800k-denorm
Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format
☆27 · Jul 12, 2023 · Updated 3 years ago
amazon-science / factual-confidence-of-llms
Code for paper "Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators"
☆17 · Dec 4, 2024 · Updated last year
voidism / DoLa
Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
☆557 · Jul 12, 2026 · Updated 2 weeks ago
epoch-research / training-cost-trends
☆27 · Apr 1, 2026 · Updated 3 months ago
WangFei-2019 / SNARE
Project for SNARE benchmark
☆11 · Jun 5, 2024 · Updated 2 years ago
felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆181 · Jul 4, 2026 · Updated 3 weeks ago
UlisseMini / procgen-tools
Tools for running experiments on RL agents in procgen environments
☆20 · Apr 5, 2024 · Updated 2 years ago