LoryPack/LLM-LieDetector

Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"

☆74

Alternatives and similar repositories for LLM-LieDetector

Users who are interested in LLM-LieDetector are comparing it to the repositories listed below.

KoyenaPal / future-lens
Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
☆21 · Oct 24, 2025 · Updated 8 months ago
milesaturpin / cot-unfaithfulness
☆57 · Oct 23, 2023 · Updated 2 years ago
EleutherAI / features-across-time
Understanding how features learned by neural networks evolve throughout training
☆41 · Oct 24, 2024 · Updated last year
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆221 · Updated this week
choidami / inductive-oocr
☆16 · Mar 22, 2025 · Updated last year
CarperAI / decontamination
This repository contains code for cleaning your training data of benchmark data to help combat data snooping.
☆28 · Apr 21, 2023 · Updated 3 years ago
tommasoc80 / AbuseEval
Dataset for the LREC 2020 paper "I Feel Offended, Don't Be Abusive!"
☆19 · Sep 23, 2023 · Updated 2 years ago
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆33 · May 23, 2024 · Updated 2 years ago
FlyingPumba / InterpBench
A benchmark for mechanistic discovery of circuits in Transformers
☆17 · Dec 15, 2024 · Updated last year
UKPLab / on-emergence
Code and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning?"
☆33 · Jan 9, 2025 · Updated last year
hijohnnylin / neuronpedia-scorer
☆17 · Feb 14, 2024 · Updated 2 years ago
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆20 · Jan 28, 2025 · Updated last year
LanD-FBK / benchmark-gen-explanations
Code for "Benchmarking the Generation of Fact Checking Explanations"
☆10 · Aug 16, 2024 · Updated last year
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆604 · Aug 7, 2025 · Updated 11 months ago
fakenewsresearch / dataset
☆26 · Jun 14, 2024 · Updated 2 years ago
mlepori1 / NeuroSurgeon
NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers
☆43 · Feb 12, 2025 · Updated last year
thestephencasper / latent_adversarial_training
☆24 · Jul 25, 2024 · Updated last year
collin-burns / discovering_latent_knowledge
☆287 · Mar 2, 2024 · Updated 2 years ago
aengusl / latent-adversarial-training
☆48 · Sep 29, 2024 · Updated last year
AlignmentResearch / learned-planner
Interpreting Learned Search and Planning: Reverse-engineering recurrent convolutional networks (DRC) that play Sokoban
☆21 · Jun 29, 2025 · Updated last year
XingqiaoWang / DeepCausalPV-master
☆18 · May 18, 2021 · Updated 5 years ago
EleutherAI / pile_dedupe
Pile Deduplication Code
☆18 · May 15, 2023 · Updated 3 years ago
samuelarnesen / nyu-debate-modeling
☆25 · Oct 4, 2024 · Updated last year
moqingyan / dsr-lm
☆13 · Jul 8, 2023 · Updated 3 years ago
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆77 · Jan 21, 2023 · Updated 3 years ago
bertiev / SimpleSafetyTests
☆19 · Mar 25, 2024 · Updated 2 years ago
LCS2-IIITD / MSH-COMICS
Multi-modal Sarcasm Detection and Humor Classification in Code-mixed Conversations
☆13 · May 31, 2021 · Updated 5 years ago
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 · Jan 17, 2024 · Updated 2 years ago
ustcljb / topK-off-policy-correction-REINFORCE
☆19 · Oct 7, 2020 · Updated 5 years ago
mwalmer-umd / vit_analysis
☆35 · Jun 13, 2023 · Updated 3 years ago
saccharomycetes / visual_crop_zsvqa
☆12 · Apr 10, 2024 · Updated 2 years ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆58 · Oct 30, 2025 · Updated 8 months ago
anthropics / PySvelte
A library for bridging Python and HTML/JavaScript (via Svelte) for creating interactive visualizations
☆228 · Dec 22, 2021 · Updated 4 years ago
subhashk01 / LLM-addition
LLMs represent numbers on a helix and manipulate that helix to do addition.
☆31 · Feb 4, 2025 · Updated last year
qiuhuachuan / latent-jailbreak
☆39 · May 21, 2024 · Updated 2 years ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆258 · Jan 27, 2025 · Updated last year
aigeek0x0 / radiantloom-email-assist-7b
Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta, over a custom-curated dataset of 1…
☆14 · Jan 19, 2024 · Updated 2 years ago
junyachen / Data-examples
☆19 · Jun 15, 2024 · Updated 2 years ago
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆104 · Sep 21, 2023 · Updated 2 years ago