OATML / semantic-entropy-probes

☆47

Alternatives and similar repositories for semantic-entropy-probes

Users who are interested in semantic-entropy-probes are comparing it to the repositories listed below.

zlin7 / UQ-NLG
☆102 Updated last year
balevinstein / Probes
☆57 Updated 2 years ago
activatedgeek / calibration-tuning
☆52 Updated 7 months ago
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆150 Updated 5 months ago
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆67 Updated last year
lorenzkuhn / semantic_uncertainty
☆180 Updated last year
MiaoXiong2320 / llm-uncertainty
code repo for ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs"
☆137 Updated last year
Thartvigsen / GRACE
[NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
☆82 Updated 11 months ago
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆123 Updated 2 months ago
saprmarks / geometry-of-truth
☆95 Updated last year
ucl-dark / llm_debate
Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"
☆122 Updated last year
logix-project / logix
AI Logging for Interpretability and Explainability🔬
☆133 Updated last year
UCSB-NLP-Chang / llm_uncertainty
☆40 Updated last year
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆84 Updated 8 months ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆57 Updated last month
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆195 Updated last year
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆186 Updated 7 months ago
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆45 Updated 10 months ago
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆12 Updated 10 months ago
roeehendel / icl_task_vectors
☆101 Updated 2 years ago
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆123 Updated last year
milesaturpin / cot-unfaithfulness
☆51 Updated 2 years ago
dannyallover / overthinking_the_truth
☆29 Updated last year
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆116 Updated 9 months ago
abertsch72 / long-context-icl
Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration"
☆40 Updated last year
epfl-dlab / llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
☆80 Updated last year
deeplearning-wisc / args
☆46 Updated last year
javiferran / sae_entities
☆66 Updated 8 months ago
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆71 Updated last year
jinhaoduan / SAR
[ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
☆59 Updated last year