shengliu66 / ICV

Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering

☆190

Alternatives and similar repositories for ICV

Users who are interested in ICV are comparing it to the libraries listed below.

QingruZhang / PASTA
PASTA: Post-hoc Attention Steering for LLMs
☆125Updated 11 months ago
zjunlp / KnowledgeCircuits
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
☆159Updated 8 months ago
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆181Updated 6 months ago
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆136Updated 4 months ago
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆111Updated last month
roeehendel / icl_task_vectors
☆98Updated last year
tianyang-x / SaySelf
Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"
☆109Updated last year
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆121Updated last year
Glaciohound / LM-Steer
Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)
☆125Updated 3 months ago
voidism / Lookback-Lens
Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"
☆135Updated last week
da03 / Internalize_CoT_Step_by_Step
☆195Updated 6 months ago
ucl-dark / llm_debate
Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"
☆117Updated last year
logix-project / logix
AI Logging for Interpretability and Explainability🔬
☆129Updated last year
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆188Updated last year
prateeky2806 / ties-merging
☆194Updated last year
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆125Updated 8 months ago
yueyu1030 / AttrPrompt
[NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.
☆153Updated last year
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆41Updated 9 months ago
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆66Updated 11 months ago
peterljq / Parsimonious-Concept-Engineering
PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)
☆40Updated 11 months ago
tonychenxyz / selfie
This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,…
☆52Updated 10 months ago
declare-lab / trust-align
Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref…
☆68Updated 7 months ago
lyy1994 / awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
☆101Updated last month
MingLiiii / Layer_Gradient
[ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
☆75Updated 4 months ago
saprmarks / geometry-of-truth
☆92Updated last year
activatedgeek / calibration-tuning
☆52Updated 6 months ago
ScalerLab / JudgeBench
☆102Updated 11 months ago
jlko / long_hallucinations
Codebase for reproducing the experiments of the semantic uncertainty paper (paragraph-length experiments).
☆70Updated last year
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆71Updated last year
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆242Updated 11 months ago