ericwtodd/function_vectors

Function Vectors in Large Language Models (ICLR 2024)

☆199

Alternatives and similar repositories for function_vectors

Users who are interested in function_vectors are comparing it to the repositories listed below.

roeehendel / icl_task_vectors
☆106 · Oct 30, 2023 · Updated 2 years ago
HITsz-TMG / ICL-State-Vector
☆12 · Jul 4, 2024 · Updated 2 years ago
davidbau / baukit
☆256 · Feb 22, 2024 · Updated 2 years ago
shengliu66 / ICV
Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
☆201 · Feb 13, 2025 · Updated last year
jiahai-feng / binding-iclr
☆19 · Mar 5, 2024 · Updated 2 years ago
Lslland / T-Vaccine
☆19 · Jun 21, 2025 · Updated last year
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆995 · Updated this week
google / belief-localization
This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca…
☆62 · May 9, 2023 · Updated 3 years ago
Brandon3964 / MultiModal-Task-Vector
[NeurIPS 2024] Official Code for the Paper "Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning"
☆27 · Apr 8, 2025 · Updated last year
zhliu0106 / learning-to-refuse
Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"
☆10 · Dec 13, 2024 · Updated last year
ethz-spylab / unlearning-vs-safety
☆27 · Oct 6, 2024 · Updated last year
OPTML-Group / Unlearn-Simple
[NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning"
☆45 · Oct 3, 2025 · Updated 9 months ago
KihoPark / linear_rep_geometry
Code for 'The Linear Representation Hypothesis and the Geometry of Large Language Models' (ICML 2024)
☆125 · Feb 11, 2025 · Updated last year
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆20 · Jan 28, 2025 · Updated last year
hannamw / MIB-circuit-track
☆24 · Jun 30, 2025 · Updated last year
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆104 · Sep 21, 2023 · Updated 2 years ago
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆240 · May 23, 2024 · Updated 2 years ago
UKPLab / on-emergence
Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning
☆33 · Jan 9, 2025 · Updated last year
OPTML-Group / WAGLE
Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"
☆19 · Dec 16, 2024 · Updated last year
y0mingzhang / diffuse-distributions
Forcing Diffuse Distributions out of Language Models
☆18 · Sep 10, 2024 · Updated last year
ArthurConmy / Automatic-Circuit-Discovery
☆293 · Oct 1, 2024 · Updated last year
hannamw / eap-ig-faithfulness
Code for "Automatic Circuit Finding and Faithfulness"
☆18 · Jul 11, 2024 · Updated 2 years ago
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆306 · Nov 10, 2023 · Updated 2 years ago
dchiji-ntt / iterand
Official implementation for "Pruning Randomly Initialized Neural Networks with Iterative Randomization"
☆10 · Oct 5, 2021 · Updated 4 years ago
EnnengYang / Efficient-WEMoE
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.
☆16 · Oct 28, 2024 · Updated last year
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 · Jan 19, 2025 · Updated last year
TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆3,695 · Updated this week
neelnanda-io / Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
☆14 · Feb 13, 2023 · Updated 3 years ago
HelloEveryboby / Butler
Butler is a tool project for automated service management and task scheduling.
☆17 · Updated this week
abhishekpanigrahi1996 / Skill-Localization-by-grafting
☆52 · Jan 1, 2024 · Updated 2 years ago
callummcdougall / path_patching
Implementation of path patching & activation patching (will eventually add to TransformerLens).
☆15 · Jan 8, 2024 · Updated 2 years ago
evandez / relations
How do transformer LMs encode relations?
☆59 · Feb 24, 2024 · Updated 2 years ago
saprmarks / feature-circuits
☆223 · Oct 14, 2025 · Updated 9 months ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆258 · Jan 27, 2025 · Updated last year
Dakingrai / awesome-mechanistic-interpretability-lm-papers
☆259 · Nov 22, 2024 · Updated last year
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆58 · Oct 30, 2025 · Updated 8 months ago
likenneth / honest_llama
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
☆581 · Jan 28, 2025 · Updated last year
aadityasingh / icl-dynamics
☆26 · Feb 20, 2026 · Updated 5 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆266 · Updated this week