Code for the EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆51Nov 17, 2024Updated last year
Alternatives and similar repositories for neuron-attribution
Users who are interested in neuron-attribution are comparing it to the libraries listed below
- ☆23Dec 17, 2024Updated last year
- Official implementation of "MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model". Our co…☆25Dec 20, 2024Updated last year
- Repository for the paper PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Plann…☆14Nov 3, 2023Updated 2 years ago
- ☆12Apr 25, 2024Updated last year
- Codebase for Hyperdecoders https://arxiv.org/abs/2203.08304☆14Oct 11, 2022Updated 3 years ago
- UECA-Prompt: Universal Prompt for Emotion Cause Analysis(COLING 2022)☆14Jun 6, 2023Updated 2 years ago
- ☆19Mar 25, 2025Updated 11 months ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.☆294Jan 22, 2026Updated last month
- [ICLR 2025 Oral] Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition☆19Nov 25, 2024Updated last year
- ☆24Mar 1, 2025Updated last year
- ☆92Dec 23, 2024Updated last year
- Official repo for the paper "Bilinear MLPs enable weight-based mechanistic interpretability".☆28Aug 2, 2025Updated 7 months ago
- ☆28Nov 16, 2025Updated 3 months ago
- GenRM-CoT: Data release for verification rationales☆68Oct 16, 2024Updated last year
- Official Code for ACL 2023 Outstanding Paper: World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Languag…☆33Oct 20, 2023Updated 2 years ago
- Using sparse coding to find distributed representations used by neural networks.☆297Nov 10, 2023Updated 2 years ago
- Bayesian Low-Rank Adaptation of LLMs: BLoB [NeurIPS 2024] and TFB [NeurIPS 2025]☆34Feb 4, 2026Updated last month
- A simple example for finetuning HuggingFace T5 model. Includes code for intermediate generation.☆26Nov 11, 2020Updated 5 years ago
- ☆33Aug 5, 2023Updated 2 years ago
- Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective☆35Jan 31, 2025Updated last year
- ☆42Dec 9, 2024Updated last year
- ☆37Apr 26, 2021Updated 4 years ago
- Code of Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Ne…☆28Mar 19, 2024Updated last year
- Claude Code Template with intelligent task management, specialized agents, and automated workflows for full-stack development☆18Oct 20, 2025Updated 4 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"☆80Apr 12, 2024Updated last year
- Implementation of the paper "ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability"☆63Jun 3, 2025Updated 9 months ago
- [EMNLP 2025 Main] ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"☆39Aug 20, 2025Updated 6 months ago
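- Re-identification of campus bicycles at surveillance-camera image quality, including ReID model recognition, vector database retrieval, and a UI display☆10Feb 13, 2024Updated 2 years ago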
- ☆12Jul 4, 2024Updated last year
- 【Every star you give feeds a hungry developer's motivation!⭐️】A Model Context Protocol (MCP) server implementation that provides Google J…☆19Feb 24, 2026Updated last week
- EmotionCircuits-LLM: A complete, reproducible framework for discovering and controlling emotion circuits in large language models.☆25Oct 20, 2025Updated 4 months ago
- ☆21Aug 8, 2025Updated 7 months ago
- [EMNLP2023]: MIRACLE: Towards Personalized Dialogue Generation with Latent-Space Multiple Personal Attribute Control☆12Nov 11, 2023Updated 2 years ago
- ☆14Sep 23, 2024Updated last year
- This repository features a rich collection of optimized prompts for AI applications, focusing on ChatGPT and other conversational agents.…☆12Jan 11, 2025Updated last year
- A Mechanistic‑Interpretability study that finds the structural dynamics of Large Language Models under fine‑tuning.☆16May 30, 2025Updated 9 months ago
- This repository collects all relevant resources about interpretability in LLMs☆390Nov 1, 2024Updated last year
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]☆39May 28, 2024Updated last year
- Official repository for EMNLP'24 paper "ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Pertu…☆44Oct 3, 2024Updated last year