yihuaihong/ConceptVectors

yihuaihong / ConceptVectors

[EMNLP 2025 Main] ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"

☆40

Alternatives and similar repositories for ConceptVectors

Users who are interested in ConceptVectors are comparing it to the repositories listed below.

yihuaihong / Dissecting-FT-Unlearning
[EMNLP 2024 Main] Code for the paper "Dissecting Fine-Tuning Unlearning in Large Language Models"
☆14 · Oct 10, 2024 · Updated last year
Carol-gutianle / MEOW
☆16 · May 16, 2025 · Updated last year
google / belief-localization
This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca…
☆62 · May 9, 2023 · Updated 3 years ago
UCSB-NLP-Chang / causal_unlearn
[EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective"
☆35 · Jul 22, 2024 · Updated 2 years ago
SALT-NLP / Efficient_Unlearning
☆38 · Oct 18, 2023 · Updated 2 years ago
OPTML-Group / Unlearn-WorstCase
[ECCV24] "Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning" by Chongyu Fan*, Jiancheng Liu*, Alfred Hero, …
☆28 · May 27, 2025 · Updated last year
tmlr-group / G-effect
[ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond"
☆16 · Feb 27, 2025 · Updated last year
chrisliu298 / llm-unlearn-eco
[NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts
☆41 · Sep 26, 2024 · Updated last year
technion-cs-nlp / parametric-faithfulness
☆23 · Aug 30, 2025 · Updated 10 months ago
shash42 / Evaluating-Inexact-Unlearning
☆12 · Aug 8, 2023 · Updated 2 years ago
thunlp / EREN
Official codes for COLING 2024 paper "Robust and Scalable Model Editing for Large Language Models": https://arxiv.org/abs/2403.17431v1
☆14 · Mar 27, 2024 · Updated 2 years ago
kaistAI / knowledge-reasoning
[EMNLP 2024] Official implementation of "Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Ut…
☆23 · Dec 4, 2024 · Updated last year
mohsenfayyaz / GlobEnc
[NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers
☆21 · May 16, 2023 · Updated 3 years ago
CharlesYu2000 / PCGU-UnlearningBias
☆17 · Nov 7, 2023 · Updated 2 years ago
OPTML-Group / SOUL
Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"
☆30 · Oct 1, 2024 · Updated last year
GraySwanAI / circuit-breakers
Improving Alignment and Robustness with Circuit Breakers
☆266 · Sep 24, 2024 · Updated last year
visinf / fast-axiomatic-attribution
Fast Axiomatic Attribution for Neural Networks (NeurIPS*2021)
☆15 · Feb 24, 2026 · Updated 5 months ago
chrisliu298 / awesome-llm-unlearning
A resource repository for machine unlearning in large language models
☆618 · Updated this week
allenai / few_shot_explanations
Code for NAACL 2022 paper "Reframing Human-AI Collaboration for Generating Free-Text Explanations"
☆29 · Apr 28, 2023 · Updated 3 years ago
jinzhuoran / RWKU
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
☆100 · Sep 30, 2024 · Updated last year
zleizzo / datadeletion
☆13 · Feb 24, 2020 · Updated 6 years ago
zouharvi / subset2evaluate
Find informative examples to efficiently (human)-evaluate NLG models.
☆17 · Apr 22, 2026 · Updated 3 months ago
mt-upc / transformer-contributions-nmt
☆18 · Oct 6, 2022 · Updated 3 years ago
mt-upc / transformer-contributions
Measuring the Mixing of Contextual Information in the Transformer
☆35 · May 27, 2023 · Updated 3 years ago
ekinakyurek / influence
Code for "Tracing Knowledge in Language Models Back to the Training Data"
☆40 · Dec 27, 2022 · Updated 3 years ago
allenai / label_rationale_association
Code for EMNLP 2021 paper "Measuring Association Between Labels and Free-Text Rationales"
☆12 · Sep 12, 2023 · Updated 2 years ago
crazyofapple / AT-BMC
AAAI 2022 paper - Unifying Model Explainability and Robustness for Joint Text Classification and Rationale Extraction
☆17 · Dec 23, 2021 · Updated 4 years ago
kztakemoto / simbaja
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
☆17 · Apr 24, 2024 · Updated 2 years ago
franciscoliu / SKU
Official code implementation of SKU, Accepted by ACL 2024 Findings
☆20 · Dec 18, 2024 · Updated last year
franciscoliu / Awesome-GenAI-Unlearning
☆188 · Apr 22, 2026 · Updated 3 months ago
LiuAmber / RAHF
[ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…
☆28 · Sep 25, 2024 · Updated last year
peterbhase / LAS-NL-Explanations
Code for paper "Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?"
☆21 · Oct 13, 2020 · Updated 5 years ago
mitvis / saliency-cards
Saliency Cards are transparency documentation for saliency methods. Learn about new saliency methods or document your own!
☆19 · Jun 9, 2023 · Updated 3 years ago
mohsenfayyaz / DecompX
DecompX: Explaining Transformers Decisions by Propagating Token Decomposition [ACL 2023]
☆19 · Jul 3, 2025 · Updated last year
INK-USC / expl-refinement
Code for the paper "Refining Language Model with Compositional Explanation" (NeurIPS 2021)
☆11 · Oct 25, 2021 · Updated 4 years ago
Betswish / MIRAGE
Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/
☆25 · Mar 10, 2025 · Updated last year
Model-GLUE / Model-GLUE
☆18 · Aug 19, 2024 · Updated last year
zhaoyiran924 / Safety-Neuron
[ICLR 2025] Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
☆33 · Apr 30, 2025 · Updated last year
OPTML-Group / WAGLE
Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"
☆19 · Dec 16, 2024 · Updated last year