yihuaihong / Dissecting-FT-Unlearning
☆13 · Updated 8 months ago
Alternatives and similar repositories for Dissecting-FT-Unlearning
Users who are interested in Dissecting-FT-Unlearning are comparing it to the repositories listed below.
- Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free ☆28 · Updated 2 months ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆59 · Updated 8 months ago
- ☆22 · Updated 3 months ago
- ☆44 · Updated 3 months ago
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆35 · Updated 4 months ago
- Awesome SAE papers ☆35 · Updated last month
- Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning" ☆26 · Updated 8 months ago
- A curated list of resources for activation engineering ☆90 · Updated last month
- ☆59 · Updated 11 months ago
- ☆41 · Updated 8 months ago
- Implementation code for ACL 2024: Advancing Parameter Efficiency in Fine-tuning via Representation Editing ☆14 · Updated last year
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆59 · Updated last year
- LoFiT: Localized Fine-tuning on LLM Representations ☆39 · Updated 5 months ago
- "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning" by Chongyu Fan*, Jiancheng Liu*, Licong Lin*, Jingh… ☆26 · Updated this week
- Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective ☆32 · Updated 4 months ago
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024) ☆44 · Updated 7 months ago
- This repo is for the safety topic, including attacks, defenses, and studies related to reasoning and RL ☆21 · Updated this week
- ☆16 · Updated last week
- [ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.… ☆25 · Updated 9 months ago
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models (NeurIPS 2024) ☆76 · Updated 8 months ago
- This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba… ☆28 · Updated 3 months ago
- ☆19 · Updated 4 months ago
- ☆23 · Updated 10 months ago
- ☆28 · Updated last year
- Code and data repository for "The Mirage of Model Editing: Revisiting Evaluation in the Wild" ☆14 · Updated 3 weeks ago
- A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab… ☆89 · Updated 4 months ago
- [ICLR 2025] Code and data repo for the paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation" ☆64 · Updated 6 months ago
- ☆26 · Updated last year
- A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg… ☆89 · Updated last week
- [ACL 2025] SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities ☆17 · Updated 2 months ago