CaoYuanpu / BiPO
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
☆42 · Jul 28, 2024 · Updated last year
Alternatives and similar repositories for BiPO
Users who are interested in BiPO are comparing it to the libraries listed below.
- A resource repository for representation engineering in large language models ☆148 · Nov 14, 2024 · Updated last year
- ☆16 · Sep 1, 2025 · Updated 5 months ago
- ☆15 · Jun 11, 2025 · Updated 8 months ago
- ☆16 · Mar 5, 2024 · Updated last year
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors" ☆19 · Dec 14, 2024 · Updated last year
- This is the official GitHub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Lang… ☆21 · Jul 3, 2024 · Updated last year
- Camouflage poisoning via machine unlearning ☆19 · Jul 3, 2025 · Updated 7 months ago
- ☆23 · Jun 13, 2024 · Updated last year
- [CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment. ☆29 · Jul 29, 2024 · Updated last year
- Algebraic value editing in pretrained language models ☆68 · Nov 1, 2023 · Updated 2 years ago
- Code release for the paper "Style Vectors for Steering Generative Large Language Models", accepted to the Findings of the EACL 2024. ☆36 · Sep 26, 2024 · Updated last year
- [ICML 2024] Language Models Represent Beliefs of Self and Others ☆35 · Sep 26, 2024 · Updated last year
- ☆36 · Aug 28, 2025 · Updated 5 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆140 · Feb 21, 2025 · Updated 11 months ago
- [COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free ☆52 · Apr 6, 2025 · Updated 10 months ago
- ☆37 · Dec 19, 2024 · Updated last year
- ☆37 · Jan 26, 2024 · Updated 2 years ago
- MemRec ☆36 · Jan 16, 2026 · Updated 3 weeks ago
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆42 · Jan 18, 2026 · Updated 3 weeks ago
- [NeurIPS'23] Binary Classification with Confidence Difference ☆10 · May 13, 2024 · Updated last year
- ☆11 · Mar 31, 2022 · Updated 3 years ago
- Code for the paper "Spectrum Guided Topology Augmentation for Graph Contrastive Learning" ☆11 · Jul 18, 2023 · Updated 2 years ago
- ☆15 · Nov 18, 2025 · Updated 2 months ago
- ☆10 · Jul 5, 2023 · Updated 2 years ago
- The PackNet Continual Learning Method in Pytorch ☆15 · Aug 19, 2021 · Updated 4 years ago
- ☆10 · Mar 24, 2023 · Updated 2 years ago
- ☆10 · Nov 6, 2024 · Updated last year
- Shadow Attack, LiRA, Quantile Regression and RMIA implementations in PyTorch (Online version) ☆14 · Nov 8, 2024 · Updated last year
- ☆28 · Jan 15, 2026 · Updated 3 weeks ago
- The guideline for pod. ☆10 · Jun 19, 2020 · Updated 5 years ago
- Towards Better Graph Representation Learning with Parameterized Decomposition & Filtering ☆13 · Aug 22, 2023 · Updated 2 years ago
- Code for "Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders" at ICML 2024 ☆10 · Sep 18, 2025 · Updated 4 months ago
- This repository includes the implementation and results of the paper "ChatGPT is fun, but it is not funny! Humor is still challenging Lar… ☆13 · Jul 13, 2023 · Updated 2 years ago
- ☆10 · Jul 4, 2024 · Updated last year
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆340 · Jun 13, 2025 · Updated 8 months ago
- [ICML 2023] Protecting Language Generation Models via Invisible Watermarking ☆13 · Sep 8, 2023 · Updated 2 years ago
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆106 · May 20, 2025 · Updated 8 months ago
- ☆64 · Jun 1, 2025 · Updated 8 months ago
- ☆51 · May 11, 2025 · Updated 9 months ago