Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
☆42Jul 28, 2024Updated last year
Alternatives and similar repositories for BiPO
Users who are interested in BiPO are comparing it to the libraries listed below
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)☆26Jun 27, 2024Updated last year
- A resource repository for representation engineering in large language models☆148Nov 14, 2024Updated last year
- Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"☆14Nov 22, 2024Updated last year
- ☆16Sep 1, 2025Updated 6 months ago
- Steering Llama 2 with Contrastive Activation Addition☆213May 23, 2024Updated last year
- ☆15Jun 11, 2025Updated 8 months ago
- [ICLR 2025] General-purpose activation steering library☆145Sep 18, 2025Updated 5 months ago
- This is the official GitHub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Lang…☆22Jul 3, 2024Updated last year
- ☆19Mar 5, 2024Updated 2 years ago
- Camouflage poisoning via machine unlearning☆19Jul 3, 2025Updated 8 months ago
- ☆23Jun 13, 2024Updated last year
- Algebraic value editing in pretrained language models☆69Nov 1, 2023Updated 2 years ago
- [ICML 2024] Language Models Represent Beliefs of Self and Others☆35Sep 26, 2024Updated last year
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering☆198Feb 13, 2025Updated last year
- ☆37Aug 28, 2025Updated 6 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface☆140Feb 21, 2025Updated last year
- [COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free☆54Apr 6, 2025Updated 11 months ago
- Official code of the paper Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Le…☆75Mar 20, 2024Updated last year
- Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering☆103Nov 23, 2024Updated last year
- ☆37Dec 19, 2024Updated last year
- [NeurIPS'23] Binary Classification with Confidence Difference☆10May 13, 2024Updated last year
- MemRec☆37Jan 16, 2026Updated last month
- ☆11Mar 31, 2022Updated 3 years ago
- Code for "Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders" at ICML 2024☆10Sep 18, 2025Updated 5 months ago
- ☆10Nov 6, 2024Updated last year
- ☆10Mar 24, 2023Updated 2 years ago
- [ICML 2023] Protecting Language Generation Models via Invisible Watermarking☆13Sep 8, 2023Updated 2 years ago
- ☆15Feb 2, 2026Updated last month
- Login script☆12Nov 4, 2022Updated 3 years ago
- The guideline for pod.☆10Jun 19, 2020Updated 5 years ago
- ☆10Jul 5, 2023Updated 2 years ago
- The PackNet Continual Learning Method in Pytorch☆15Aug 19, 2021Updated 4 years ago
- ☆10Jul 4, 2024Updated last year
- Github Repo for ICML 2022 paper: Communication-Efficient Adaptive Federated Learning☆10Nov 18, 2022Updated 3 years ago
- This repository includes the implementation and results of the paper "ChatGPT is fun, but it is not funny! Humor is still challenging Lar…☆13Jul 13, 2023Updated 2 years ago
- Shadow Attack, LiRA, Quantile Regression and RMIA implementations in PyTorch (Online version)☆14Nov 8, 2024Updated last year
- [JMLR] Gradual Domain Adaptation: Theory and Algorithms☆11Jan 14, 2025Updated last year
- Towards Better Graph Representation Learning with Parameterized Decomposition & Filtering☆13Aug 22, 2023Updated 2 years ago
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model☆572Jan 28, 2025Updated last year