Persona Vectors: Monitoring and Controlling Character Traits in Language Models
☆437 · Apr 22, 2026 · Updated last month
Alternatives and similar repositories for persona_vectors
Users that are interested in persona_vectors are comparing it to the libraries listed below.
- ☆26 · Jun 22, 2025 · Updated 11 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆198 · Mar 12, 2026 · Updated 3 months ago
- ☆57 · Jun 26, 2025 · Updated 11 months ago
- ☆18 · May 1, 2025 · Updated last year
- ☆305 · Jan 12, 2026 · Updated 5 months ago
- [AAAI 2026] This is the official implementation of the paper "ExtendAttack: Attacking Servers of LRMs via Extending Reasoning". ☆23 · Mar 18, 2026 · Updated 2 months ago
- ☆34 · Nov 16, 2025 · Updated 6 months ago
- ☆19 · Apr 7, 2025 · Updated last year
- [ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data ☆14 · Feb 26, 2025 · Updated last year
- This is the implementation for IEEE S&P 2022 paper "Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Secur… ☆11 · Aug 24, 2022 · Updated 3 years ago
- [COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model ☆26 · Nov 25, 2025 · Updated 6 months ago
- ☆37 · Feb 20, 2025 · Updated last year
- ☆38 · Apr 30, 2024 · Updated 2 years ago
- Improving Alignment and Robustness with Circuit Breakers ☆263 · Sep 24, 2024 · Updated last year
- Erasing conceptual knowledge from language models through low-rank fine-tuning ☆23 · Mar 27, 2025 · Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆58 · Oct 30, 2025 · Updated 7 months ago
- ☆31 · Mar 16, 2025 · Updated last year
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆403 · Jun 13, 2025 · Updated last year
- [NAACL 2025] Towards Rationality in Language and Multimodal Agents: A Survey ☆35 · Feb 19, 2025 · Updated last year
- [USENIX Security 2025] SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks ☆21 · Sep 18, 2025 · Updated 8 months ago
- [NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning" ☆45 · Oct 3, 2025 · Updated 8 months ago
- ☆26 · Jan 5, 2026 · Updated 5 months ago
- Concept Relevance Propagation for Localization Models, accepted at SAIAD workshop at CVPR 2023. ☆15 · Jan 16, 2024 · Updated 2 years ago
- NOMU: Neural Optimization-based Model Uncertainty ☆10 · Feb 17, 2023 · Updated 3 years ago
- Visual Concept Connectome ☆15 · Jun 23, 2024 · Updated last year
- ☆170 · May 1, 2026 · Updated last month
- FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists ☆31 · Aug 14, 2025 · Updated 10 months ago
- Mapping out the "memory" of neural nets with data attribution ☆59 · Updated this week
- Codes for "Benchmarking the Generation of Fact Checking Explanations" ☆10 · Aug 16, 2024 · Updated last year
- This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix … ☆147 · Feb 8, 2026 · Updated 4 months ago
- Official repo for FSE'24 paper "CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking" ☆19 · Mar 10, 2025 · Updated last year
- ☆56 · Mar 18, 2026 · Updated 2 months ago
- [CVPR'26] MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts ☆44 · Apr 27, 2026 · Updated last month
- ☆16 · Dec 29, 2023 · Updated 2 years ago
- Designing a Dashboard for Transparency and Control of Conversational AI, https://arxiv.org/abs/2406.07882 ☆39 · Oct 7, 2025 · Updated 8 months ago
- ☆10 · Oct 17, 2022 · Updated 3 years ago
- A Computational Framework for Behavioral Assessment of LLM Therapists ☆39 · Oct 18, 2024 · Updated last year
- End-to-end codebase for finetuning LLMs (LLaMA 2, 3, etc.) with or without DP ☆18 · Sep 23, 2024 · Updated last year
- ☆18 · Aug 15, 2022 · Updated 3 years ago