Persona Vectors: Monitoring and Controlling Character Traits in Language Models
☆367 · Jul 30, 2025 · Updated 7 months ago
Alternatives and similar repositories for persona_vectors
Users who are interested in persona_vectors compare it to the libraries listed below.
- ☆21 · Jun 22, 2025 · Updated 8 months ago
- ☆51 · Jun 26, 2025 · Updated 8 months ago
- ☆18 · Apr 7, 2025 · Updated 11 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆171 · Updated this week
- ☆266 · Jan 12, 2026 · Updated last month
- ☆28 · Nov 16, 2025 · Updated 3 months ago
- ☆35 · Feb 20, 2025 · Updated last year
- ☆16 · May 1, 2025 · Updated 10 months ago
- [NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning" ☆43 · Oct 3, 2025 · Updated 5 months ago
- An active inference model of Lacanian psychoanalysis ☆15 · Jun 7, 2025 · Updated 9 months ago
- Learning to Skip the Middle Layers of Transformers ☆17 · Aug 7, 2025 · Updated 7 months ago
- MLFlow End to End Workshop at Chandigarh University ☆11 · Feb 3, 2023 · Updated 3 years ago
- Implementation of Reinforce for educational purposes. ☆12 · Jun 12, 2023 · Updated 2 years ago
- [NeurIPS D&B '25] The one-stop repository for LLM unlearning ☆497 · Updated this week
- Code Repository for Blog - How to Productionize Large Language Models (LLMs) ☆12 · Mar 27, 2024 · Updated last year
- This is the implementation for IEEE S&P 2022 paper "Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Secur… ☆11 · Aug 24, 2022 · Updated 3 years ago
- Deep Learning Type Library ☆39 · Feb 23, 2026 · Updated 2 weeks ago
- [ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models ☆16 · Jun 18, 2025 · Updated 8 months ago
- ☆20 · Jan 5, 2026 · Updated 2 months ago
- Source code for "Open Cross-Domain Visual Search" (CVIU, 2020). ☆14 · Aug 24, 2020 · Updated 5 years ago
- Building reliable Retrieval Augmented Generation (RAG) AI Architecture ☆13 · Jul 30, 2024 · Updated last year
- Focused Papers, Delivered Simply :) ☆51 · Dec 25, 2025 · Updated 2 months ago
- End-to-end codebase for finetuning LLMs (LLaMA 2, 3, etc.) with or without DP ☆16 · Sep 23, 2024 · Updated last year
- ☆17 · Aug 30, 2025 · Updated 6 months ago
- [USENIX Security 2025] SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks ☆20 · Sep 18, 2025 · Updated 5 months ago
- ☆20 · Nov 15, 2024 · Updated last year
- FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists ☆31 · Aug 14, 2025 · Updated 6 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆258 · Sep 24, 2024 · Updated last year
- [NAACL 2025] Towards Rationality in Language and Multimodal Agents: A Survey ☆35 · Feb 19, 2025 · Updated last year
- Attribution-based Parameter Decomposition ☆34 · Jun 11, 2025 · Updated 8 months ago
- Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models" ☆16 · Mar 31, 2025 · Updated 11 months ago
- ☆70 · Mar 6, 2025 · Updated last year
- SampDetox: Black-box Backdoor Defense via Perturbation-based Sample Detoxification ☆14 · Jun 10, 2025 · Updated 9 months ago
- CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter ☆22 · May 28, 2025 · Updated 9 months ago
- ☆36 · May 9, 2025 · Updated 10 months ago
- [NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in La… ☆27 · Nov 3, 2025 · Updated 4 months ago
- ☆20 · May 25, 2024 · Updated last year
- Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment" ☆17 · Feb 26, 2026 · Updated last week
- Agent Watch is an AgentOps monitoring library designed for Crew AI applications. ☆21 · Dec 2, 2024 · Updated last year