Ybakman / TruthTorchLM
☆53Updated last month
Alternatives and similar repositories for TruthTorchLM
Users who are interested in TruthTorchLM are comparing it to the libraries listed below.
- This repository collects all relevant resources about interpretability in LLMs☆375Updated 11 months ago
- [NeurIPS D&B '25] The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning metho…☆398Updated 3 weeks ago
- ☆206Updated 11 months ago
- Using sparse coding to find distributed representations used by neural networks.☆280Updated last year
- ☆131Updated last week
- AI Logging for Interpretability and Explainability🔬☆129Updated last year
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments).☆380Updated last year
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)☆76Updated last year
- A curated list of LLM interpretability-related material: tutorials, libraries, surveys, papers, blogs, etc.☆277Updated 7 months ago
- ☆181Updated 11 months ago
- ☆371Updated this week
- ☆179Updated last year
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization☆33Updated last year
- A resource repository for representation engineering in large language models☆139Updated 11 months ago
- A resource repository for machine unlearning in large language models☆498Updated 3 months ago
- Python package for measuring memorization in LLMs.☆170Updated 3 months ago
- [ICLR 2025] General-purpose activation steering library☆114Updated last month
- ☆191Updated 2 weeks ago
- A fast, effective data attribution method for neural networks in PyTorch☆220Updated 11 months ago
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature☆166Updated 4 months ago
- A Survey on Data Selection for Language Models☆250Updated 6 months ago
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666.☆572Updated last week
- Steering Llama 2 with Contrastive Activation Addition☆191Updated last year
- ☆57Updated 2 years ago
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024]☆195Updated 3 months ago
- ☆247Updated last year
- ☆63Updated 7 months ago
- ☆240Updated last year
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models☆269Updated 2 months ago
- ☆98Updated last year