Ybakman / TruthTorchLM
☆46 Updated last week
Alternatives and similar repositories for TruthTorchLM
Users who are interested in TruthTorchLM are comparing it to the libraries listed below.
- This repository collects all relevant resources about interpretability in LLMs ☆368 Updated 9 months ago
- AI Logging for Interpretability and Explainability ☆125 Updated last year
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666. ☆497 Updated this week
- A resource repository for representation engineering in large language models ☆129 Updated 8 months ago
- ☆180 Updated 8 months ago
- ☆97 Updated last year
- [ICLR 2025] General-purpose activation steering library ☆88 Updated 2 weeks ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆73 Updated 10 months ago
- ☆172 Updated last year
- Conformal Language Modeling ☆32 Updated last year
- The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features: … ☆341 Updated 3 weeks ago
- A Survey on Data Selection for Language Models ☆246 Updated 3 months ago
- A fast, effective data attribution method for neural networks in PyTorch ☆215 Updated 8 months ago
- LLM-Merging: Building LLMs Efficiently through Merging ☆202 Updated 10 months ago
- Using sparse coding to find distributed representations used by neural networks. ☆261 Updated last year
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆182 Updated 5 months ago
- ☆96 Updated last year
- ☆235 Updated last year
- A curated list of LLM Interpretability related material: Tutorial, Library, Survey, Paper, Blog, etc. ☆263 Updated 4 months ago
- ☆321 Updated this week
- ☆157 Updated 8 months ago
- ☆60 Updated 5 months ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ☆352 Updated last year
- Steering Llama 2 with Contrastive Activation Addition ☆170 Updated last year
- ☆185 Updated last year
- Editing Models with Task Arithmetic ☆490 Updated last year
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization ☆30 Updated last year
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆103 Updated 2 years ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆112 Updated last month
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ☆177 Updated last month