Ybakman / TruthTorchLM
☆59 Updated 2 weeks ago
Alternatives and similar repositories for TruthTorchLM
Users who are interested in TruthTorchLM are comparing it to the libraries listed below.
- This repository collects all relevant resources about interpretability in LLMs ☆387 Updated last year
- ☆223 Updated last year
- ☆136 Updated 3 weeks ago
- A resource repository for representation engineering in large language models ☆143 Updated last year
- [NeurIPS D&B '25] The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning metho… ☆444 Updated last week
- Using sparse coding to find distributed representations used by neural networks. ☆289 Updated 2 years ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆78 Updated last year
- Conformal Language Modeling ☆32 Updated last year
- A fast, effective data attribution method for neural networks in PyTorch ☆223 Updated last year
- AI Logging for Interpretability and Explainability 🔬 ☆134 Updated last year
- ☆191 Updated last year
- [ICLR 2025] General-purpose activation steering library ☆127 Updated 2 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆151 Updated 5 months ago
- A curated list of LLM interpretability-related material: tutorials, libraries, surveys, papers, blogs, etc. ☆286 Updated 8 months ago
- ☆195 Updated 2 months ago
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ☆211 Updated 5 months ago
- Unified access to Large Language Model modules using NNsight ☆68 Updated 3 weeks ago
- Steering Llama 2 with Contrastive Activation Addition ☆196 Updated last year
- ☆103 Updated last year
- ☆401 Updated this week
- ☆180 Updated last year
- A resource repository for machine unlearning in large language models ☆511 Updated 4 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆285 Updated last year
- Python package for measuring memorization in LLMs. ☆175 Updated 4 months ago
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization ☆38 Updated last year
- ☆366 Updated 3 months ago
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ☆171 Updated 5 months ago
- ☆58 Updated 2 years ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆100 Updated 2 years ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ☆393 Updated last year