Ybakman / TruthTorchLM
☆57 Updated 2 weeks ago
Alternatives and similar repositories for TruthTorchLM
Users who are interested in TruthTorchLM are comparing it to the libraries listed below.
- This repository collects all relevant resources about interpretability in LLMs ☆382 Updated last year
- [NeurIPS D&B '25] The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning metho… ☆423 Updated last month
- ☆214 Updated last year
- ☆136 Updated this week
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆75 Updated last year
- Using sparse coding to find distributed representations used by neural networks. ☆283 Updated 2 years ago
- A fast, effective data attribution method for neural networks in PyTorch ☆220 Updated last year
- AI Logging for Interpretability and Explainability🔬 ☆133 Updated last year
- Conformal Language Modeling ☆32 Updated last year
- A resource repository for representation engineering in large language models ☆140 Updated last year
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ☆167 Updated 5 months ago
- ☆188 Updated last year
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc. ☆284 Updated 8 months ago
- ☆179 Updated last year
- Python package for measuring memorization in LLMs. ☆173 Updated 4 months ago
- ☆102 Updated last year
- ☆195 Updated last month
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆141 Updated 4 months ago
- [ICLR 2025] General-purpose activation steering library ☆119 Updated 2 months ago
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ☆203 Updated 4 months ago
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666. ☆599 Updated this week
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization ☆37 Updated last year
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ☆387 Updated last year
- ☆63 Updated 8 months ago
- Steering Llama 2 with Contrastive Activation Addition ☆193 Updated last year
- ☆241 Updated last year
- awesome SAE papers ☆59 Updated 5 months ago
- A resource repository for machine unlearning in large language models ☆506 Updated 4 months ago
- A Survey on Data Selection for Language Models ☆252 Updated 6 months ago
- ☆382 Updated 3 weeks ago