Ybakman / TruthTorchLM
★59 · Updated last month
Alternatives and similar repositories for TruthTorchLM
Users interested in TruthTorchLM are comparing it to the libraries listed below.
- This repository collects all relevant resources about interpretability in LLMs ★389 · Updated last year
- AI Logging for Interpretability and Explainability 🔬 ★138 · Updated last year
- ★140 · Updated last week
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ★79 · Updated last year
- Using sparse coding to find distributed representations used by neural networks. ★290 · Updated 2 years ago
- Conformal Language Modeling ★32 · Updated 2 years ago
- [NeurIPS D&B '25] The one-stop repository for LLM unlearning ★456 · Updated 2 weeks ago
- ★227 · Updated last year
- ★182 · Updated last year
- A fast, effective data attribution method for neural networks in PyTorch ★227 · Updated last year
- ★193 · Updated last year
- A curated list of LLM interpretability-related material: tutorials, libraries, surveys, papers, blogs, etc. ★290 · Updated 2 weeks ago
- ★61 · Updated 2 years ago
- Python package for measuring memorization in LLMs. ★178 · Updated 5 months ago
- A resource repository for representation engineering in large language models ★145 · Updated last year
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ★175 · Updated 6 months ago
- ★104 · Updated last year
- ★414 · Updated this week
- [ICLR 2025] General-purpose activation steering library ★133 · Updated 3 months ago
- ★241 · Updated last year
- ★80 · Updated 3 years ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ★161 · Updated 6 months ago
- Sparse Autoencoder for Mechanistic Interpretability ★285 · Updated last year
- ★202 · Updated 2 months ago
- A Survey on Data Selection for Language Models ★254 · Updated 8 months ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ★402 · Updated last year
- ★380 · Updated 4 months ago
- 🪄 Interpreto is an interpretability toolbox for LLMs ★95 · Updated 3 weeks ago
- Steering Llama 2 with Contrastive Activation Addition ★204 · Updated last year
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities (ACM Computing Surveys, 2025) ★634 · Updated this week