Ybakman / TruthTorchLM
☆59 · Updated 2 months ago
Alternatives and similar repositories for TruthTorchLM
Users who are interested in TruthTorchLM are comparing it to the libraries listed below.
- This repository collects all relevant resources about interpretability in LLMs ☆391 · Updated last year
- ☆143 · Updated last month
- AI Logging for Interpretability and Explainability🔬 ☆140 · Updated last year
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆79 · Updated last year
- ☆184 · Updated last year
- ☆230 · Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆163 · Updated 7 months ago
- Using sparse coding to find distributed representations used by neural networks. ☆293 · Updated 2 years ago
- [ICLR 2025] General-purpose activation steering library ☆141 · Updated 4 months ago
- A fast, effective data attribution method for neural networks in PyTorch ☆229 · Updated last year
- ☆103 · Updated last year
- ☆197 · Updated last year
- A resource repository for representation engineering in large language models ☆148 · Updated last year
- [NeurIPS D&B '25] The one-stop repository for LLM unlearning ☆474 · Updated last month
- Steering Llama 2 with Contrastive Activation Addition ☆207 · Updated last year
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ☆404 · Updated last year
- Unified access to Large Language Model modules using NNsight ☆87 · Updated last week
- Conformal Language Modeling ☆31 · Updated 2 years ago
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ☆178 · Updated 7 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆290 · Updated last year
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization ☆41 · Updated last year
- ☆58 · Updated 2 years ago
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ☆219 · Updated 6 months ago
- ☆241 · Updated last year
- An Open Source Implementation of Anthropic's Paper: "Towards Monosemanticity: Decomposing Language Models with Dictionary Learning" ☆53 · Updated last year
- ☆34 · Updated last year
- ☆42 · Updated 2 years ago
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models ☆348 · Updated 6 months ago
- ☆206 · Updated 3 months ago
- LLM finetuning in resource-constrained environments. ☆55 · Updated last year