Ybakman / TruthTorchLM
☆59 · Updated 2 months ago
Alternatives and similar repositories for TruthTorchLM
Users interested in TruthTorchLM are comparing it to the libraries listed below.
- This repository collects all relevant resources about interpretability in LLMs (☆390, updated last year)
- ☆142, updated last month
- ☆229, updated last year
- [NeurIPS D&B '25] The one-stop repository for LLM unlearning (☆471, updated last month)
- A resource repository for representation engineering in large language models (☆148, updated last year)
- AI Logging for Interpretability and Explainability 🔬 (☆138, updated last year)
- Conformal Language Modeling (☆32, updated 2 years ago)
- ☆183, updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods (☆163, updated 7 months ago)
- ☆195, updated last year
- [ICLR 2025] General-purpose activation steering library (☆138, updated 4 months ago; a generic sketch of activation steering follows this list)
- A fast, effective data attribution method for neural networks in PyTorch (☆227, updated last year)
- Using sparse coding to find distributed representations used by neural networks (☆293, updated 2 years ago)
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models [ICLR 2024] (☆79, updated last year)
- ☆58, updated 2 years ago
- ☆388, updated 5 months ago
- Tools for optimizing steering vectors in LLMs (☆19, updated 9 months ago)
- Unified access to Large Language Model modules using NNsight (☆81, updated 3 weeks ago)
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models (☆344, updated 6 months ago)
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] (☆218, updated 6 months ago)
- Sparse Autoencoder for Mechanistic Interpretability (☆289, updated last year; a generic sparse autoencoder sketch follows this list)
- Trains sparse autoencoders on outputs from language models (☆11, updated last year)
- This repo contains code for the paper "Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach" (☆24, updated last year)
- A Survey on Data Selection for Language Models (☆253, updated 9 months ago)
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization (☆41, updated last year)
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering (☆197, updated 11 months ago)
- An open-source implementation of Anthropic's paper "Towards Monosemanticity: Decomposing Language Models with Dictionary Learning" (☆52, updated last year)
- Python package for measuring memorization in LLMs (☆179, updated 6 months ago)
- A curated list of LLM interpretability-related material: tutorials, libraries, surveys, papers, blogs, etc. (☆291, updated last week)
- ☆429, updated this week
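
Several of the entries above concern activation steering with steering vectors (the general-purpose steering library, the steering-vector optimization tools, persona vectors, in-context vectors). For orientation only, here is a minimal, generic sketch of the underlying idea: build a difference-of-means steering vector from two contrastive prompts and add it to one layer's activations via a forward hook. It does not reflect any listed library's API; the model name, layer index, contrast prompts, and scale are illustrative assumptions.

```python
# Minimal, generic activation-steering sketch (not any listed library's API).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM with accessible blocks works similarly
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 6  # assumption: which transformer block to steer

def mean_hidden(text: str) -> torch.Tensor:
    """Mean hidden state of `text` at the output of block LAYER."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so block LAYER's output is index LAYER + 1.
    return out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)

# Difference-of-means steering vector from two contrastive prompts (assumptions).
steer = mean_hidden("I am extremely honest and careful.") - \
        mean_hidden("I make things up confidently.")
steer = steer / steer.norm()

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are the first element.
    hidden = output[0] + 4.0 * steer  # assumption: steering scale chosen by hand
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
try:
    prompt = tok("The capital of France is", return_tensors="pt")
    gen = model.generate(**prompt, max_new_tokens=20)
    print(tok.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()  # always detach the hook so later calls are unsteered
```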
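
Likewise, several entries involve sparse autoencoders / dictionary learning on model activations. The sketch below shows the common core (an overcomplete ReLU encoder, a linear decoder, reconstruction loss plus an L1 sparsity penalty); it is not any listed repository's implementation, and the dimensions, penalty weight, and random stand-in activations are assumptions.

```python
# Minimal, generic sparse autoencoder sketch for activations (illustrative only).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative; L1 on them encourages sparsity.
        features = torch.relu(self.encoder(x))
        recon = self.decoder(features)
        return recon, features

d_model, d_dict = 768, 8 * 768    # assumption: 8x overcomplete dictionary
sae = SparseAutoencoder(d_model, d_dict)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_weight = 1e-3                  # assumption: sparsity/reconstruction trade-off

# Train on a batch of residual-stream activations (random stand-ins here).
acts = torch.randn(1024, d_model)
for _ in range(100):
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_weight * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```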