Ybakman / TruthTorchLM
☆43 Updated 3 months ago
Alternatives and similar repositories for TruthTorchLM
Users who are interested in TruthTorchLM are comparing it to the libraries listed below.
- This repository collects all relevant resources about interpretability in LLMs ☆365 Updated 8 months ago
- The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features: … ☆318 Updated 3 weeks ago
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666. ☆468 Updated last week
- A resource repository for representation engineering in large language models ☆128 Updated 8 months ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc. ☆258 Updated 4 months ago
- ☆95 Updated last year
- Using sparse coding to find distributed representations used by neural networks. ☆260 Updated last year
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆71 Updated 9 months ago
- ☆171 Updated last year
- Steering Llama 2 with Contrastive Activation Addition ☆165 Updated last year
- ☆173 Updated 7 months ago
- ☆148 Updated 8 months ago
- AI Logging for Interpretability and Explainability🔬 ☆124 Updated last year
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ☆341 Updated last year
- Python package for measuring memorization in LLMs. ☆160 Updated this week
- ☆502 Updated last year
- ☆107 Updated this week
- A resource repository for machine unlearning in large language models ☆435 Updated last month
- General-purpose activation steering library ☆85 Updated 2 months ago
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ☆176 Updated last week
- ☆70 Updated 3 years ago
- ☆292 Updated this week
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆103 Updated 2 years ago
- A Survey on Data Selection for Language Models ☆243 Updated 2 months ago
- ☆54 Updated 4 months ago
- ☆315 Updated last week
- Sparse Autoencoder for Mechanistic Interpretability ☆256 Updated last year
- LLM Unlearning ☆172 Updated last year
- A fast, effective data attribution method for neural networks in PyTorch ☆213 Updated 8 months ago
- ☆234 Updated last year