Ybakman / TruthTorchLM
☆47 · Updated last month
Alternatives and similar repositories for TruthTorchLM
Users interested in TruthTorchLM are comparing it to the libraries listed below.
- This repository collects all relevant resources about interpretability in LLMs ☆370 · Updated 10 months ago
- The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features: … ☆358 · Updated last month
- ☆99 · Updated last year
- A fast, effective data attribution method for neural networks in PyTorch ☆217 · Updated 9 months ago
- ☆172 · Updated last year
- ☆191 · Updated 9 months ago
- AI Logging for Interpretability and Explainability🔬 ☆124 · Updated last year
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆74 · Updated 11 months ago
- Conformal Language Modeling ☆32 · Updated last year
- ☆342 · Updated last week
- [ICLR 2025] General-purpose activation steering library ☆99 · Updated last week
- A resource repository for representation engineering in large language models ☆132 · Updated 9 months ago
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666. ☆518 · Updated this week
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ☆358 · Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆125 · Updated 2 months ago
- ☆55 · Updated 2 years ago
- ☆62 · Updated 6 months ago
- Using sparse coding to find distributed representations used by neural networks. ☆265 · Updated last year
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc. ☆266 · Updated 5 months ago
- A Survey on Data Selection for Language Models ☆247 · Updated 4 months ago
- ☆116 · Updated last month
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization ☆31 · Updated last year
- code repo for ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs" ☆131 · Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆77 · Updated 5 months ago
- Steering Llama 2 with Contrastive Activation Addition ☆178 · Updated last year
- ☆165 · Updated 9 months ago
- ☆96 · Updated last year
- ☆238 · Updated last year
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆185 · Updated 6 months ago
- ☆73 · Updated 3 years ago