Ybakman / TruthTorchLM
☆31 · Updated 3 weeks ago
Alternatives and similar repositories for TruthTorchLM:
Users interested in TruthTorchLM are comparing it to the libraries listed below.
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) · ☆63 · Updated 6 months ago
- ☆175 · Updated last year
- ☆27 · Updated last month
- ☆66 · Updated 3 years ago
- ☆164 · Updated 10 months ago
- AI Logging for Interpretability and Explainability 🔬 · ☆111 · Updated 10 months ago
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models" · ☆101 · Updated last year
- ☆87 · Updated 9 months ago
- ☆93 · Updated last year
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024] · ☆43 · Updated 6 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity · ☆72 · Updated last month
- A resource repository for representation engineering in large language models · ☆119 · Updated 5 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors · ☆75 · Updated 4 months ago
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666 · ☆370 · Updated this week
- ☆49 · Updated last year
- A one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE and is an easily extensible framework for new datase… · ☆229 · Updated this week
- ☆23 · Updated 5 months ago
- ☆144 · Updated 5 months ago
- A fast, effective data attribution method for neural networks in PyTorch · ☆204 · Updated 5 months ago
- Using sparse coding to find distributed representations used by neural networks · ☆236 · Updated last year
- AdaMerging: Adaptive Model Merging for Multi-Task Learning (ICLR 2024) · ☆77 · Updated 5 months ago
- Conformal Language Modeling · ☆28 · Updated last year
- Awesome SAE papers · ☆26 · Updated 2 months ago
- This repository collects all relevant resources about interpretability in LLMs · ☆341 · Updated 5 months ago
- Code repo for the ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs" · ☆113 · Updated last year
- Source code for the NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" · ☆40 · Updated 2 weeks ago
- ☆24 · Updated last year
- [NeurIPS 2023 Spotlight] Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training · ☆34 · Updated 2 weeks ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications · ☆76 · Updated 3 weeks ago
- Code for the paper "Aligning Large Language Models with Representation Editing: A Control Perspective" · ☆29 · Updated 2 months ago