Ybakman / TruthTorchLM
☆31 Updated this week
Alternatives and similar repositories for TruthTorchLM:
Users who are interested in TruthTorchLM are comparing it to the libraries listed below.
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆63 Updated 5 months ago
- ☆85 Updated 9 months ago
- AI Logging for Interpretability and Explainability ☆108 Updated 9 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆74 Updated 3 months ago
- ☆66 Updated 3 years ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆71 Updated 3 weeks ago
- ☆29 Updated 11 months ago
- ☆17 Updated 3 weeks ago
- ☆162 Updated 9 months ago
- ☆171 Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆52 Updated 4 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆73 Updated last month
- Source code for the NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" ☆32 Updated 3 months ago
- A resource repository for representation engineering in large language models ☆116 Updated 4 months ago
- ☆93 Updated last year
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024) ☆41 Updated 4 months ago
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆98 Updated last year
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024] ☆42 Updated 5 months ago
- AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024. ☆72 Updated 5 months ago
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24) ☆17 Updated 3 months ago
- Code repo for the ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs" ☆108 Updated last year
- ☆64 Updated 2 months ago
- ☆47 Updated last year
- ☆17 Updated last year
- ☆37 Updated last year
- ☆21 Updated 2 weeks ago
- Function Vectors in Large Language Models (ICLR 2024) ☆153 Updated 2 weeks ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆56 Updated 6 months ago
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" ☆49 Updated last month
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆35 Updated 2 months ago