Ybakman / TruthTorchLM
☆35 Updated 2 months ago
Alternatives and similar repositories for TruthTorchLM
Users who are interested in TruthTorchLM are comparing it to the libraries listed below
- ☆40 Updated 3 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆67 Updated 8 months ago
- source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" ☆45 Updated last month
- ☆67 Updated 3 years ago
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization ☆25 Updated 10 months ago
- ☆26 Updated 6 months ago
- AI Logging for Interpretability and Explainability🔬 ☆119 Updated last year
- A resource repository for representation engineering in large language models ☆124 Updated 6 months ago
- ☆166 Updated 11 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆79 Updated 2 months ago
- Conformal Language Modeling ☆29 Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆98 Updated 3 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆92 Updated this week
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆58 Updated 6 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆125 Updated last month
- ☆70 Updated 4 months ago
- ☆89 Updated 11 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆72 Updated 3 months ago
- The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features: … ☆273 Updated 2 weeks ago
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆102 Updated last year
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024] ☆44 Updated 7 months ago
- ☆50 Updated last year
- ☆94 Updated last year
- ☆15 Updated 11 months ago
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24) ☆22 Updated 5 months ago
- ☆39 Updated 7 months ago
- LLM-Merging: Building LLMs Efficiently through Merging ☆197 Updated 8 months ago
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" ☆56 Updated 3 months ago
- LLM Unlearning ☆162 Updated last year
- Steering vectors for transformer language models in Pytorch / Huggingface ☆103 Updated 3 months ago