ykwon0407 / DataInf

DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)

☆73

Alternatives and similar repositories for DataInf

Users interested in DataInf are comparing it to the libraries listed below.

ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆74 Updated 5 months ago
swj0419 / muse_bench
☆23 Updated 4 months ago
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆87 Updated last week
logix-project / logix
AI Logging for Interpretability and Explainability🔬
☆125 Updated last year
javiferran / sae_entities
☆60 Updated 5 months ago
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆129 Updated 8 months ago
licong-lin / negative-preference-optimization
☆60 Updated last year
boyiwei / alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆82 Updated 4 months ago
CaoYuanpu / BiPO
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
☆30 Updated last year
Thartvigsen / GRACE
[NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
☆78 Updated 7 months ago
milesaturpin / cot-unfaithfulness
☆47 Updated last year
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆106 Updated 5 months ago
lorenzkuhn / semantic_uncertainty
☆172 Updated last year
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆39 Updated 6 months ago
deeplearning-wisc / args
☆43 Updated last year
TRAIS-Lab / dattri
`dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms.
☆81 Updated 2 months ago
princeton-nlp / benign-data-breaks-safety
☆41 Updated 10 months ago
VITA-Group / SEAL
Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆39 Updated 4 months ago
jaechan-repo / muse_bench
☆26 Updated last year
zlin7 / UQ-NLG
☆97 Updated last year
dannyallover / overthinking_the_truth
☆29 Updated last year
mmatena / model_merging
☆71 Updated 3 years ago
AlexanderVNikitin / kernel-language-entropy
Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24)
☆29 Updated 7 months ago
nik-dim / tall_masks
Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024]
☆47 Updated 9 months ago
abhishekpanigrahi1996 / Skill-Localization-by-grafting
☆51 Updated last year
zjysteven / mink-plus-plus
[ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs
☆41 Updated 2 months ago
locuslab / acr-memorization
☆35 Updated 7 months ago
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024)
☆45 Updated 8 months ago
Lingkai-Kong / RE-Control
Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective
☆32 Updated 6 months ago
roeehendel / icl_task_vectors
☆96 Updated last year