zjysteven / mink-plus-plus

[ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs

☆41

Alternatives and similar repositories for mink-plus-plus

Users who are interested in mink-plus-plus are comparing it to the libraries listed below.

SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆96 Updated last year
yaojin17 / Unlearning_LLM
[ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"
☆59 Updated 10 months ago
vinid / safety-tuned-llamas
ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety.
☆85 Updated last year
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆106 Updated 5 months ago
tatsu-lab / test_set_contamination
☆38 Updated last year
ykwon0407 / DataInf
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)
☆73 Updated 10 months ago
Thartvigsen / GRACE
[NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
☆78 Updated 7 months ago
kevinyaobytedance / llm_unlearn
LLM Unlearning
☆172 Updated last year
licong-lin / negative-preference-optimization
☆60 Updated last year
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆39 Updated 6 months ago
boyiwei / alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆81 Updated 4 months ago
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆74 Updated 4 months ago
yihuaihong / ConceptVectors
ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"
☆36 Updated 5 months ago
SALT-NLP / Efficient_Unlearning
☆38 Updated last year
epfl-dlab / llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
☆78 Updated last year
princeton-nlp / benign-data-breaks-safety
☆41 Updated 10 months ago
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆68 Updated 2 years ago
lyy1994 / awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
☆98 Updated 2 weeks ago
JasonForJoy / Model-Editing-Hurt
EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
☆35 Updated 2 months ago
chujiezheng / LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆94 Updated 2 months ago
hkust-nlp / PEM_composition
[NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
☆61 Updated last year
pratyushmaini / llm_dataset_inference
Official Repository for Dataset Inference for LLMs
☆36 Updated last year
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆61 Updated last year
ejones313 / auditing-llms
☆55 Updated 2 years ago
decoding-comp-trust / comp-trust
Codebase for decoding compressed trust.
☆24 Updated last year
dannyallover / overthinking_the_truth
☆29 Updated last year
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆87 Updated last week
balevinstein / Probes
☆52 Updated 2 years ago
Vaidehi99 / InfoDeletionAttacks
☆44 Updated 5 months ago
locuslab / acr-memorization
☆35 Updated 7 months ago