John-AI-Lab / Unnatural_Language
The official repository for 'Unnatural Languages Are Not Bugs but Features for LLMs'
☆13 · Updated 3 weeks ago
Alternatives and similar repositories for Unnatural_Language:
Users interested in Unnatural_Language are comparing it to the repositories listed below
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation" ☆13 · Updated last week
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆75 · Updated 5 months ago
- Codebase for decoding compressed trust. ☆23 · Updated 10 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆59 · Updated 2 months ago
- Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024) ☆28 · Updated last year
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆89 · Updated 10 months ago
- [NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling ☆26 · Updated 4 months ago
- [EMNLP 2024] Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ☆35 · Updated 4 months ago
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" ☆18 · Updated last year
- Code for "Universal Adversarial Triggers Are Not Universal."☆16Updated 10 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆43Updated 8 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆73 · Updated last month
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆82 · Updated 8 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆26 · Updated last month
- [ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs ☆36 · Updated last month
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆29 · Updated 2 months ago
- Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" ☆48 · Updated 11 months ago