AIFlames / MLLMGuard

☆15

Related projects ⓘ

Alternatives and complementary repositories for MLLMGuard

Improbable-AI / curiosity_redteam
Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX…
☆60Updated 7 months ago
vinid / safety-tuned-llamas
ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety.
☆70Updated 6 months ago
OpenSafetyLab / SALAD-BENCH
【ACL 2024】 SALAD benchmark & MD-Judge
☆103Updated last month
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆83Updated 5 months ago
boyiwei / alignment-attribution-code
Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆58Updated last month
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆45Updated 7 months ago
ys-zong / VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
☆45Updated 2 months ago
SihengLi99 / LLM-Honesty-Survey
A Survey on the Honesty of Large Language Models
☆44Updated last month
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆73Updated 3 weeks ago
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
☆19Updated this week
Dongping-Chen / MLLM-Judge
[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.
☆54Updated 3 months ago
jinzhuoran / RWKU
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
☆58Updated last month
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆59Updated 8 months ago
Jihuai-wpy / InferAligner
☆25Updated last month
licong-lin / negative-preference-optimization
☆34Updated 3 months ago
ChenWu98 / agent-attack
[Arxiv 2024] Adversarial attacks on multimodal agents
☆38Updated 4 months ago
yaojin17 / Unlearning_LLM
[ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"
☆45Updated last month
GAIR-NLP / weak-to-strong-reasoning
☆53Updated 2 months ago
RenShuhuai-Andy / my-tools
my commonly-used tools
☆47Updated 3 months ago
UCSC-VLAA / vllm-safety-benchmark
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
☆66Updated 11 months ago
kevinyaobytedance / llm_unlearn
LLM Unlearning
☆123Updated last year
tatsu-lab / test_set_contamination
☆33Updated last year
chujiezheng / LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆70Updated 2 months ago
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆26Updated 4 months ago
zhxieml / remiss-jailbreak
☆20Updated 4 months ago
princeton-nlp / benign-data-breaks-safety
☆19Updated last month
luka-group / mDPO
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
☆22Updated 3 weeks ago
junyangwang0410 / AMBER
An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation
☆93Updated 9 months ago
sail-sg / Attention-Sink
[ATTRIB @ NeurIPS 2024] When Attention Sink Emerges in Language Models: An Empirical View
☆27Updated 3 weeks ago
swj0419 / muse_bench
☆15Updated 3 months ago