Zhiyuan-Weng / BenchForm

(ICLR25 Oral) Do as We Do, Not as You Think: the Conformity of Large Language Models

☆35

Alternatives and similar repositories for BenchForm

Users who are interested in BenchForm are comparing it to the repositories listed below.


ZichenWen1 / DIJA
Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"
☆71 · Updated 2 months ago
eric-ai-lab / MSSBench
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
☆30 · Updated 5 months ago
SaFo-Lab / AdaShield
[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…
☆67 · Updated last year
Purshow / Awesome-LVLM-Hallucination
☆55 · Updated last year
AI45Lab / REEF
The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source…
☆70 · Updated 11 months ago
Osilly / Awesome-Interleaving-Reasoning
Interleaving Reasoning: Next-Generation Reasoning Systems for AGI
☆216 · Updated last month
MLRM-Halu / MLRM-Halu
[NeurIPS 2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
☆73 · Updated 6 months ago
franciscoliu / MLLMU-Bench
[NAACL 2025 Main] Official Implementation of MLLMU-Bench
☆43 · Updated 9 months ago
itsqyh / Awesome-LMMs-Mechanistic-Interpretability
A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg…
☆167 · Updated last month
shengliu66 / VTI
Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering
☆94 · Updated last year
zhangce01 / DeGF
[ICLR 2025] Code for Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
☆23 · Updated 8 months ago
YU-deep / Awesome-Latent-Space
A paper list of Awesome Latent Space.
☆190 · Updated last week
The-Martyr / CausalMM
[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
☆57 · Updated 5 months ago
ASTRAL-Group / ASTRA
[CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre…
☆45 · Updated 5 months ago
WangCheng0116 / Awesome-LRMs-Safety
Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning …
☆82 · Updated 3 months ago
UCSC-VLAA / STAR-1
[AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data
☆32 · Updated 8 months ago
Dongping-Chen / MLLM-Judge
[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.
☆86 · Updated 9 months ago
gszfwsb / Awesome-Dataset-Reduction
A curated list of awesome papers on dataset reduction, including dataset distillation (dataset condensation) and dataset pruning (coreset…
☆59 · Updated 11 months ago
ShenzheZhu / JailDAM
[COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
☆23 · Updated 3 weeks ago
wonderNefelibata / Awesome-LRM-Safety
Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as …
☆78 · Updated last week
thu-ml / MMTrustEval
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
☆172 · Updated 5 months ago
gyhdog99 / ECSO
ECSO (Make MLLM safe without either training or any external models!) (https://arxiv.org/abs/2403.09572)
☆34 · Updated last year
clemneo / llava-interp
☆76 · Updated last year
thu-ml / STAIR
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
☆87 · Updated 9 months ago
DripNowhy / ETA
[ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"
☆27 · Updated 4 months ago
wenhangao21 / ICLR26_Paper_Finder
🌐 Permanent Hosting Site: http://ai-paper-finder.info/ 🌐 Hugging Face Hosting: https://huggingface.co/spaces/wenhanacademia/ai-paper-f…
☆227 · Updated 2 weeks ago
TrustGen / TrustEval-toolkit
Toolkit for evaluating the trustworthiness of generative foundation models.
☆123 · Updated 3 months ago
Ziwei-Zheng / VaLSe
A library of visualization tools for the interpretability and hallucination analysis of large vision-language models (LVLMs).
☆41 · Updated 6 months ago
ThreeSR / Awesome-Inference-Time-Scaling
Paper List of Inference/Test Time Scaling/Computing
☆327 · Updated 3 months ago
taco-group / Re-Align
[EMNLP'25] A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
☆49 · Updated 3 months ago