Zhiyuan-Weng / BenchForm
(ICLR25 Oral) Do as We Do, Not as You Think: the Conformity of Large Language Models
☆35 · Updated 4 months ago
Alternatives and similar repositories for BenchForm
Users that are interested in BenchForm are comparing it to the libraries listed below
- Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"☆71Updated 2 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"☆30Updated 5 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…☆67Updated last year
- ☆55Updated last year
- Repository for the paper "REEF: Representation Encoding Fingerprints for Large Language Models," which aims to protect the IP of open-source… ☆70 · Updated 11 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆216 · Updated last month
- [NeurIPS 2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models ☆73 · Updated 6 months ago
- [NAACL 2025 Main] Official Implementation of MLLMU-Bench ☆43 · Updated 9 months ago
- A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg… ☆167 · Updated last month
- Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering ☆94 · Updated last year
- [ICLR 2025] Code for Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models ☆23 · Updated 8 months ago
- A paper list of Awesome Latent Space. ☆190 · Updated last week
- [ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality ☆57 · Updated 5 months ago
- [CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre… ☆45 · Updated 5 months ago
- Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning … ☆82 · Updated 3 months ago
- [AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data ☆32 · Updated 8 months ago
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge. ☆86 · Updated 9 months ago
- A curated list of awesome papers on dataset reduction, including dataset distillation (dataset condensation) and dataset pruning (coreset… ☆59 · Updated 11 months ago
- [COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model ☆23 · Updated 3 weeks ago
- Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as … ☆78 · Updated last week
- A toolbox for benchmarking the trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Datasets and Benchmarks Track) ☆172 · Updated 5 months ago
- ECSO (Make MLLMs safe with neither training nor any external models!) (https://arxiv.org/abs/2403.09572) ☆34 · Updated last year
- ☆76 · Updated last year
- Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning" ☆87 · Updated 9 months ago
- [ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time" ☆27 · Updated 4 months ago
- 🌐 Permanent Hosting Site: http://ai-paper-finder.info/ 🌐 Hugging Face Hosting: https://huggingface.co/spaces/wenhanacademia/ai-paper-f… ☆227 · Updated 2 weeks ago
- Toolkit for evaluating the trustworthiness of generative foundation models. ☆123 · Updated 3 months ago
- A library of visualization tools for the interpretability and hallucination analysis of large vision-language models (LVLMs). ☆41 · Updated 6 months ago
- Paper List of Inference/Test Time Scaling/Computing ☆327 · Updated 3 months ago
- [EMNLP'25] A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models. ☆49 · Updated 3 months ago