The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs".
☆11Sep 27, 2024Updated last year
Alternatives and similar repositories for Hallu-PI
Users who are interested in Hallu-PI are comparing it to the libraries listed below
- An iterative, feedback-driven benchmark of LLMs' instruction-following ability☆55Jan 22, 2026Updated 2 months ago
- ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. AAAI, 2025☆13Aug 25, 2025Updated 6 months ago
- e-SNLI-VE: Corrected Visual-Textual Entailment with Natural Language Explanations☆14Aug 19, 2021Updated 4 years ago
- AIRS-Bench: an AI Research Science benchmark for quantifying the end-to-end AI research abilities of LLM agents☆67Updated this week
- This repo implements VGG's Comparator Network [1].☆11Sep 4, 2018Updated 7 years ago
- This is the official code repository for the paper: Towards General Continuous Memory for Vision-Language Models.☆23Jul 3, 2025Updated 8 months ago
- [ICLR 2026] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents☆42Mar 7, 2026Updated 2 weeks ago
- Implementation of "Salient Object Ranking with Position-Preserved Attention"☆26Nov 10, 2021Updated 4 years ago
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs☆30Dec 24, 2025Updated 2 months ago
- The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang…☆155Sep 2, 2025Updated 6 months ago
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆24Nov 29, 2024Updated last year
- Extension for Fabric to handle prompts through pexpect☆44May 31, 2015Updated 10 years ago
- A Python/Cython package for graph edit distances and graph matching☆13Jan 30, 2023Updated 3 years ago
- Material parsers and other tools and scripts, initially developed for Grobid Superconductor☆13Feb 21, 2025Updated last year
- ☆10Jul 2, 2020Updated 5 years ago
- Resources for our IJCAI 2020 paper, TopicKA: Generating Commonsense Knowledge-Aware Dialogue Responses Towards the Recommended Topic Fact☆12Nov 30, 2020Updated 5 years ago
- Genkō yōshi (Japanese manuscript paper); manuscript paper; Japanese-style letter paper; UPTEX/UPLATEX vertical writing☆10Nov 27, 2019Updated 6 years ago
- Objective metrics for measuring visual texture similarity using STSIM features. Supervised by Thrasos Pappas.☆14Oct 4, 2023Updated 2 years ago
- Reproducible Language Agent Research☆34Jun 25, 2025Updated 8 months ago
- Unofficial implementations of attention models on the SNLI dataset☆33Jun 24, 2018Updated 7 years ago
- ☆18Jul 7, 2025Updated 8 months ago
- Author implementation of Exploring Adversarial Fake Images on Face Manifold (CVPR 2021 oral)☆32Mar 2, 2023Updated 3 years ago
- Implementation of GAT with batching☆10Nov 28, 2020Updated 5 years ago
- The repo of the Doc2SoarGraph framework☆10Sep 17, 2024Updated last year
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models (EMNLP Findings 2023)☆28Dec 8, 2023Updated 2 years ago
- The implementation of the ACL 2024 paper "MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization"☆43Jun 15, 2024Updated last year
- ☆11Dec 8, 2022Updated 3 years ago
- Code of CropMix: Sampling a Rich Input Distribution via Multi-Scale Cropping☆17Oct 8, 2022Updated 3 years ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning?☆32Aug 5, 2025Updated 7 months ago
- Sentiment Lexicon Construction☆10Sep 17, 2019Updated 6 years ago
- Implementation of the network model from "Multiway Attention Networks for Modeling Sentence Pairs"; can be used for question answering and sentence-level logical inference☆11Apr 13, 2020Updated 5 years ago
- Transfer Learning in Dialogue Benchmarking Toolkit☆14Mar 31, 2023Updated 2 years ago
- Dictionary-based text sentiment analysis with a user interface, "小白"☆10Jan 2, 2016Updated 10 years ago
- ☆11Feb 22, 2023Updated 3 years ago
- ☆11Feb 9, 2026Updated last month
- Multi Task Learning for Semantic Segmentation, Instance Segmentation and Depth Estimation☆13Jun 12, 2022Updated 3 years ago
- Code for TKDE paper "Learning Relation Prototype from Unlabeled Texts for Long-tail Relation Extraction"☆10Feb 19, 2024Updated 2 years ago
- Unofficial PyTorch implementation of "Composing Good Shots by Exploiting Mutual Relations"☆14May 13, 2022Updated 3 years ago
- DataSciBench: An LLM Agent Benchmark for Data Science☆54Jan 21, 2026Updated 2 months ago