vl-illusion / GVIL

Code and data for EMNLP 2023 paper "Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?"

☆14

Alternatives and similar repositories for GVIL

Users who are interested in GVIL are comparing it to the repositories listed below

MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆67 Updated last year
UW-Madison-Lee-Lab / CoBSAT
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
☆41 Updated 5 months ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58 Updated 2 years ago
Share14 / ShareGemini
☆32 Updated last year
aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆45 Updated 2 years ago
si0wang / ViCrit
☆24 Updated 5 months ago
TencentARC / GRPO-CARE
☆79 Updated 5 months ago
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆55 Updated 9 months ago
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆51 Updated 8 months ago
si0wang / VisVM
☆46 Updated 11 months ago
yonatanbitton / wysiwyr
☆37 Updated 2 years ago
RUCAIBox / Event-Bench
Official code of *Towards Event-oriented Long Video Understanding*
☆12 Updated last year
eslambakr / HRS_benchmark
☆61 Updated 2 years ago
palchenli / VL-Instruction-Tuning
☆91 Updated 2 years ago
HYPJUDY / Sparkles
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
☆44 Updated last year
linzhiqiu / CLIP-FlanT5
Training code for CLIP-FlanT5
☆30 Updated last year
TencentARC / FLM
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
☆32 Updated 2 years ago
McGill-NLP / diffusion-itm
Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"
☆33 Updated last year
zeyofu / ReFocus_Code
Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]
☆43 Updated 4 months ago
Hritikbansal / videocon
☆58 Updated last year
llyx97 / FETV
[NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L…
☆57 Updated last year
mlfoundations / VisIT-Bench
☆50 Updated 2 years ago
PVIT-official / PVIT
Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
☆37 Updated 2 years ago
pipilurj / bootstrapped-preference-optimization-BPO
code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆59 Updated last year
JieyuZ2 / ProVision
An instruction data generation system for multimodal language models.
☆35 Updated 10 months ago
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆95 Updated last year
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆40 Updated 8 months ago
HenryHZY / VL-PET
[ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"
☆52 Updated 2 years ago
sjz5202 / LLaVA-Reward
Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
☆22 Updated 4 months ago
shuheikurita / RefEgo
☆13 Updated last year