MikeWangWZHL / VDLM
Repo for paper: https://arxiv.org/abs/2404.06479
☆27 · Updated 7 months ago
Alternatives and similar repositories for VDLM
Users interested in VDLM are comparing it to the libraries listed below.
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆124 · Updated 10 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆65 · Updated last month
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆56 · Updated 6 months ago
- ☆16 · Updated 6 months ago
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025) ☆74 · Updated 2 months ago
- ☆48 · Updated last year
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆57 · Updated last month
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆76 · Updated 5 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆62 · Updated 10 months ago
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆28 · Updated 10 months ago
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation ☆28 · Updated 4 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆66 · Updated 11 months ago
- Synthetic data generation pipelines for text-rich images. ☆67 · Updated 2 months ago
- A benchmark dataset for evaluating LLMs' SVG editing capabilities ☆31 · Updated 7 months ago
- ☆63 · Updated last week
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 · Updated last year
- ☆51 · Updated last year
- This repo contains code and data for the ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs ☆31 · Updated 2 months ago
- This repository is maintained to release the dataset and models for multimodal puzzle reasoning. ☆83 · Updated 2 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024] ☆58 · Updated 4 months ago
- ☆85 · Updated last year
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆43 · Updated 2 months ago
- ☆44 · Updated last month
- Implementation and dataset for the paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆39 · Updated 2 months ago
- Multimodal RewardBench ☆39 · Updated 2 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆27 · Updated 7 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆46 · Updated 5 months ago
- ☆97 · Updated last month
- ☆12 · Updated this week
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ☆36 · Updated last year