MikeWangWZHL / VDLM
Repo for paper: https://arxiv.org/abs/2404.06479
☆25 Updated 4 months ago
Alternatives and similar repositories for VDLM:
Users who are interested in VDLM compare it to the repositories listed below.
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆49 Updated 4 months ago
- This repository is maintained to release dataset and models for multimodal puzzle reasoning. ☆63 Updated 2 weeks ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆67 Updated 2 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆59 Updated last week
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆104 Updated 3 weeks ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 Updated 7 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 Updated last week
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models" ☆42 Updated 4 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆116 Updated 7 months ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆18 Updated 7 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆64 Updated 2 months ago
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" ☆49 Updated last month
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation ☆24 Updated last month
- Preference Learning for LLaVA ☆37 Updated 3 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat… ☆43 Updated 6 months ago
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆26 Updated 7 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆63 Updated 8 months ago
- Official GitHub repo of G-LLaVA ☆128 Updated this week
- [EMNLP 2023, Findings] GRACE: Discriminator-Guided Chain-of-Thought Reasoning ☆47 Updated 4 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆77 Updated 7 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 Updated last year
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆31 Updated 3 months ago
- Large Language Models Can Self-Improve in Long-context Reasoning ☆62 Updated 2 months ago
- [ICLR 2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆89 Updated 2 months ago
- FocusLLM: Scaling LLM’s Context by Parallel Decoding ☆36 Updated 2 months ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ☆32 Updated last year
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆92 Updated 2 months ago