MikeWangWZHL / VDLM
Repo for paper: https://arxiv.org/abs/2404.06479
☆25Updated last month
Related projects ⓘ
Alternatives and complementary repositories for VDLM
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers"☆40Updated last month
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆46Updated 3 weeks ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…☆107Updated 4 months ago
- This repo contains the code and data for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks"☆32Updated last week
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆66Updated 4 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆42Updated last week
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image …☆54Updated 3 weeks ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"☆85Updated last month
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models.☆57Updated last month
- Official github repo of G-LLaVA☆121Updated 5 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆57Updated 5 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…☆42Updated 2 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)☆105Updated last month
- DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models☆55Updated 2 weeks ago
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"☆82Updated 8 months ago
- ☆28Updated last week
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision☆46Updated 4 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆29Updated 3 weeks ago
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073☆26Updated 4 months ago
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"☆23Updated 3 months ago
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models☆122Updated last week
- [EMNLP 2023, Findings] GRACE: Discriminator-Guided Chain-of-Thought Reasoning☆44Updated 3 weeks ago
- Directional Preference Alignment☆49Updated last month
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs☆41Updated 4 months ago
- ☆101Updated last month
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆72Updated 6 months ago
- Self-Alignment with Principle-Following Reward Models☆148Updated 8 months ago
- MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆16Updated this week
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs☆16Updated 2 weeks ago
- This repository is maintained to release dataset and models for multimodal puzzle reasoning.☆44Updated 3 months ago