TIGER-AI-Lab / MEGA-Bench
This repo contains the code and data for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks"
☆32 · Updated this week
Related projects
Alternatives and complementary repositories for MEGA-Bench
- Official Implementation of 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs ☆30 · Updated 5 months ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆55 · Updated last month
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image … ☆54 · Updated 3 weeks ago
- ☆61 · Updated last week
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆37 · Updated 6 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆107 · Updated 4 months ago
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆47 · Updated 4 months ago
- [CVPR 2023] Code for "3D Concept Learning and Reasoning from Multi-View Images" ☆75 · Updated 9 months ago
- The Scene Language: Representing Scenes with Programs, Words, and Embeddings (arXiv preprint) ☆85 · Updated 2 weeks ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆57 · Updated last month
- Repo for paper: https://arxiv.org/abs/2404.06479 ☆25 · Updated last month
- A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆163 · Updated 3 weeks ago
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆129 · Updated 2 weeks ago
- Official implementation of "Self-Improving Video Generation" ☆49 · Updated this week
- Semantic Score Distillation Sampling for Compositional Text-to-3D Generation ☆27 · Updated 3 weeks ago
- [ICCV 2023] Code for "Multi-task View Synthesis with Neural Radiance Fields" ☆11 · Updated last year
- Official implementation of MIA-DPO ☆32 · Updated last week
- Official repo for StableLLAVA ☆90 · Updated 10 months ago
- Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024) ☆24 · Updated last month
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆46 · Updated 3 weeks ago
- Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆20 · Updated last month
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆41 · Updated 2 weeks ago
- Code for "Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes" ☆51 · Updated 7 months ago
- Code for the SIGGRAPH Asia 2024 paper "Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation" ☆33 · Updated 2 months ago
- My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models" ☆14 · Updated this week
- ☆13 · Updated 2 months ago
- Scaffold Prompting to promote LMMs ☆30 · Updated 5 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆57 · Updated last month
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆42 · Updated 2 weeks ago
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding" ☆23 · Updated 3 months ago