TIGER-AI-Lab / MEGA-Bench
This repo contains the code and data for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks"
☆51 · Updated last week
Alternatives and similar repositories for MEGA-Bench:
Users who are interested in MEGA-Bench are comparing it to the libraries listed below.
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆112 · Updated 6 months ago
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆47 · Updated 2 months ago
- ☆67 · Updated 6 months ago
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆58 · Updated 6 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 · Updated 4 months ago
- Official repo for StableLLAVA ☆94 · Updated last year
- Official GitHub repo of G-LLaVA ☆122 · Updated 7 months ago
- ☆134 · Updated 2 months ago
- ☆94 · Updated last year
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆61 · Updated last month
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆117 · Updated last week
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆27 · Updated last month
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆62 · Updated 2 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 · Updated 6 months ago
- ☆31 · Updated 2 weeks ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆56 · Updated 4 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆101 · Updated 10 months ago
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆58 · Updated 4 months ago
- ☆47 · Updated last year
- Source code for the paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image … ☆62 · Updated last month
- ☆49 · Updated last week
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" ☆41 · Updated last week
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆127 · Updated last month
- ☆73 · Updated 10 months ago
- [NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding ☆56 · Updated last week
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆45 · Updated 3 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆67 · Updated last month
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆68 · Updated this week
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆62 · Updated 7 months ago
- An instruction data generation system for multimodal language models. ☆29 · Updated last week