multimodal-art-projection / OmniBench
A project for tri-modal LLM benchmarking and instruction tuning.
☆19 Updated 2 months ago
Alternatives and similar repositories for OmniBench:
Users who are interested in OmniBench are comparing it to the repositories listed below:
- Preference Learning for LLaVA ☆35 Updated 2 months ago
- Official implementation of MIA-DPO ☆49 Updated last week
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆54 Updated last year
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 Updated last year
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆48 Updated 6 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆20 Updated 5 months ago
- This repo contains code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" ☆11 Updated 3 weeks ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆33 Updated last month
- VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆25 Updated 7 months ago
- Official implementation of the paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding ☆19 Updated 3 weeks ago
- PyTorch implementation of StableMask (ICML'24) ☆12 Updated 7 months ago
- ☆24 Updated last year
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild ☆47 Updated last year
- ☆47 Updated last year
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria ☆61 Updated 3 months ago
- DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception. (ICLR2025) ☆18 Updated last month
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆44 Updated last month
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation? ☆37 Updated 7 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆68 Updated this week
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆57 Updated 4 months ago
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆19 Updated last month
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆39 Updated last week
- CLIP-MoE: Mixture of Experts for CLIP ☆23 Updated 3 months ago
- LMM that is a strict superset of its embedded LLM ☆37 Updated 2 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆43 Updated 7 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆33 Updated 3 months ago
- ☆17 Updated 6 months ago
- MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models ☆23 Updated last week
- A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents ☆41 Updated 2 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆59 Updated 4 months ago