multimodal-art-projection/OmniBench

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/multimodal-art-projection/OmniBench)

multimodal-art-projection / OmniBench

A project for tri-modal LLM benchmarking and instruction tuning.

☆61

Alternatives and similar repositories for OmniBench

Users that are interested in OmniBench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Lliar-liar / Daily-Omni
View on GitHub
This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
☆42Updated this week
JaaackHongggg / WorldSense
View on GitHub
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
☆50Jul 12, 2026Updated 2 weeks ago
yaolinli / TimeChat-Captioner
View on GitHub
[ICML 2026] Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
☆49Jun 29, 2026Updated last month
fyvo / WMT-Biomed-Test
View on GitHub
☆13Aug 23, 2024Updated last year
WPR001 / UGC_VideoCaptioner
View on GitHub
☆16Jun 23, 2026Updated last month
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
AV-Reasoner / AV-Reasoner
View on GitHub
☆19Jul 22, 2025Updated last year
multimodal-art-projection / IV-Bench
View on GitHub
☆14Apr 23, 2025Updated last year
NJU-LINK / T2AV-Compass
View on GitHub
The Source Code for T2AV-Compass @ ICML 2026
☆20Jun 21, 2026Updated last month
Wusiwei0410 / SciMMIR
View on GitHub
☆25Aug 1, 2024Updated last year
Espere-1119-Song / Video-MMLU
View on GitHub
A Massive Multi-Discipline Lecture Understanding Benchmark
☆34Apr 20, 2026Updated 3 months ago
NJU-LINK / OmniVideoBench
View on GitHub
The Source Code for OmniVideoBench @ICLR 2026
☆77Feb 12, 2026Updated 5 months ago
lscpku / VITATECS
View on GitHub
☆18Jul 10, 2024Updated 2 years ago
baichuan-inc / Baichuan-Omni-1.5
View on GitHub
☆193Feb 8, 2025Updated last year
ChengHan111 / VPT-or-FT
View on GitHub
Official Pytorch implementation of 'Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning'? (ICLR2024)
☆13Mar 8, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
AVoCaDO-Captioner / AVoCaDO
View on GitHub
https://avocado-captioner.github.io/
☆37Oct 16, 2025Updated 9 months ago
OmniMMI / OmniMMI
View on GitHub
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
☆23Jul 14, 2026Updated 2 weeks ago
princetonvisualai / merv
View on GitHub
Unifying Specialized Visual Encoders for Video Language Models
☆25Nov 22, 2025Updated 8 months ago
NJU-LINK / ViDiC-1K
View on GitHub
The Source Code for ViDiC-1K
☆15Mar 13, 2026Updated 4 months ago
OFA-Sys / AIR-Bench
View on GitHub
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
☆133Dec 9, 2024Updated last year
junyangwang0410 / HaELM
View on GitHub
An automatic MLLM hallucination detection framework
☆19Sep 26, 2023Updated 2 years ago
xfactlab / I0T
View on GitHub
[ACL Main 2025] I0T: Embedding Standardization Method Towards Zero Modality Gap
☆12Jun 18, 2025Updated last year
LaVi-Lab / Visual-Table
View on GitHub
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆20Oct 17, 2024Updated last year
d-ailin / CLIP-Guided-Decoding
View on GitHub
☆18Aug 1, 2024Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
naver-ai / muco
View on GitHub
Official Pytorch implementation of MuCo: Multi-turn Contrastive Learning for Multimodal Embedding Model (CVPR 2026)
☆15Apr 16, 2026Updated 3 months ago
wsntxxn / TextToAudioGrounding
View on GitHub
The dataset and baseline code for Text-to-Audio Grounding (TAG)
☆49Oct 23, 2025Updated 9 months ago
choiHkk / nix-tts
View on GitHub
End-To-End SpeechSynthesis system with knowledge distillation
☆18Jul 16, 2022Updated 4 years ago
hshjerry / VideoEspresso
View on GitHub
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆140Jul 28, 2025Updated last year
NJU-LINK / IF-VidCap
View on GitHub
The Source Code for IF-VidCap @ICLR 2026
☆19Oct 22, 2025Updated 9 months ago
NJU-LINK / DRIFT
View on GitHub
Design for Error Detection in Deep-Research Agents Trajectories.
☆22Jun 4, 2026Updated last month
Quehry / HelloBench
View on GitHub
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
☆60Nov 26, 2024Updated last year
kaistAI / Volcano
View on GitHub
[NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…
☆49Aug 21, 2024Updated last year
30stomercury / whisper-char-alignment
View on GitHub
Word alignments based on character-level attention maps in Whisper with unsupervised head selection.
☆20Jan 15, 2026Updated 6 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
RAIVNLab / CREPE
View on GitHub
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆35Apr 27, 2023Updated 3 years ago
sangho-vision / acav100m
View on GitHub
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.
☆64Nov 18, 2021Updated 4 years ago
KOR-Bench / KOR-Bench
View on GitHub
☆19Nov 12, 2024Updated last year
uthree / ddsp-vocoder
View on GitHub
☆12Nov 7, 2024Updated last year
kangreen0210 / LIME
View on GitHub
Accelerating the development of large multimodal models (LMMs) with lmms-eval
☆14Oct 14, 2024Updated last year
zhaoyx239 / X-Translator
View on GitHub
☆26Jul 21, 2026Updated last week
bronyayang / HallE_Control
View on GitHub
HallE-Control: Controlling Object Hallucination in LMMs
☆32Apr 10, 2024Updated 2 years ago