zehanwang01 / OmniBindLinks

☆33

Alternatives and similar repositories for OmniBind

Users that are interested in OmniBind are comparing it to the libraries listed below

Sorting:

AILab-CVC / M2PT
[CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
☆100Updated last year
Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆61Updated 9 months ago
Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆67Updated 9 months ago
invictus717 / MiCo
[ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale
☆118Updated last year
haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆25Updated 7 months ago
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆76Updated last year
NVlabs / QLIP
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
☆93Updated 8 months ago
JaaackHongggg / WorldSense
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
☆33Updated last month
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆26Updated 11 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆61Updated 4 months ago
rese1f / aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆131Updated 5 months ago
Go2Heart / StreamFormer
[ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.
☆66Updated last month
RenShuhuai-Andy / NBP
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆39Updated 9 months ago
TencentARC / GRPO-CARE
☆78Updated 4 months ago
AV-Odyssey / AV-Odyssey
This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"
☆30Updated 10 months ago
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆66Updated last year
yliu-cs / PiTe
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆17Updated 9 months ago
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆127Updated 3 months ago
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆194Updated 5 months ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆89Updated last year
DCDmllm / MorphTokens
☆43Updated last year
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆65Updated last year
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆20Updated 8 months ago
HumanMLLM / ViSpeak
(ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"
☆40Updated 4 months ago
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆77Updated 4 months ago
Gen-Verse / HermesFlow
[NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
☆71Updated 2 months ago
xjtupanda / Sparrow
Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"
☆48Updated 2 months ago
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆94Updated last year
TencentARC / ViT-Lens
[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
☆183Updated 9 months ago
path2generalist / General-Level
On Path to Multimodal Generalist: General-Level and General-Bench
☆19Updated 4 months ago