HumanMLLM/HumanOmniV2

HumanMLLM / HumanOmniV2

☆161

Alternatives and similar repositories for HumanOmniV2

Users who are interested in HumanOmniV2 are comparing it to the repositories listed below.

HumanMLLM / HumanOmni
HumanOmni
☆240 · Mar 10, 2025 · Updated last year
HumanMLLM / CoGenAV
☆64 · Jul 1, 2025 · Updated last year
HarryHsing / EchoInk
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning (🔥The Exploration of R1 for General Audio-Vis…
☆78 · Jun 3, 2026 · Updated last month
HumanMLLM / ViSpeak
(ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"
☆53 · Jul 1, 2025 · Updated last year
yaolinli / TimeChat-Captioner
[ICML 2026] Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
☆48 · Jun 29, 2026 · Updated 3 weeks ago
JaaackHongggg / WorldSense
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
☆50 · Jul 12, 2026 · Updated last week
HumanMLLM / LOVE-R1
Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning"
☆24 · Nov 1, 2025 · Updated 8 months ago
HumanMLLM / R1-Omni
☆1,020 · Mar 24, 2025 · Updated last year
Lliar-liar / Daily-Omni
This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
☆42 · Apr 28, 2026 · Updated 2 months ago
HumanMLLM / LLaVA-Scissor
The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
☆122 · Jul 1, 2025 · Updated last year
HVision-NKU / ASID-Caption
ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Unde…
☆68 · Mar 3, 2026 · Updated 4 months ago
lzyhha / HSSL
Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)
☆15 · May 2, 2025 · Updated last year
LaVi-Lab / Rethink_CoT_Video
Official code for "Rethinking Chain-of-Thought Reasoning for Videos"
☆21 · Dec 14, 2025 · Updated 7 months ago
TencentARC / Video-Holmes
[ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆95 · Jul 13, 2025 · Updated last year
iSEE-Laboratory / HD-OVD
(TMM 2025) Official repository of paper "A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection"
☆27 · Mar 14, 2025 · Updated last year
bytedance / video-SALMONN-2
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d…
☆204 · Feb 23, 2026 · Updated 4 months ago
mbzuai-oryx / Video-R2
Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
☆19 · Jan 21, 2026 · Updated 6 months ago
slp-rl / SpokenStoryCloze
A spoken version of the textual story cloze benchmark
☆22 · Aug 6, 2023 · Updated 2 years ago
TencentARC / ARC-Hunyuan-Video-7B
Structured Video Comprehension of Real-World Shorts
☆239 · Sep 21, 2025 · Updated 10 months ago
WPR001 / UGC_VideoCaptioner
☆16 · Jun 23, 2026 · Updated last month
caojiaolong / Awesome-Mamba
Collect papers about Mamba (a selective state space model).
☆15 · Aug 6, 2024 · Updated last year
Adam-duan / DiffRetouch
[AAAI2025] This is the official PyTorch codes for the paper: "DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts"
☆25 · Jun 16, 2025 · Updated last year
WeChatCV / D-ORCA
D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning
☆15 · Feb 11, 2026 · Updated 5 months ago
CASIA-LM / OpenS2S
OpenS2S : Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
☆119 · Mar 28, 2026 · Updated 3 months ago
maifoundations / Streamo
Streaming Video Instruction Tuning
☆79 · Feb 25, 2026 · Updated 4 months ago
OpenGVLab / VKnowU
[ECCV 2026] VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs
☆15 · Feb 3, 2026 · Updated 5 months ago
SenseTime-FVG / InteractiveOmni
☆24 · Dec 3, 2025 · Updated 7 months ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆726 · Sep 24, 2025 · Updated 9 months ago
ZX-Yin / DreamLifting
The code implementation for the paper "DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation".
☆30 · Sep 1, 2025 · Updated 10 months ago
QwenLM / Qwen3-Omni
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…
☆3,903 · Apr 23, 2026 · Updated 3 months ago
IVUL-KAUST / VideoAuto-R1
[CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
☆88 · Feb 27, 2026 · Updated 4 months ago
LJungang / Awesome-Video-Reasoning-Landscape
🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.
☆189 · Jun 14, 2026 · Updated last month
HVision-NKU / ControlSR
☆13 · Apr 19, 2025 · Updated last year
HVision-NKU / GlimpsePrune
[TCSVT] Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"
☆98 · Jun 12, 2026 · Updated last month
MikeWangWZHL / PAPO
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
☆151 · Feb 4, 2026 · Updated 5 months ago
Ziyang412 / Video-RTS
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24 · Feb 18, 2026 · Updated 5 months ago
showlab / H2R-Grounder
A V2V framework that translates human interaction videos into robot manipulation videos.
☆24 · Dec 12, 2025 · Updated 7 months ago
antgroup / OmniBench
[ICML 2025 Oral] This is the official repository of the paper "What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensi…
☆22 · Jun 12, 2025 · Updated last year
BRZ911 / ViTCoT
[ACM MM 2025] ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
☆18 · Jul 15, 2025 · Updated last year