HumanMLLM / HumanOmniLinks

HumanOmni

☆205

Alternatives and similar repositories for HumanOmni

Users that are interested in HumanOmni are comparing it to the libraries listed below

Sorting:

IVGSZ / Flash-VStream
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆247Updated last month
HumanMLLM / HumanOmniV2
☆137Updated 3 months ago
inclusionAI / Ming
Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.
☆534Updated 3 weeks ago
VectorSpaceLab / Video-XL
🔥🔥First-ever hour scale video understanding models
☆575Updated 4 months ago
baichuan-inc / Baichuan-Omni-1.5
☆180Updated 9 months ago
SCZwangxiao / video-FlexReduc
Official implementation of paper AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
☆88Updated 6 months ago
RainBowLuoCS / OpenOmni
(NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align…
☆109Updated 2 weeks ago
HumanMLLM / Omni-Emotion
☆21Updated 10 months ago
hyc2026 / StoryTeller
☆79Updated 8 months ago
yeliudev / VideoMind
💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
☆277Updated last month
Exploring-Embodied-Emotion-official / E3
☆19Updated 4 months ago
MCG-NJU / VideoChat-Online
[CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online
☆72Updated last month
HumanMLLM / ViSpeak
(ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"
☆40Updated 4 months ago
emova-ollm / EMOVA
Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)
☆74Updated 8 months ago
Kwai-Keye / Keye
☆697Updated 3 weeks ago
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆186Updated 6 months ago
threegold116 / Awesome-Omni-MLLMs
This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalitiesModels
☆70Updated last week
bytedance / video-SALMONN-2
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d…
☆116Updated last month
bytedance / vidi
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
☆148Updated 2 months ago
md-mohaiminul / VideoRecap
☆198Updated last year
Tencent / VITA
The official implement of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆124Updated 3 weeks ago
bytedance / tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions , together with g…
☆500Updated 3 months ago
invictus717 / MiCo
[ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale
☆118Updated last year
gls0425 / LinVT
LinVT: Empower Your Image-level Large Language Model to Understand Videos
☆82Updated 10 months ago
hlchen23 / ADPN-MM
Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi…
☆52Updated last year
OpenGVLab / VideoChat-R1
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆227Updated last month
rikeilong / Bay-CAT
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…
☆57Updated last year
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆397Updated 8 months ago
scofield7419 / EmpathyEar
Multimodal Empathetic Chatbot
☆51Updated last year
KlingTeam / MODA
[ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding
☆61Updated 4 months ago