emova-ollm/EMOVA

Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)

☆81

Alternatives and similar repositories for EMOVA

Users who are interested in EMOVA are comparing it to the libraries listed below.

RainBowLuoCS / OpenOmni
(NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align…
☆142 · May 9, 2026 · Updated 2 months ago
cwang621 / blsp-emo
BLSP-Emo: Towards Empathetic Large Speech-Language Models
☆62 · Jun 7, 2024 · Updated 2 years ago
huangrh99 / AlphaGRPO
[ICML2026] Official Implementation of AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in Unified Multimodal Models via Decompo…
☆73 · Updated this week
OFA-Sys / AIR-Bench
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
☆133 · Dec 9, 2024 · Updated last year
choijeongsoo / utut
[TASLP 2024] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
☆31 · Sep 6, 2024 · Updated last year
wdqqdw / Echo
Project page of "2026-ICLR Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning"
☆16 · Mar 26, 2026 · Updated 4 months ago
cwang621 / blsp
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
☆59 · Mar 11, 2024 · Updated 2 years ago
wysnzzzz / DIT
☆18 · Nov 15, 2024 · Updated last year
pixeli99 / TrackDiffusion
[WACV2025] Official PyTorch implementation of TrackDiffusion (https://arxiv.org/abs/2312.00651)
☆81 · Jun 26, 2024 · Updated 2 years ago
multimodal-art-projection / OmniBench
A project for tri-modal LLM benchmarking and instruction tuning.
☆61 · Mar 27, 2025 · Updated last year
HarryHsing / EchoInk
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning (🔥The Exploration of R1 for General Audio-Vis…
☆78 · Jun 3, 2026 · Updated last month
SLPcourse / Singing-Voice-Conversion
Project of Singing Voice Conversion.
☆16 · Oct 27, 2023 · Updated 2 years ago
OpenBMB / UltraEval-Audio
Your faithful, impartial partner for audio evaluation — know yourself, know your rivals. 真实评测，知己知彼。A unified benchmark framework for ASR/…
☆311 · Updated this week
nnnth / UFO
[NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…
☆281 · Nov 5, 2025 · Updated 8 months ago
HKUNLP / multilingual-transfer
Code for paper “Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability”
☆15 · Jun 13, 2023 · Updated 3 years ago
DLUT-LYZ / CODA-LM
Official PyTorch implementation of CODA-LM (https://arxiv.org/abs/2404.10595)
☆103 · Dec 5, 2024 · Updated last year
xmed-lab / NuInstruct
☆72 · Aug 12, 2024 · Updated last year
CASIA-LM / OpenS2S
OpenS2S : Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
☆119 · Mar 28, 2026 · Updated 4 months ago
kuan2jiu99 / Awesome-Speech-Generation
Survey on speech generation work.
☆21 · Nov 26, 2023 · Updated 2 years ago
HashWang-null / IRBridge
Official Implementation of "IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models"
☆19 · Jun 5, 2025 · Updated last year
pufanyi / syphus
Syphus: Automatic Instruction-Response Generation Pipeline
☆14 · Dec 14, 2023 · Updated 2 years ago
BinWang28 / audio-ai-hub
The hub for audio AI research: papers, open models, benchmarks & datasets across audio LLMs, speech recognition, TTS, music & audio gener…
☆949 · Updated this week
dyhBUPT / EnsembleMOT
EnsembleMOT: A Step towards Ensemble Learning of Multiple Object Tracking
☆14 · Jan 22, 2024 · Updated 2 years ago
ServiceNow / AU-Harness
A comprehensive framework to test audio comprehension of Large Audio Language Models.
☆67 · Jun 9, 2026 · Updated last month
apple / ml-omni-router-moe-asr
☆18 · Oct 24, 2025 · Updated 9 months ago
24DavidHuang / Emotion-Qwen
Welcome to the official repository of Emotion-Qwen.
☆27 · Jun 10, 2025 · Updated last year
yanchi-3dv / PG-Occ
[ICLR 2026] This is the official implementation of PG-Occ: Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocab…
☆34 · Feb 19, 2026 · Updated 5 months ago
baichuan-inc / Baichuan-Omni-1.5
☆194 · Feb 8, 2025 · Updated last year
yangdongchao / Omni-AutoThink
Adaptive Multimodal Reasoning via Reinforcement Learning
☆23 · Jan 11, 2026 · Updated 6 months ago
happinesslz / DrivePI
[CVPR 2026] DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning
☆127 · Mar 21, 2026 · Updated 4 months ago
xiwenc1 / DRA-GRPO
Official code for the paper: DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
☆24 · Jan 6, 2026 · Updated 6 months ago
facebookresearch / Multi-SpatialMLLM
[CVPR 2026] Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
☆178 · Feb 25, 2026 · Updated 5 months ago
MCG-NJU / TimeLens2
TimeLens2: Generalist Video Temporal Grounding with Multimodal LLMs
☆57 · Updated this week
zlab-princeton / vero
Vero: An Open RL Recipe for General Visual Reasoning
☆138 · Jun 19, 2026 · Updated last month
Kikyo-16 / airgen
Official source codes of airsep
☆39 · Mar 26, 2024 · Updated 2 years ago
a43992899 / openl2s
Open, royalty free, lyrics2song / song generation data collection / cleaning pipeline.
☆17 · May 9, 2025 · Updated last year
zxzhao0 / C2SER
We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…
☆49 · Mar 3, 2025 · Updated last year
eshoyuan / TrackGPT
TrackGPT: Track What You Need in Videos via Text Prompts
☆25 · May 16, 2023 · Updated 3 years ago
yeezhu / UNIT
PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024.
☆34 · Sep 26, 2024 · Updated last year