OmniMMI / OpenOmniNexusLinks

a fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.

☆35

Alternatives and similar repositories for OpenOmniNexus

Users that are interested in OpenOmniNexus are comparing it to the libraries listed below

Sorting:

OFA-Sys / AIR-Bench
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
☆124Updated 11 months ago
Sakshi113 / MMAU
☆118Updated 2 months ago
VITA-MLLM / LUCY
LUCY: Linguistic Understanding and Control Yielding Early Stage of Her
☆55Updated 7 months ago
cwang621 / blsp-emo
BLSP-Emo: Towards Empathetic Large Speech-Language Models
☆53Updated last year
thuhcsi / SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆173Updated 7 months ago
ddlBoJack / MMAR
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆176Updated 5 months ago
shuaijiang / Ke-Omni-R
Ke-Omni-R is an advanced audio reasoning model and achieved SOTA on MMAU
☆55Updated 5 months ago
wntg / LLaMA-Omni
llama-omni训练代码复现
☆69Updated 9 months ago
jishengpeng / WavChat
A Survey of Spoken Dialogue Models (60 pages)
☆310Updated 11 months ago
OpenBMB / UltraEval-Audio
An easy-to-use, fast, and easily integrable tool for evaluating audio LLM
☆165Updated 3 weeks ago
zruiii / QwenAudioSFT
The repoduction codes for Qwen-Audio Fine-tuning
☆50Updated last year
EIT-NLP / LLaSO
☆104Updated last month
CASIA-LM / OpenS2S
OpenS2S : Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
☆94Updated 4 months ago
baichuan-inc / Baichuan-Audio
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
☆216Updated 8 months ago
multimodal-art-projection / OmniBench
A project for tri-modal LLM benchmarking and instruction tuning.
☆50Updated 7 months ago
yangdongchao / LLM-Codec
The open source code for LLM-Codec
☆141Updated last year
RainBowLuoCS / OpenOmni
(NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align…
☆109Updated last week
choijeongsoo / av2av
[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
☆42Updated last year
MoonshotAI / Kimi-Audio-Evalkit
☆148Updated 3 months ago
thunlp / duplex-model
☆41Updated last year
JeongHun0716 / MMS-LLaMA
Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens…
☆38Updated 5 months ago
amphionspace / SD-Eval
[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
☆53Updated last year
NKU-HLT / DIFFA
[AAAI 2026] DIFFA: Large Language Diffusion Models Can Listen and Understand
☆30Updated last week
GalaxyCong / EmoDubber
Official source codes for the paper: EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing.
☆31Updated 5 months ago
FreedomIntelligence / FusionAudio
Towards Fine-grained Audio Captioning with Multimodal Contextual Cues
☆83Updated last month
ictnlp / DASpeech
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
☆63Updated last year
xiaomi-research / r1-aqa
🤗 R1-AQA Model: mispeech/r1-aqa
☆306Updated 7 months ago
ddlBoJack / Awesome-Speech-Language-Model
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
☆187Updated last year
KexinHUANG19 / InstructTTSEval
☆19Updated 4 months ago
OpenMOSS / MOSS-Speech
MOSS-Speech is a true speech-to-speech large language model without text guidance.
☆64Updated last month