JIA-Lab-research/MGM-Omni

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech

☆205

Alternatives and similar repositories for MGM-Omni

Users who are interested in MGM-Omni are comparing it to the libraries listed below.

nonverbalspeech38k / nonverspeech38k
The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…
☆68 · Dec 26, 2025 · Updated 6 months ago
llm-jp / llama-mimi
Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…
☆31 · Sep 20, 2025 · Updated 10 months ago
EIT-NLP / LLaSO
☆116 · Oct 21, 2025 · Updated 9 months ago
ASLP-lab / FlashTTS
Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
☆63 · Jun 16, 2026 · Updated last month
ddlBoJack / Omni-Captioner
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
☆142 · Apr 7, 2026 · Updated 3 months ago
bigai-nlco / UltraVoice
Official Repository of UltraVoice
☆62 · Oct 28, 2025 · Updated 8 months ago
yfyeung / DS-WED
[ICASSP 2026] Official code for "Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration"
☆17 · Apr 16, 2026 · Updated 3 months ago
RainBowLuoCS / OpenOmni
(NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align…
☆142 · May 9, 2026 · Updated 2 months ago
xiquan-li / Resonate
[INTERSPEECH 2026] Pre-training, SFT, DPO and GRPO for Text-to-Audio Generation
☆48 · Apr 17, 2026 · Updated 3 months ago
ozspeech / OZSpeech
[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
☆45 · Feb 9, 2025 · Updated last year
inclusionAI / Ming-UniAudio
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
☆450 · Nov 27, 2025 · Updated 7 months ago
Soul-AILab / SAC
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108 · Nov 1, 2025 · Updated 8 months ago
ryuclc / CosyVoice2-GRPO
A simple implementation for improving CosyVoice2 by GRPO method
☆39 · May 5, 2026 · Updated 2 months ago
jishengpeng / WavReward
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
☆56 · May 15, 2025 · Updated last year
XiaomiMiMo / MiMo-Audio
MiMo-Audio: Audio Language Models are Few-Shot Learners
☆1,065 · Jun 17, 2026 · Updated last month
bfs18 / armel
poorman's ar-dit tts
☆45 · Dec 31, 2025 · Updated 6 months ago
xkx-hub / KALL-E
[AAAI 2026 oral] KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
☆42 · Sep 25, 2025 · Updated 9 months ago
WangHelin1997 / SSR-Speech
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis
☆154 · Jan 1, 2025 · Updated last year
walker-hyf / GPT-Talker
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆78 · Nov 1, 2024 · Updated last year
xzf-thu / Mini-Omni-Reasoner
Mini-Omni-Reasoner: a real-time speech reasoning framework that interleaves silent reasoning tokens with spoken response tokens (“thinkin…
☆165 · Aug 26, 2025 · Updated 10 months ago
hhguo / SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆92 · Dec 20, 2024 · Updated last year
hyzhang24 / DuplexSLA
DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action
☆99 · May 20, 2026 · Updated 2 months ago
CASIA-LM / OpenS2S
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
☆119 · Mar 28, 2026 · Updated 3 months ago
hertz-pj / SNAC-Vocos
A trainer for SNAC (Multi-Scale Neural Audio Codec) with the decoder replaced by Vocos.
☆70 · Oct 28, 2024 · Updated last year
jingzhunxue / FlowMirror_HydraVox
FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens…
☆49 · Feb 17, 2026 · Updated 5 months ago
yongaifadian1 / MAGIC-TTS
MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control
☆51 · Apr 28, 2026 · Updated 2 months ago
b04901014 / vae-gslm
Official Implementation for the paper: A Variational Framework for Improving Naturalness in Generative Spoken Language Models
☆24 · Jun 18, 2025 · Updated last year
haoweilou / ParaStyleTTS
This is the official code for ACM CIKM 2025 Paper: ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive …
☆59 · Dec 21, 2025 · Updated 7 months ago
ShawnPi233 / SynParaSpeech
Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding" (IC…
☆72 · Apr 27, 2026 · Updated 2 months ago
gwh22 / UniVoice
UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
☆115 · Oct 30, 2025 · Updated 8 months ago
Mini-o3 / Mini-o3
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆422 · Jan 29, 2026 · Updated 5 months ago
davidbrowne17 / Mimi-Voice
Create Unmute voice embeddings
☆26 · Nov 15, 2025 · Updated 8 months ago
xiaomi-research / tts-prism
☆47 · Apr 27, 2026 · Updated 2 months ago
yangdongchao / Omni-AutoThink
Adaptive Multimodal Reasoning via Reinforcement Learning
☆23 · Jan 11, 2026 · Updated 6 months ago
LAION-AI / emotion-annotations
☆110 · Updated this week
BayLing-Models / BayLing-Duplex
Native full-duplex speech dialogue inference for BayLing-Duplex.
☆63 · Jun 22, 2026 · Updated 3 weeks ago
FreedomIntelligence / EchoX
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
☆47 · Sep 19, 2025 · Updated 10 months ago
OpenMOSS / MOSS-TTSD
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flex…
☆1,360 · Mar 23, 2026 · Updated 3 months ago
sarulab-speech / DuplexChat
☆46 · Jul 5, 2026 · Updated 2 weeks ago