MOSS‑TTS Family is an open‑source family of speech and sound generation models from MOSI.AI and the OpenMOSS team. It targets high‑fidelity, highly expressive generation in complex real‑world scenarios: stable long‑form speech, multi‑speaker dialogue, voice and character design, environmental sound effects, and real‑time streaming TTS.
☆1,279 · Apr 13, 2026 · Updated this week
Alternatives and similar repositories for MOSS-TTS
Users who are interested in MOSS-TTS compare it to the repositories listed below.
- FreeFuse: Multi-Subject LoRA Fusion via Adaptive Token-Level Routing at Test Time ☆179 · Mar 17, 2026 · Updated last month
- Code repo for EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation ☆39 · Mar 6, 2026 · Updated last month
- ☆24 · Jul 20, 2025 · Updated 8 months ago
- MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, i… ☆191 · Updated this week
- MOVA: Towards Scalable and Synchronized Video–Audio Generation ☆945 · Apr 1, 2026 · Updated 2 weeks ago
- This is the code for paper: XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs ☆92 · Sep 19, 2025 · Updated 7 months ago
- MOSS-VL is the core multimodal model series within the OpenMOSS ecosystem, dedicated to visual understanding. ☆222 · Updated this week
- MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flex… ☆1,271 · Mar 23, 2026 · Updated 3 weeks ago
- Code2Worlds: Empowering Coding LLMs for 4D World Generation ☆96 · Feb 26, 2026 · Updated last month
- A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation ☆61 · Mar 31, 2025 · Updated last year
- From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors ☆84 · Mar 7, 2026 · Updated last month
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆291 · Mar 21, 2026 · Updated 3 weeks ago
- FreeCodec: A Disentangled Neural Speech Codec with Fewer Tokens ☆24 · Sep 9, 2024 · Updated last year
- [CVPR 2026] When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models ☆52 · Apr 11, 2026 · Updated last week
- [ICCV 2025] Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping ☆93 · Nov 30, 2025 · Updated 4 months ago
- Official repo for paper "SK-Adapter: Skeleton-Based Structural Control for Native 3D Generation". ☆53 · Mar 22, 2026 · Updated 3 weeks ago
- Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs ☆33 · Dec 9, 2025 · Updated 4 months ago
- [CVPR'26] VecGlypher: Unified Vector Glyph Generation with Language Models ☆118 · Feb 26, 2026 · Updated last month
- ☆86 · Dec 31, 2025 · Updated 3 months ago
- Official Codebase for our CVPR 2026 paper UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass ☆140 · Feb 24, 2026 · Updated last month
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction. ☆139 · Sep 19, 2025 · Updated 7 months ago
- [CVPR'26 Highlight] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset ☆594 · Oct 29, 2025 · Updated 5 months ago
- Code for 'JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion' ☆240 · Feb 10, 2026 · Updated 2 months ago
- Official code of "RoboOmni: Proactive Robot Manipulation in Omni-modal Context" ☆103 · Mar 28, 2026 · Updated 3 weeks ago
- LongCat Audio Tokenizer and Detokenizer ☆299 · Apr 2, 2026 · Updated 2 weeks ago
- 🌋LavaSR: Fast Speech restoration and enhancement ☆508 · Apr 6, 2026 · Updated last week
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆77 · Jan 25, 2026 · Updated 2 months ago
- The power-law compressed phase-aware asymmetric (PLCPA-ASYM) loss ☆14 · Sep 4, 2023 · Updated 2 years ago
- ☆1,865 · Apr 11, 2026 · Updated last week
- Official Implementation of ReCo: Region-Constraint In-Context Generation for Instructional Video Editing ☆149 · Mar 5, 2026 · Updated last month
- A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research, and practical … ☆59 · Sep 1, 2025 · Updated 7 months ago
- [NeurIPS 2024] Can Language Models Learn to Skip Steps? ☆22 · Jan 25, 2025 · Updated last year
- ☆13 · Jan 11, 2026 · Updated 3 months ago
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems. ☆111 · May 5, 2025 · Updated 11 months ago
- Towards Systematic Measurement for Long Text Quality ☆38 · Sep 5, 2024 · Updated last year
- A highly optimized engine for the neutts-air model that generates minutes of audio in seconds. Over 200x realtime on modern hardware! ☆116 · Nov 24, 2025 · Updated 4 months ago
- CosyVoice_DPO_NOTES: Supercharge Your CosyVoice Model with Cutting-Edge DPO Fine-Tuning! ☆124 · Aug 8, 2025 · Updated 8 months ago
- ComfyUI custom nodes for Fish Audio S2-Pro TTS: voice clone, multi-speaker, and text-to-speech ☆196 · Apr 11, 2026 · Updated last week
- Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis ☆562 · Updated this week