MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It targets high‑fidelity, highly expressive generation in complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.
☆1,733 · Apr 29, 2026 · Updated last week
Alternatives and similar repositories for MOSS-TTS
Users that are interested in MOSS-TTS are comparing it to the libraries listed below.
- FreeFuse: Multi-Subject LoRA Fusion via Adaptive Token-Level Routing at Test Time ☆187 · Mar 17, 2026 · Updated last month
- Code repo for EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation ☆40 · Mar 6, 2026 · Updated 2 months ago
- ☆24 · Jul 20, 2025 · Updated 9 months ago
- MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, i… ☆209 · Updated this week
- MOVA: Towards Scalable and Synchronized Video–Audio Generation ☆973 · Updated this week
- This is the code for paper: XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs ☆92 · Sep 19, 2025 · Updated 7 months ago
- MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flex… ☆1,310 · Mar 23, 2026 · Updated last month
- [ICML 2026] Code2Worlds: Empowering Coding LLMs for 4D World Generation ☆105 · Updated this week
- a survey of long-context LLMs from four perspectives, architecture, infrastructure, training, and evaluation ☆61 · Mar 31, 2025 · Updated last year
- [ICML2026] From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors ☆87 · Apr 30, 2026 · Updated last week
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆295 · Mar 21, 2026 · Updated last month
- FREECODEC: A DISENTANGLED NEURAL SPEECH CODEC WITH FEWER TOKENS ☆24 · Sep 9, 2024 · Updated last year
- [ICLR26] Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs ☆33 · Dec 9, 2025 · Updated 5 months ago
- [ICCV 2025] Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping ☆93 · Nov 30, 2025 · Updated 5 months ago
- Official repo for paper "SK-Adapter: Skeleton-Based Structural Control for Native 3D Generation". ☆54 · Mar 22, 2026 · Updated last month
- ☆88 · Dec 31, 2025 · Updated 4 months ago
- [CVPR'26] VecGlypher: Unified Vector Glyph Generation with Language Models ☆123 · Feb 26, 2026 · Updated 2 months ago
- [CVPR 2026] When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models ☆66 · Apr 11, 2026 · Updated 3 weeks ago
- Official Codebase for our CVPR 2026 paper UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass ☆145 · Feb 24, 2026 · Updated 2 months ago
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction. ☆142 · Sep 19, 2025 · Updated 7 months ago
- Code for 'JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion' ☆248 · Feb 10, 2026 · Updated 2 months ago
- [CVPR'26 Highlight] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset ☆597 · Oct 29, 2025 · Updated 6 months ago
- LongCat Audio Tokenizer and Detokenizer ☆300 · Apr 24, 2026 · Updated 2 weeks ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆77 · Jan 25, 2026 · Updated 3 months ago
- Official code of "RoboOmni: Proactive Robot Manipulation in Omni-modal Context" ☆105 · Mar 28, 2026 · Updated last month
- 🌋LavaSR: Fast Speech restoration and enhancement ☆523 · Apr 6, 2026 · Updated last month
- The power-law compressed phase-aware asymmetric (PLCPA-ASYM) loss ☆14 · Sep 4, 2023 · Updated 2 years ago
- FuwariStudio is a cross-platform Markdown editor built for managing Fuwari blog repositories. It provides a clean, modern writing experie… ☆16 · Mar 29, 2026 · Updated last month
- A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research, and practical … ☆59 · Sep 1, 2025 · Updated 8 months ago
- ☆1,964 · Apr 11, 2026 · Updated 3 weeks ago
- [NeurIPS 2024] Can Language Models Learn to Skip Steps? ☆22 · Jan 25, 2025 · Updated last year
- [ICML 2026] Official Implementation of ReCo: Region-Constraint In-Context Generation for Instructional Video Editing ☆152 · Updated this week
- ☆13 · Apr 26, 2026 · Updated 2 weeks ago
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems. ☆112 · May 5, 2025 · Updated last year
- Towards Systematic Measurement for Long Text Quality ☆38 · Sep 5, 2024 · Updated last year
- CosyVoice_DPO_NOTES: Supercharge Your Cosyvoice model with Cutting-Edge DPO Fine-Tuning! ☆124 · Aug 8, 2025 · Updated 9 months ago
- A highly optimized engine for neutts-air model to generate minutes of audio in seconds. Over 200x realtime on modern hardware! ☆119 · Nov 24, 2025 · Updated 5 months ago
- ☆71 · Jan 12, 2026 · Updated 3 months ago
- Use the tokenizer in parallel to achieve superior acceleration ☆20 · Mar 21, 2024 · Updated 2 years ago