MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.
☆789 · Mar 4, 2026 · Updated this week
Alternatives and similar repositories for MOSS-TTS
Users that are interested in MOSS-TTS are comparing it to the libraries listed below
- MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, i… ☆132 · Updated this week
- FREECODEC: A DISENTANGLED NEURAL SPEECH CODEC WITH FEWER TOKENS ☆24 · Sep 9, 2024 · Updated last year
- The power-law compressed phase-aware asymmetric (PLCPA-ASYM) loss ☆14 · Sep 4, 2023 · Updated 2 years ago
- FreeFuse: Multi-Subject LoRA Fusion via Adaptive Token-Level Routing at Test Time ☆159 · Mar 2, 2026 · Updated last week
- ☆24 · Jul 20, 2025 · Updated 7 months ago
- [CVPR'26] VecGlypher: Unified Vector Glyph Generation with Language Models ☆95 · Feb 26, 2026 · Updated last week
- MOVA: Towards Scalable and Synchronized Video–Audio Generation ☆793 · Updated this week
- A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation ☆61 · Mar 31, 2025 · Updated 11 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆76 · Jan 25, 2026 · Updated last month
- Collection of Google Colab notebooks (Free Tier) for AI tools. ☆78 · Feb 25, 2026 · Updated last week
- Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control ☆160 · Feb 26, 2026 · Updated last week
- The code used to evaluate embedding models on the Massive Legal Embedding Benchmark (MLEB). ☆31 · Feb 24, 2026 · Updated 2 weeks ago
- Official Codebase for our CVPR 2026 paper UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass ☆125 · Feb 24, 2026 · Updated 2 weeks ago
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction. ☆134 · Sep 19, 2025 · Updated 5 months ago
- Code for 'JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion' ☆213 · Feb 10, 2026 · Updated 3 weeks ago
- [ICCV 2025] Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping ☆91 · Nov 30, 2025 · Updated 3 months ago
- ☆69 · Updated this week
- ☆66 · Jan 12, 2026 · Updated last month
- MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flex… ☆1,191 · Mar 2, 2026 · Updated last week
- [NeurIPS 2025] Separate Anything in Audio with Zero Training ☆56 · Nov 3, 2025 · Updated 4 months ago
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems. ☆105 · May 5, 2025 · Updated 10 months ago
- Official Implementation of ReCo: Region-Constraint In-Context Generation for Instructional Video Editing ☆147 · Updated this week
- Code2Worlds: Empowering Coding LLMs for 4D World Generation ☆87 · Feb 26, 2026 · Updated last week
- [NeurIPS 2024] Can Language Models Learn to Skip Steps? ☆22 · Jan 25, 2025 · Updated last year
- [CVPR 2026] Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning" ☆84 · Feb 13, 2026 · Updated 3 weeks ago
- Animate Any Character in Any World ☆90 · Jan 9, 2026 · Updated 2 months ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols ☆17 · Nov 19, 2025 · Updated 3 months ago
- ☆100 · Jan 19, 2026 · Updated last month
- LongCat Audio Tokenizer and Detokenizer ☆285 · Mar 3, 2026 · Updated last week
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s… ☆154 · May 30, 2025 · Updated 9 months ago
- ☆112 · Feb 17, 2026 · Updated 2 weeks ago
- Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs ☆77 · Dec 3, 2025 · Updated 3 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆270 · Feb 21, 2026 · Updated 2 weeks ago
- Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training ☆68 · Feb 7, 2026 · Updated last month
- CosyVoice_DPO_NOTES: Supercharge Your Cosyvoice model with Cutting-Edge DPO Fine-Tuning! ☆118 · Aug 8, 2025 · Updated 7 months ago
- MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows ☆131 · Sep 2, 2025 · Updated 6 months ago
- Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis ☆421 · Updated this week
- A high quality and fast TTS repository ☆505 · Dec 22, 2025 · Updated 2 months ago
- FamilyTool benchmark ☆12 · Sep 10, 2025 · Updated 6 months ago