OpenMOSS / MOSS-TTS
The MOSS-TTS Family is an open-source family of speech and sound generation models from MOSI.AI and the OpenMOSS team. It targets high-fidelity, highly expressive generation in complex real-world scenarios, covering stable long-form speech, multi-speaker dialogue, voice and character design, environmental sound effects, and real-time streaming TTS.
☆172 · Updated this week
Alternatives and similar repositories for MOSS-TTS
Users who are interested in MOSS-TTS are comparing it to the libraries listed below.
- MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, i… ☆85 · Updated this week
- FreeCodec: A Disentangled Neural Speech Codec with Fewer Tokens ☆24 · Sep 9, 2024 · Updated last year
- The power-law compressed phase-aware asymmetric (PLCPA-ASYM) loss ☆14 · Sep 4, 2023 · Updated 2 years ago
- A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation ☆61 · Mar 31, 2025 · Updated 10 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆75 · Jan 25, 2026 · Updated 3 weeks ago
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems. ☆105 · May 5, 2025 · Updated 9 months ago
- [NeurIPS 2024] Can Language Models Learn to Skip Steps? ☆22 · Jan 25, 2025 · Updated last year
- Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis ☆240 · Updated this week
- ☆99 · Jan 19, 2026 · Updated 3 weeks ago
- FamilyTool benchmark ☆12 · Sep 10, 2025 · Updated 5 months ago
- Sound Separation, Omni modal ☆28 · Sep 15, 2025 · Updated 5 months ago
- Real-Time ASR with CNN-BiLSTM: End-to-End Live Streaming Using PyTorch Lightning⚡ ☆11 · Jan 23, 2025 · Updated last year
- A Python tool that helps you interact with ChatGPT. ☆10 · Dec 11, 2022 · Updated 3 years ago
- ☆83 · Dec 31, 2025 · Updated last month
- Code for the blog "Neural audio codecs: how to get audio into LLMs" ☆151 · Oct 20, 2025 · Updated 3 months ago
- ☆119 · Jan 18, 2026 · Updated last month
- MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flex… ☆1,110 · Updated this week
- Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation ☆28 · Dec 10, 2025 · Updated 2 months ago
- ☆24 · Jul 20, 2025 · Updated 6 months ago
- Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs ☆33 · Dec 9, 2025 · Updated 2 months ago
- SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931) ☆56 · Dec 23, 2025 · Updated last month
- This is the code and dataset repo for Interspeech 2024 paper "Target conversation extraction: Source separation using turn-taking dynamic… ☆55 · Aug 15, 2025 · Updated 6 months ago
- We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction ☆179 · Feb 3, 2026 · Updated 2 weeks ago
- Official code of "RoboOmni: Proactive Robot Manipulation in Omni-modal Context" ☆81 · Nov 17, 2025 · Updated 3 months ago
- The implementation of "End-to-End Neural Speaker Diarization with an Iterative Adaptive Attractor Estimation", which is accepted by Neura… ☆11 · Aug 27, 2023 · Updated 2 years ago
- VehicleWorld is the first comprehensive multi-device environment for intelligent vehicle interaction that accurately models the complex, … ☆21 · Sep 16, 2025 · Updated 5 months ago
- ☆16 · Jan 11, 2026 · Updated last month
- Use the tokenizer in parallel to achieve superior acceleration ☆20 · Mar 21, 2024 · Updated last year
- LongCat Audio Tokenizer and Detokenizer ☆285 · Feb 10, 2026 · Updated last week
- Code for 'JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion' ☆38 · Feb 10, 2026 · Updated last week
- Towards Systematic Measurement for Long Text Quality ☆37 · Sep 5, 2024 · Updated last year
- Evaluation tool used in the BigVSAN paper ☆14 · Mar 22, 2024 · Updated last year
- FlowMirror-HydraVox: A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens… ☆38 · Feb 10, 2026 · Updated last week
- DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting ☆17 · Mar 4, 2025 · Updated 11 months ago
- Code for the paper Proactive Hearing Assistants that Isolate Egocentric Conversations ☆43 · Nov 19, 2025 · Updated 2 months ago
- This is the implementation of the manuscript "Learning General All-Neural Speech Enhancement based on Taylor's Approximation Theory", whi… ☆14 · Nov 25, 2022 · Updated 3 years ago
- ☆15 · Apr 2, 2025 · Updated 10 months ago
- ☆31 · Aug 18, 2025 · Updated 5 months ago
- A REST API service for manipulating blog content, implemented with FastAPI in Python ☆12 · Feb 14, 2023 · Updated 3 years ago