Llasa Speed Up
☆60Jan 18, 2026Updated last month
Alternatives and similar repositories for LLaSA_Plus
Users that are interested in LLaSA_Plus are comparing it to the libraries listed below
Sorting:
- [ICASSP 2026] Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis☆36Dec 24, 2025Updated 2 months ago
- ☆36Sep 6, 2025Updated 5 months ago
- A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.☆113Jun 4, 2025Updated 8 months ago
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching☆45Feb 9, 2025Updated last year
- faster inference☆28Jan 20, 2025Updated last year
- LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement☆46Mar 10, 2025Updated 11 months ago
- semantic tokenizer for speech and music☆21Jul 6, 2025Updated 7 months ago
- Open, royalty free, lyrics2song / song generation data collection / cleaning pipeline.☆17May 9, 2025Updated 9 months ago
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs☆57Nov 19, 2025Updated 3 months ago
- LLaSE: Maximizing Acoustic Preservation for LLaMA based Speech Enhancement☆16Jul 11, 2025Updated 7 months ago
- Cover Song Detection System☆10Mar 29, 2019Updated 6 years ago
- Official code of SenSE.☆74Oct 30, 2025Updated 4 months ago
- Reimplementation of Miipher☆29Aug 16, 2023Updated 2 years ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…☆17Mar 3, 2025Updated 11 months ago
- Official code for paper:"Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding"☆33Jan 28, 2026Updated last month
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆105May 5, 2025Updated 9 months ago
- A Massive Contextual Speech Recognition Benchmark.☆99Aug 6, 2025Updated 6 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…☆75Jan 25, 2026Updated last month
- Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations☆62Jan 16, 2025Updated last year
- ☆23Jan 29, 2026Updated last month
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆481Nov 23, 2025Updated 3 months ago
- Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu…☆27Mar 5, 2024Updated last year
- CosyVoice_DPO_NOTES: Supercharge Your Cosyvoice model with Cutting-Edge DPO Fine-Tuning!☆117Aug 8, 2025Updated 6 months ago
- The source code for the paper CrossSinger (asru2023)☆18Oct 12, 2023Updated 2 years ago
- This repository implement a novel zero-shot TTS framework, named Flamed-TTS, focusing on the efficient generation and dynamic pacing in …☆57Aug 9, 2025Updated 6 months ago
- The baselines of ARC-Challenge-Interspeech2026☆56Dec 1, 2025Updated 2 months ago
- ☆115Sep 18, 2025Updated 5 months ago
- Prediction of sound event bounding boxes (SEBBs)☆32Aug 2, 2024Updated last year
- Inference code for Audiodec-Valle-Wenetspeech4TTS☆50Jul 14, 2024Updated last year
- ☆15May 8, 2021Updated 4 years ago
- [INTERSPEECH 2025 Oral]Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"☆64Jun 16, 2025Updated 8 months ago
- Streamable Text-to-Speech model using a language modeling approach, without vector quantization☆110May 20, 2025Updated 9 months ago
- Efficient Personalized Speech Enhancement through Self-Supervised Learning☆23Mar 12, 2023Updated 2 years ago
- [EMNLP 2025 Findings] Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion☆34Sep 9, 2025Updated 5 months ago
- Official Repository for "Efficient Vocal Source Separation Through Windowed RoFormer"☆43Oct 30, 2025Updated 4 months ago
- ☆59Oct 22, 2025Updated 4 months ago
- Source code and demo for INTERPSEECH 2023 paper: DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion P…☆37Dec 5, 2023Updated 2 years ago
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability☆107Jan 17, 2025Updated last year
- Speech Resynthesis and Language Modeling☆27Jun 11, 2025Updated 8 months ago