teamtee / Qwen2-Audio-finetune
This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.
☆12 Updated last month
Alternatives and similar repositories for Qwen2-Audio-finetune:
Users who are interested in Qwen2-Audio-finetune are comparing it to the repositories listed below.
- ☆13 Updated 5 months ago
- faster inference ☆27 Updated 2 months ago
- Streaming Vocos ☆22 Updated 2 months ago
- One command to start a streaming ASR server. ☆11 Updated 5 months ago
- ☆10 Updated 5 months ago
- Official Implementation of TSELM: Target speaker extraction using discrete tokens and language models ☆42 Updated 3 months ago
- ☆10 Updated 4 months ago
- Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open … ☆15 Updated 2 weeks ago
- An evaluation set for large-scale trained TTS models (Coming in Sep 2024) ☆12 Updated 6 months ago
- ☆28 Updated 8 months ago
- Glow-TTS with Stochastic Duration Predictor and Stochastic Pitch Predictor ☆18 Updated last year
- ☆10 Updated last year
- The implementation of MDNet, which is in submission to Interspeech2022 ☆13 Updated 2 years ago
- Inference code for Audiodec-Valle-Wenetspeech4TTS ☆48 Updated 8 months ago
- LLaSE: Maximizing Acoustic Preservation for LLaMA based Speech Enhancement ☆14 Updated 3 weeks ago
- Unofficial pytorch implementation of VISinger: Variational Inference with Adversarial Learning for End-to-end Singing Voice Synthesis (IC… ☆15 Updated last year
- A trainer for SNAC (Multi-Scale Neural Audio Codec) with the decoder replaced by Vocos. ☆32 Updated 4 months ago
- ☆46 Updated 2 months ago
- ☆11 Updated last month
- DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors ☆18 Updated last month
- CTC decoder with hotwords for ASR. ☆17 Updated 2 months ago
- UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts ☆21 Updated 3 months ago
- Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis ☆21 Updated this week
- Real-time speech enhancement ☆13 Updated last year
- Training code for MaskGCT-T2S model. ☆19 Updated 3 months ago
- Code of the paper "Low-Latency Speech Separation Guided Diarization for Telephone Conversations" ☆13 Updated 2 years ago
- TTS Text Analyzer ☆32 Updated last year
- Survey on speech generation work. ☆17 Updated last year
- Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition. ☆12 Updated last week
- Spherical residual vector quantization (SRVQ) ☆28 Updated 7 months ago