AGENDD / RWKV-ASR
This repo is an exploratory experiment in enabling frozen pretrained RWKV language models to accept speech input. We follow the idea of SLAM_ASR, using the RWKV language model as the LLM; instead of writing a prompt template, we directly finetune the initial state of the RWKV model.
☆40Updated last month
Alternatives and similar repositories for RWKV-ASR:
Users that are interested in RWKV-ASR are comparing it to the libraries listed below
- Official implementation of the TTS model Lina-Speech☆155Updated last month
- An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs☆37Updated this week
- Separately maintained Chinese TTS☆35Updated 2 years ago
- Chinese and English bilingual G2P☆20Updated last year
- F5-TTS inference acceleration, roughly 4x faster!☆45Updated last month
- Just another FastSpeech 2 but cleaner code :)☆26Updated 7 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆63Updated 3 months ago
- (WIP) Long-form speech generation☆30Updated 2 months ago
- ☆24Updated this week
- flow mirror models from JZX AI Labs☆42Updated 4 months ago
- Training with a 4G GPU in 10 minutes☆12Updated last year
- ☆18Updated 9 months ago
- ☆65Updated last year
- Unofficial PyTorch implementation of SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speake…☆57Updated last year
- SenseVoice-python: An enterprise-grade open-source multilingual ASR system from FunASR, running on onnxruntime☆83Updated 4 months ago
- My vocoder experiments☆26Updated 4 months ago
- ☆28Updated last year
- All generative models in one for a better TTS model☆66Updated 5 months ago
- ☆18Updated 4 months ago
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".☆24Updated 7 months ago
- ☆34Updated 10 months ago
- VoiceBank-2023 is the speech corpus specially designed for constructing personalized Mandarin text-to-speech (TTS) systems.☆39Updated last year
- ☆24Updated 3 months ago
- E2E TTS using Conditional Flow Matching (Experimental*)☆69Updated last year
- TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization