xieh97 / dcase2023-audio-retrieval
Baseline system for Language-based Audio Retrieval (Task 6B) in DCASE 2023 Challenge
☆10Updated 2 years ago
Alternatives and similar repositories for dcase2023-audio-retrieval
Users that are interested in dcase2023-audio-retrieval are comparing it to the libraries listed below
- 《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》☆77Updated 2 years ago
- faster inference☆28Updated 9 months ago
- (WIP) long-form speech generation☆31Updated 7 months ago
- Official PyTorch inference code for the Interspeech 2025 paper: Efficient Speech Enhancement via Embeddings from Pre-trained Generative A…☆71Updated 4 months ago
- Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs☆74Updated 3 months ago
- Streamable Text-to-Speech model using a language modeling approach, without vector quantization☆102Updated 5 months ago
- Official release of StyleTalk dataset.☆70Updated last year
- Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models☆42Updated 2 months ago
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud…☆107Updated 10 months ago
- AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data☆33Updated last year
- [ICASSP 2024] KNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels☆42Updated last year
- Inference code for Audiodec-Valle-Wenetspeech4TTS☆50Updated last year
- CTC decoder with hotwords for ASR.☆31Updated 7 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'☆99Updated last year
- Compute WER and SER for speech recognition evaluation☆14Updated last week
- ☆36Updated last year
- Streaming Text to Speech Web UI☆22Updated last year
- A TTS Trained on Universal Audio.☆40Updated 5 months ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer☆66Updated last year
- A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5☆42Updated 7 months ago
- WavReward: Spoken Dialogue Models With Generalist Reward Evaluators☆54Updated 5 months ago
- ☆23Updated last year
- Chinese-Mimi adapts the vocoder of the Moshi model to Chinese-language corpora.☆31Updated 7 months ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.☆37Updated last year
- A trainer for SNAC (Multi-Scale Neural Audio Codec) with the decoder replaced by Vocos.☆58Updated last year
- ☆104Updated 3 weeks ago
- Official repository for U-SAM (Interspeech 2025)☆23Updated 5 months ago
- We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction☆143Updated last week
- ☆20Updated 3 months ago
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark☆33Updated 6 months ago