wangtianrui / ProgRE
☆27Updated last year
Alternatives and similar repositories for ProgRE
Users who are interested in ProgRE are comparing it to the libraries listed below.
- A Chinese Expressive Long-dialogue Speech Dataset with Scripts☆20Updated last year
- ☆174Updated last year
- The reproduction code for Qwen-Audio fine-tuning☆53Updated last year
- This repository documents Barry's journey in learning deep learning for speech processing. Here, you'll find scripts and code snippets re…☆13Updated 2 months ago
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer☆204Updated 2 weeks ago
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.☆175Updated 8 months ago
- ☆114Updated 6 months ago
- SpeechFormer++ in PyTorch☆49Updated 2 years ago
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation☆43Updated last year
- Official implementation of SpeechFormer, written in Python (PyTorch).☆80Updated 2 years ago
- ☆10Updated 3 years ago
- ☆19Updated last year
- Code for paper "Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition"☆20Updated 2 years ago
- Official implementation of MGA-CLAP (ACM MM 2024)☆24Updated last year
- Accepted by TMM 2022☆18Updated 3 years ago
- This package aims at simplifying the download of the AudioCaps dataset.☆36Updated 2 years ago
- ☆14Updated last year
- ☆39Updated last year
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"☆77Updated last month
- A summary of speech data augmentation algorithms☆69Updated 4 years ago
- Code for CVSSP submission to DCASE 2021 Task 6☆36Updated 3 years ago
- ☆12Updated 7 months ago
- ☆11Updated 10 months ago
- Implementation of the paper "Attentive Statistics Pooling for Deep Speaker Embedding" in PyTorch☆48Updated 5 years ago
- ☆13Updated 3 years ago
- Implementation of our paper 'On Metric Learning For Audio-Text Cross-Modal Retrieval'☆50Updated 3 years ago
- ☆12Updated last year
- Baseline method for audio-visual sound event localization and detection task of DCASE 2023 challenge☆60Updated 8 months ago
- ☆48Updated 3 years ago
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)☆59Updated last year