wangtianrui / ProgRE
☆24Updated 7 months ago
Alternatives and similar repositories for ProgRE:
Users who are interested in ProgRE are comparing it to the repositories listed below
- A Chinese Expressive Long-dialogue Speech Dataset with Scripts☆19Updated 5 months ago
- This repository documents Barry's journey in learning deep learning for speech processing. Here, you'll find scripts and code snippets re…☆13Updated 8 months ago
- ☆18Updated last year
- ☆10Updated 2 years ago
- Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"☆19Updated last year
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)☆56Updated 10 months ago
- Code for CVSSP submission to DCASE 2021 Task 6☆35Updated 2 years ago
- Code for paper "Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition"☆19Updated last year
- Code for paper "Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition"☆41Updated last year
- ☆13Updated 9 months ago
- ☆12Updated 7 months ago
- SpeechFormer++ in PyTorch☆48Updated last year
- Dynamic vision-guided speaker embedding for audio-visual speaker diarization☆11Updated 2 years ago
- ☆15Updated 2 years ago
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes☆30Updated this week
- A collection of papers in the speech and audio field. Updated so far: ICLR2023-2025, ICML2023-2024, NeurIPS2023-2024, ACMMM2024, AAAI2024, ACL2024, EMNLP…☆49Updated this week
- A PyTorch implementation (with batch and channel support) of Google Brain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech…☆11Updated 9 months ago
- Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)☆22Updated last year
- PyTorch implementation of our paper: Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training.☆17Updated 2 years ago
- Study notes☆18Updated 3 years ago
- ☆30Updated last year
- Reproduction code for Qwen-Audio fine-tuning☆38Updated 8 months ago
- INTERSPEECH2023: Target Active Speaker Detection with Audio-visual Cues☆51Updated last year
- This is a public repository for RATS Channel-A Speech Data, a paid noisy speech dataset distributed by LDC. Here we release its Log…☆15Updated 2 years ago
- Reproduction of the TFCN speech enhancement paper☆40Updated 3 years ago
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation☆37Updated this week
- ICASSP 2023: 'Speaker recognition with two-step multi-modal deep cleansing'☆40Updated 2 years ago
- ☆40Updated 2 years ago
- ☆33Updated 5 months ago
- ☆46Updated 2 years ago