xiaoxiaomiao323 / MSA
☆16 Dec 17, 2024 Updated last year
Alternatives and similar repositories for MSA
Users who are interested in MSA are comparing it to the repositories listed below
- Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024) ☆22 Jul 25, 2024 Updated last year
- This is the implementation of the manuscript "Learning General All-Neural Speech Enhancement based on Taylor's Approximation Theory", whi… ☆14 Nov 25, 2022 Updated 3 years ago
- The code of the paper "Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation" (CVPR2023) ☆18 Mar 21, 2023 Updated 2 years ago
- [INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild. ☆59 Jan 24, 2024 Updated 2 years ago
- Dynamic Mixing For Speech Processing (mix-on-the-fly) ☆21 Jul 19, 2022 Updated 3 years ago
- The implementation of G2Net, an extension of GaGNet, which is in submission to T-ASLP ☆19 Apr 27, 2022 Updated 3 years ago
- Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction (LLM-TSE) ☆42 Oct 13, 2023 Updated 2 years ago
- The implementation of TaylorBeamformer, which is in submission to Interspeech2022 ☆48 Jun 10, 2022 Updated 3 years ago
- Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge ☆21 Jul 25, 2022 Updated 3 years ago
- ☆11 Nov 5, 2025 Updated 3 months ago
- Dynamic vision-guided speaker embedding for audio-visual speaker diarization ☆12 Jul 5, 2022 Updated 3 years ago
- PyTorch implementation of Extended U-Net for Speaker Verification in Noisy Environments ☆28 Jul 24, 2023 Updated 2 years ago
- ☆14 May 9, 2022 Updated 3 years ago
- The implementation of "End-to-End Neural Speaker Diarization with an Iterative Adaptive Attractor Estimation", which is accepted by Neura… ☆11 Aug 27, 2023 Updated 2 years ago
- LightTTS is a lightweight TTS inference framework optimized for CosyVoice2 and CosyVoice3, enabling fast and scalable speech synthesis in… ☆27 Jan 7, 2026 Updated last month
- ☆90 Jun 9, 2024 Updated last year
- ☆33 Nov 29, 2022 Updated 3 years ago
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM ☆17 Nov 7, 2024 Updated last year
- ☆17 Mar 30, 2023 Updated 2 years ago
- Optimizing speaker verification and spoofing countermeasure systems together with REINFORCE ☆13 Mar 31, 2021 Updated 4 years ago
- Interspeech Tutorial - Resource Efficient and Cross-Modal Learning Toward Foundation Modeling ☆15 Oct 9, 2023 Updated 2 years ago
- The official PyTorch implementation of "Inter-SubNet: Speech Enhancement with Subband Interaction", accepted by ICASSP 2023. ☆100 May 24, 2023 Updated 2 years ago
- Code of the paper "Low-Latency Speech Separation Guided Diarization for Telephone Conversations" ☆15 Dec 22, 2022 Updated 3 years ago
- Data simulation scripts for paper "Target Sound Extraction with Variable Cross-modality Clues" ☆16 May 19, 2023 Updated 2 years ago
- Production first, nn-based on-device signal processing toolkit. ☆65 May 30, 2023 Updated 2 years ago
- Deep-learning-based audio-visual lip biometrics ☆15 May 9, 2023 Updated 2 years ago
- Digital Audio Effects in Python (material for MUSI6202@Georgiatech) ☆15 Nov 30, 2014 Updated 11 years ago
- A toolkit for researchers in the multimodal sound separation. ☆16 Oct 20, 2023 Updated 2 years ago
- Code for "StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model", AAAI2026 Oral ☆42 Jan 16, 2026 Updated 3 weeks ago
- ☆16 Sep 12, 2023 Updated 2 years ago
- Model configurations for scaling SE models in the paper "Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enha… ☆36 Aug 7, 2024 Updated last year
- Code and data recipes for the paper: Heterogeneous Target Speech Separation ☆43 Dec 6, 2022 Updated 3 years ago
- ☆12 Jun 14, 2022 Updated 3 years ago
- ☆42 Nov 22, 2024 Updated last year
- Unofficial Implementation of "Liu, W., Li, A., Wang, X., Yuan, M., Chen, Y., Zheng, C., & Li, X. (2022). A Neural Beamspace-Domain Filter… ☆17 Oct 21, 2022 Updated 3 years ago
- Audio-visual diarization pipeline used for creating VoxConverse dataset ☆21 Jun 6, 2025 Updated 8 months ago
- ☆16 Mar 7, 2019 Updated 6 years ago
- Official implementation of Efficient Speech Separation Framework Based on Neural State-Space Models ☆22 Sep 21, 2023 Updated 2 years ago
- INTERSPEECH2023: Target Active Speaker Detection with Audio-visual Cues ☆58 May 29, 2023 Updated 2 years ago