alibabasglab / MossFormer2
This is the audio sample repository for the speech separation model "MossFormer2".
☆77 Updated 5 months ago
Related projects:
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆64 Updated 11 months ago
- Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E ☆134 Updated last year
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark ☆123 Updated 3 months ago
- ☆97 Updated this week
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆127 Updated last year
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement ☆104 Updated last week
- Official Repository For VoxBlink2 ☆37 Updated last month
- The official Pytorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based … ☆75 Updated 7 months ago
- VoiceLDM: Text-to-Speech with Environmental Context ☆157 Updated last month
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. ☆169 Updated 3 weeks ago
- ☆139 Updated 8 months ago
- ☆114 Updated 2 weeks ago
- Easy-to-Use Speech MOS predictors ☆209 Updated 10 months ago
- VALL-E 2 reproduction ☆72 Updated 2 months ago
- FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3 ☆149 Updated 4 months ago
- ONNX Inference of Pyannote Segmentation ☆54 Updated last week
- [ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching" ☆295 Updated 2 weeks ago
- This is the official implementation of the SEMamba paper. (Accepted to IEEE SLT 2024) ☆118 Updated last week
- Implementation of SoundStorm built upon SpeechTokenizer. ☆98 Updated 10 months ago
- It's a repository for implementations of neural speech editing algorithms. ☆185 Updated 8 months ago
- LlamaVoice is a llama-based large voice generation model, providing inference and training ability. ☆169 Updated 3 weeks ago
- Object-oriented handling of audio data, with GPU-powered augmentations, and more. ☆218 Updated last month
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆39 Updated last week
- StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation ☆165 Updated this week
- Automatically Update Text-to-speech (TTS) Papers Daily using Github Actions (Update Every 12th hours) ☆211 Updated this week
- CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model ☆177 Updated 4 months ago
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io ☆66 Updated 11 months ago
- Unofficial implementation of NVIDIA P-Flow TTS paper ☆210 Updated 2 months ago
- Official implementation of Vec-Tok Speech ☆91 Updated 11 months ago
- Unofficial implementation of miipher ☆102 Updated 5 months ago