liutaocode / DiarizationVisualization
Visualization tools for audio-only and multi-modal speaker diarization datasets
☆12Updated last year
Alternatives and similar repositories for DiarizationVisualization
Users who are interested in DiarizationVisualization are comparing it to the libraries listed below.
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings…☆95Updated 7 months ago
- Official Pytorch Implementation of "Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Pr…☆222Updated last year
- [INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.☆52Updated last year
- The official Pytorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based …☆140Updated 2 weeks ago
- This is the audio sample repository for speech separation model "MossFormer2".☆138Updated 8 months ago
- An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io☆69Updated last year
- Target Speaker Extraction Toolkit☆187Updated 3 weeks ago
- Official repository of SepReformer for speech separation☆212Updated 7 months ago
- Using joint training speaker encoder with consistency loss to achieve cross-lingual voice conversion and expressive voice conversion☆148Updated last year
- FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3☆212Updated last year
- Analysis of XLS-R for Speech Quality Assessment☆13Updated 6 months ago
- Official Repository For VoxBlink2☆76Updated last year
- ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations☆171Updated last year
- ☆79Updated last month
- The official pytorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"☆173Updated 9 months ago
- ☆68Updated 11 months ago
- ONNX Inference of Pyannote Segmentation☆92Updated 7 months ago
- ☆87Updated 10 months ago
- Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)☆122Updated 3 years ago
- Speaker change detection using SincNet and an LSTM/Transformer☆53Updated 2 months ago
- TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion☆146Updated last year
- Predicts the level of noise and reverberation in your audio files☆156Updated last month
- iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform☆261Updated last month
- Object-oriented handling of audio data, with GPU-powered augmentations, and more.☆286Updated 4 months ago
- This repo provides the processed samples of the manuscript "MossFormer: Pushing the Performance Limit of Monaural Speech Separation using…☆94Updated 8 months ago
- Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event …☆400Updated last year
- Baseline multi-resolution cross network model trained using the Divide and Remaster Dataset☆83Updated last year
- [INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for …☆164Updated 2 months ago
- NOTSOFAR-1 Challenge: Distant Diarization and ASR☆55Updated 6 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆96Updated 7 months ago