RicherMans / SAT
Streaming Audio Transformers for online audio tagging
☆43Updated 9 months ago
Alternatives and similar repositories for SAT:
Users who are interested in SAT are comparing it to the repositories listed below:
- ☆64Updated last year
- ☆60Updated last year
- Model configurations for scaling SE models in the paper "Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enha…☆33Updated 7 months ago
- ☆23Updated last year
- ☆48Updated 2 years ago
- ☆47Updated 4 months ago
- Official repo of ICASSP 2024 paper - Generative De-Quantization for Neural Speech Codec via Latent Diffusion.☆51Updated 2 months ago
- An unofficial PyTorch implementation of Mix-Phoneme-Bert☆39Updated last year
- ☆51Updated 4 months ago
- This is the implementation of our Interspeech 2022 paper "Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conv…☆18Updated last year
- ☆30Updated last year
- Source code and demo for INTERSPEECH 2023 paper: DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion P…☆36Updated last year
- Official Implementation of TSELM: Target speaker extraction using discrete tokens and language models☆41Updated 3 months ago
- An unofficial implementation of "UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding".☆25Updated last year
- Objective metrics used in several text-to-speech (TTS) papers.☆48Updated 2 years ago
- ICASSP 2025: Dynamic Embedding Causal Target Speech Extraction☆2Updated last week
- ☆26Updated last year
- ☆30Updated 3 months ago
- SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)☆65Updated last month
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model☆53Updated last year
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆79Updated 3 months ago
- A toolkit for researchers in multimodal sound separation.☆16Updated last year
- Official data preparation and metric evaluation scripts for the Interspeech 2025 URGENT challenge.☆46Updated 2 months ago
- PAM is a no-reference audio quality metric for audio generation tasks☆57Updated 8 months ago
- PyTorch implementation of subband decomposition☆92Updated 2 years ago
- ☆33Updated last year
- An STFT/iSTFT written in PyTorch using 1D convolutions☆27Updated 8 months ago
- Learning and controlling the source-filter representation of speech with a variational autoencoder☆45Updated last year
- Please visit https://thuhcsi.github.io/SnakeGAN/☆36Updated last year