anton-jeran / Speech2RIR
This is the official implementation of a reverberant speech to room impulse response (RIR) estimator.
☆19 Updated 3 months ago
Related projects
Alternatives and complementary repositories for Speech2RIR
- Official PyTorch implementation of "RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutiv… ☆42 Updated 8 months ago
- Code for the paper "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching" ☆19 Updated 2 weeks ago
- logWMSE, an audio quality metric with support for digital silence target. Useful for evaluating audio source separation systems, even whe… ☆33 Updated 2 months ago
- A list of datasets made available by members of the Aalto Acoustics Lab ☆19 Updated 2 months ago
- A robust pitch tracker using synchro-squeezed fft and frequency domain autocorrelation ☆34 Updated 10 months ago
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆42 Updated 2 months ago
- Viterbi decoding in PyTorch ☆27 Updated last month
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆31 Updated 10 months ago
- Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale ☆27 Updated last year
- Reimplementation of Bandit for "Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support" ☆21 Updated 3 months ago
- Prediction of sound event bounding boxes (SEBBs) ☆22 Updated 3 months ago
- Prosody and Pronunciation Modification Network ☆44 Updated 3 months ago
- Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation" ☆22 Updated 7 months ago
- Differentiable Mean Opinion Score Regularization for Perceptual Speech Enhancement ☆22 Updated last year
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆37 Updated last month
- Da - ECHO - RetrievAl - daTasEt ☆24 Updated 4 months ago
- logWMSE, an audio quality metric & loss function with support for digital silence target. Useful for training and evaluating audio source… ☆28 Updated 3 months ago
- Source code for "BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement" ☆13 Updated 2 years ago
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless, and TortoiseTTS into one ☆27 Updated 3 months ago
- A toolkit for researchers in multimodal sound separation. ☆16 Updated last year
- Official implementation of paper: Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-…