liamdugan / speech-to-speech
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
☆32Jan 14, 2025Updated last year
Alternatives and similar repositories for speech-to-speech
Users who are interested in speech-to-speech are comparing it to the libraries listed below.
- DST is a Decoder-only simultaneous machine translation model, which can conduct policy decision and translation concurrently☆11Jun 6, 2024Updated last year
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".☆25Jul 2, 2024Updated last year
- LLaSE: Maximizing Acoustic Preservation for LLaMA based Speech Enhancement☆16Jul 11, 2025Updated 7 months ago
- This repo contains the baseline model recipes and pre-trained model for the GramVanni Hindi ASR challenge☆15Mar 26, 2022Updated 3 years ago
- Room impulse response simulation for various array architectures using Monte-Carlo simulation and quaternions (Python)☆17May 25, 2025Updated 8 months ago
- Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…☆28Sep 20, 2025Updated 4 months ago
- Sing any popular song with your voice☆11Jul 10, 2022Updated 3 years ago
- Implementation of CGMM-MVDR beamforming used for Clarity challenge☆13Jan 14, 2022Updated 4 years ago
- Official code for the paper "Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding"☆33Jan 28, 2026Updated 2 weeks ago
- This is the unofficial implementation of MFNet, from the paper "A Mask Free Neural Network for Monaural Speech Enhancement"☆13Dec 20, 2024Updated last year
- Unofficial implementation of "CPTNN: Cross-Parallel Transformer Neural Network for Time-Domain Speech Enhancement"☆15Nov 14, 2023Updated 2 years ago
- Voice activity detection and speaker gender segmentation audiovisual corpus☆16Jan 20, 2025Updated last year
- Code for ACL-IJCNLP 2021 paper "N-Best-ASR-Transformer: Enhancing SLU Performance using Multiple ASR Hypotheses."☆17Nov 30, 2021Updated 4 years ago
- [ACL 2024] An easily extensible framework for simultaneous, text-to-text neural machine translation (SimulMT) for LLMs.☆19Apr 21, 2025Updated 9 months ago
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder☆31Aug 30, 2025Updated 5 months ago
- Code for EMNLP 2022 main conference paper "Information-Transport-based Policy for Simultaneous Translation"☆13Nov 3, 2022Updated 3 years ago
- Code for ACL 2022 main conference paper "Modeling Dual Read/Write Paths for Simultaneous Machine Translation"☆12Mar 31, 2022Updated 3 years ago
- ☆35Sep 1, 2022Updated 3 years ago
- ☆20Apr 27, 2024Updated last year
- WarpRNNT loss ported in Numba CPU/CUDA for Pytorch☆17Mar 11, 2022Updated 3 years ago
- Neural network density models for speech separation.☆20Nov 26, 2020Updated 5 years ago
- [Early Alpha] A unified framework for text-to-speech, voice conversion, automatic speech recognition, audio classification, voice activit…☆22Jan 10, 2025Updated last year
- EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System☆15Mar 31, 2019Updated 6 years ago
- Dataset release for Emotional TTS in Indian Accent☆40Sep 2, 2022Updated 3 years ago
- Streamable Text-to-Speech model using a language modeling approach, without vector quantization☆110May 20, 2025Updated 8 months ago
- Official Repository for "Efficient Vocal Source Separation Through Windowed RoFormer"☆42Oct 30, 2025Updated 3 months ago
- An official implementation of Style-Talker for Spoken Dialogue Generation☆23Jan 12, 2025Updated last year
- Efficient Personalized Speech Enhancement through Self-Supervised Learning☆23Mar 12, 2023Updated 2 years ago
- Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation"☆37Dec 6, 2023Updated 2 years ago
- Speech Resynthesis and Language Modeling☆27Jun 11, 2025Updated 8 months ago
- Open Source Speech/Text Data on AI☆19Sep 13, 2022Updated 3 years ago
- Implementation of DiffWave and SaShiMi audio generation models☆128Apr 4, 2023Updated 2 years ago
- CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding☆22Dec 17, 2025Updated last month
- [ICASSP 2026] Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis☆36Dec 24, 2025Updated last month
- Acoustic echo cancellation (AEC) is a main algorithm in the pipeline of acoustic devices with KWS or ASR. FNLMS is used.☆19Apr 22, 2019Updated 6 years ago
- Singing voice conversion without F0☆23May 10, 2023Updated 2 years ago
- A toolkit for any-to-any encoder-decoder voice conversion systems☆84Aug 10, 2023Updated 2 years ago
- Official code of SenSE.☆72Oct 30, 2025Updated 3 months ago
- An End-to-End Pipeline for Enhanced French Text-to-Speech with SSML Prosody Control☆30Jan 13, 2026Updated last month