tincans-ai / gazelle
Joint speech-language model - respond directly to audio!
☆372 Jul 1, 2024 Updated last year
Alternatives and similar repositories for gazelle
Users who are interested in gazelle are comparing it to the libraries listed below.
- Joint speech-language model - respond directly to audio! ☆30 May 13, 2024 Updated last year
- proof of concept conversation orchestrator with a speech-language model ☆20 Oct 19, 2024 Updated last year
- ☆16 Oct 6, 2024 Updated last year
- ☆19 Mar 22, 2024 Updated last year
- ☆38 Apr 15, 2024 Updated last year
- ☆54 Jul 16, 2025 Updated 7 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆59 Oct 23, 2024 Updated last year
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆26 Dec 12, 2024 Updated last year
- Project of Singing Voice Conversion. ☆16 Oct 27, 2023 Updated 2 years ago
- A fast multimodal LLM for real-time voice ☆4,350 Dec 12, 2025 Updated 2 months ago
- This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge. ☆32 Jan 26, 2024 Updated 2 years ago
- ☆29 Feb 4, 2025 Updated last year
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆252 Mar 26, 2025 Updated 10 months ago
- ☆258 Mar 15, 2024 Updated last year
- Text-To-Speech for NotebookLM ☆37 Jul 20, 2025 Updated 6 months ago
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition. ☆15 May 16, 2025 Updated 9 months ago
- A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup. ☆76 Oct 22, 2024 Updated last year
- ☆61 Nov 4, 2023 Updated 2 years ago
- Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate ☆746 Nov 19, 2024 Updated last year
- GPT-style network for phonemization with durations of text ☆68 Mar 21, 2024 Updated last year
- VoiceLDM: Text-to-Speech with Environmental Context ☆191 Aug 9, 2024 Updated last year
- Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models ☆59 Jul 1, 2025 Updated 7 months ago
- My vocoder experiments ☆31 Jul 26, 2025 Updated 6 months ago
- first base model for full-duplex conversational audio ☆1,773 Jan 5, 2025 Updated last year
- VoiceBox neural network implementation ☆110 Aug 2, 2024 Updated last year
- Local realtime voice AI ☆2,429 Nov 26, 2025 Updated 2 months ago
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one ☆26 Aug 5, 2024 Updated last year
- A pitch detection model trained to be robust against noise and reverberation environments. ☆27 Jan 21, 2025 Updated last year
- Official Implementation of EnCLAP (ICASSP 2024) ☆94 Jun 2, 2024 Updated last year
- Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale ☆28 Aug 4, 2023 Updated 2 years ago
- A sequence-to-sequence voice conversion toolkit. ☆108 Jul 5, 2024 Updated last year
- A ggml (C++) re-implementation of tortoise-tts ☆193 Aug 20, 2024 Updated last year
- High fidelity, lightweight, end-to-end, streaming, convolution-based neural audio codec ☆115 Jun 23, 2025 Updated 7 months ago
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion ☆111 Apr 1, 2024 Updated last year
- ☆25 Mar 6, 2024 Updated last year
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆57 Oct 31, 2023 Updated 2 years ago
- Official repository for "Speaking Style Conversion With Discrete Self-Supervised Units" (EMNLP 2023). https://arxiv.org/abs/2212.09730 ☆131 Dec 8, 2023 Updated 2 years ago
- ☆15 Nov 11, 2024 Updated last year
- A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts ☆16 Dec 3, 2024 Updated last year