jamesparsloe / llm.speech
Trying to build an all-in-one speech-text language model - a bit like GPT-4o
☆22 Updated 5 months ago
Related projects
Alternatives and complementary repositories for llm.speech
- VALL-E 2 reproduction ☆83 Updated 3 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆62 Updated last week
- ☆59 Updated last year
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆36 Updated last month
- The TTSDS benchmark evaluates synthetic speech quality by considering prosody, speaker identity, and intelligibility, comparing these fac… ☆18 Updated this week
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆83 Updated last month
- ☆61 Updated 3 months ago
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆42 Updated last month
- ☆26 Updated 8 months ago
- Codebase and project page for EDMSound ☆29 Updated 11 months ago
- Unofficial implementation of the WaveNeXt vocoder ☆31 Updated 2 months ago
- ☆76 Updated 2 months ago
- ☆40 Updated 4 months ago
- ☆32 Updated last month
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆28 Updated 3 weeks ago
- VoiceBox neural network implementation ☆96 Updated 3 months ago
- JAX Implementations of Descript Audio Codec and EnCodec ☆20 Updated last month
- Audiogen Codec ☆126 Updated 4 months ago
- Supervoice diffusion enhance ☆25 Updated 3 months ago
- PyTorch implementation of SoundCTM ☆70 Updated last month
- AudioSR-Upsampling (any -> 48kHz) ☆38 Updated 8 months ago
- Audio tokenization, in the fastest way possible! ☆45 Updated 2 months ago
- GPT-style network for phonemization with durations of text ☆62 Updated 7 months ago
- ☆41 Updated 3 weeks ago
- [ACMMM'2024] Generative Expressive Conversational Speech Synthesis ☆25 Updated last week
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆48 Updated 2 weeks ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆51 Updated last year
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis ☆90 Updated last week
- Collection of scripts from mHuBERT-147. ☆22 Updated 4 months ago
- ☆57 Updated 2 months ago