NVIDIA / audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
☆189 Updated last month
Related projects
Alternatives and complementary repositories for audio-flamingo
- VoiceLDM: Text-to-Speech with Environmental Context ☆163 Updated 3 months ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆89 Updated 5 months ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆136 Updated last year
- Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati… ☆154 Updated 7 months ago
- Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with a better semantic in the latent space. ☆146 Updated 2 months ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆77 Updated 3 months ago
- Audiogen Codec ☆126 Updated 4 months ago
- The official Implementation of PeriodWave and PeriodWave-Turbo ☆128 Updated 2 months ago
- Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆102 Updated last month
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ☆157 Updated 3 months ago
- CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model ☆183 Updated 6 months ago
- Unofficial implementation of NVIDIA P-Flow TTS paper ☆217 Updated 4 months ago
- ☆76 Updated 2 months ago
- FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3 ☆166 Updated 6 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆87 Updated 3 months ago
- Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch. ☆93 Updated last year
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆89 Updated last month
- Implementation of SoundStorm built upon SpeechTokenizer. ☆103 Updated last year
- Audio Captioning datasets for PyTorch. ☆105 Updated this week
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆119 Updated 3 weeks ago
- This repository contains metadata of WavCaps dataset and codes for downstream tasks. ☆204 Updated 3 months ago
- MU-LLaMA: Music Understanding Large Language Model ☆235 Updated 7 months ago
- ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations ☆121 Updated 8 months ago
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis ☆90 Updated last week
- GOMIN; Gaudio Open Mel-spectrogram Inversion Network ☆109 Updated 9 months ago
- Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E ☆135 Updated 2 weeks ago
- Official implementation of Vec-Tok Speech ☆93 Updated last year
- Pytorch implementation of BigVSAN ☆198 Updated 7 months ago
- The official GitHub page for the survey paper "Foundation Models for Music: A Survey". ☆91 Updated 2 months ago
- The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tu… ☆71 Updated 2 months ago