Stability-AI / stable-codec
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
☆330 · Updated last month
Alternatives and similar repositories for stable-codec:
Users interested in stable-codec are comparing it to the libraries listed below.
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis · ☆174 · Updated this week
- The official implementation of PeriodWave and PeriodWave-Turbo · ☆162 · Updated last week
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. · ☆226 · Updated 4 months ago
- Object-oriented handling of audio data, with GPU-powered augmentations, and more. · ☆257 · Updated last month
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". · ☆132 · Updated last month
- Audiogen Codec · ☆131 · Updated 7 months ago
- Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate · ☆482 · Updated 3 months ago
- Ultra-low-bitrate neural audio codec (0.31–1.40 kbps) with better semantics in the latent space. · ☆188 · Updated 5 months ago
- ☆346 · Updated 5 months ago
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model · ☆154 · Updated last month
- Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations. · ☆189 · Updated last week
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" · ☆132 · Updated 5 months ago
- [ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching" · ☆328 · Updated 5 months ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities · ☆109 · Updated 2 months ago
- FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3 · ☆187 · Updated 10 months ago
- ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model · ☆201 · Updated 9 months ago
- A simple library for Fréchet Audio Distance (FAD) calculation · ☆179 · Updated last week
- The Song Describer dataset is an evaluation dataset made of ~1.1k captions for 706 permissively licensed music recordings. · ☆147 · Updated last year
- VoiceLDM: Text-to-Speech with Environmental Context · ☆169 · Updated 6 months ago
- HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform · ☆152 · Updated last month
- Training code for FAcodec presented in NaturalSpeech3 · ☆194 · Updated 5 months ago
- Unofficial implementation of the NVIDIA P-Flow TTS paper · ☆220 · Updated last month
- Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours) · ☆368 · Updated this week
- Refactored / updated version of `stable-audio-tools` which is an open-source code for audio/music generative models originally by Stabili… · ☆165 · Updated 6 months ago
- Pitch Estimating Neural Networks (PENN) · ☆242 · Updated 6 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation · ☆129 · Updated last week
- UTokyo-SaruLab MOS Prediction System · ☆147 · Updated 2 months ago
- An Open-source Streaming High-fidelity Neural Audio Codec · ☆457 · Updated 3 months ago
- Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati… · ☆180 · Updated 10 months ago