facebookresearch / sam-audio
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
☆799 Updated this week
Alternatives and similar repositories for sam-audio
Users who are interested in sam-audio are comparing it to the libraries listed below.
- ☆943 Updated this week
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models ☆919 Updated this week
- Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages ☆2,467 Updated this week
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f… ☆1,343 Updated 8 months ago
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆807 Updated 4 months ago
- A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing emotion, speaking style, and paralinguistics… ☆786 Updated last week
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆903 Updated 3 months ago
- ☆634 Updated last month
- Unified automatic quality assessment for speech, music, and sound. ☆648 Updated 6 months ago
- NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms ☆1,137 Updated 8 months ago
- Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework. ☆2,663 Updated 3 weeks ago
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆639 Updated 8 months ago
- ☆532 Updated 2 months ago
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im… ☆3,115 Updated 2 months ago
- Make text LLMs listen and speak ☆1,028 Updated last week
- VibeVoice: Expressive, longform conversational speech synthesis. (Community fork) ☆870 Updated last week
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ☆926 Updated last year
- Lightning-Fast, On-Device TTS — running natively via ONNX. ☆1,844 Updated this week
- The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement ☆690 Updated 2 weeks ago
- Interface for OuteTTS models. ☆1,415 Updated 6 months ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation ☆289 Updated last month
- Open Audio Watermarking Tool ☆416 Updated 2 weeks ago
- Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate ☆730 Updated last year
- ☆472 Updated 7 months ago
- Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching ☆738 Updated 2 weeks ago
- ☆317 Updated 3 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆291 Updated 7 months ago
- A fundamental toolkit designed for music, song, and audio generation ☆1,262 Updated 7 months ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆282 Updated 4 months ago
- A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding. ☆412 Updated 3 months ago