facebookresearch / sam-audio
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
☆3,286Updated last month
Alternatives and similar repositories for sam-audio
Users who are interested in sam-audio compare it to the libraries listed below.
- Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages☆2,632Updated last month
- ☆979Updated last month
- On-device TTS model by Neuphonic☆4,768Updated last week
- Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music…☆1,317Updated last week
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models☆986Updated last month
- A TTS that fits in your CPU (and pocket)☆3,134Updated this week
- A lightning fast audio upsampler.☆710Updated last week
- Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.☆2,832Updated 2 weeks ago
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…☆3,399Updated last month
- Soprano: Instant, Ultra-Realistic Text-to-Speech☆1,164Updated 3 weeks ago
- The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment☆1,337Updated last month
- A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics…☆870Updated this week
- VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning☆5,851Updated 2 weeks ago
- Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.☆835Updated 2 weeks ago
- TTS model capable of streaming conversational audio in realtime.☆1,051Updated 2 months ago
- MiMo-Audio: Audio Language Models are Few-Shot Learners☆965Updated 4 months ago
- VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)☆965Updated 2 weeks ago
- Qwen-Image-Layered: Layered Decomposition for Inherent Editability☆1,540Updated last month
- A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.☆694Updated 2 weeks ago
- Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.☆3,786Updated this week
- [ICLR 2026] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching☆831Updated 2 weeks ago
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f…☆1,378Updated 9 months ago
- Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.☆2,588Updated 3 weeks ago
- ☆537Updated 4 months ago
- GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning☆923Updated last month
- A fundamental toolkit designed for music, song, and audio generation☆1,305Updated 8 months ago
- The most powerful local music generation model that outperforms most commercial alternatives☆5,266Updated this week
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…☆1,336Updated 4 months ago
- NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms☆1,155Updated 9 months ago
- The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement☆746Updated 2 months ago