facebookresearch / sam-audio
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
☆3,183 · Updated 3 weeks ago
Alternatives and similar repositories for sam-audio
Users interested in sam-audio are comparing it to the libraries listed below.
- Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages ☆2,611 · Updated last month
- ☆975 · Updated last month
- Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework. ☆2,822 · Updated this week
- The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment ☆1,324 · Updated last month
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆963 · Updated 4 months ago
- Soprano: Instant, Ultra-Realistic Text-to-Speech ☆1,137 · Updated 2 weeks ago
- Lightning-Fast, On-Device, Multilingual TTS, running natively via ONNX. ☆2,521 · Updated last week
- A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model that excels at editing emotion, speaking style, and paralinguistics… ☆836 · Updated this week
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models ☆981 · Updated last month
- On-device TTS model by Neuphonic ☆4,718 · Updated 2 weeks ago
- Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music… ☆756 · Updated this week
- A TTS that fits in your CPU (and pocket) ☆2,683 · Updated this week
- A lightning fast audio upsampler. ☆664 · Updated last week
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support… ☆747 · Updated 3 months ago
- Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length" ☆1,524 · Updated this week
- Qwen-Image-Layered: Layered Decomposition for Inherent Editability ☆1,508 · Updated last month
- ☆536 · Updated 4 months ago
- VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning ☆5,584 · Updated last week
- VibeVoice: Expressive, longform conversational speech synthesis. (Community fork) ☆951 · Updated last week
- Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streamin… ☆6,204 · Updated last week
- A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime. ☆336 · Updated last week
- Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions. ☆711 · Updated last week
- Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model. ☆3,189 · Updated 2 weeks ago
- [ICLR 2026] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆827 · Updated this week
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f… ☆1,368 · Updated 9 months ago
- PersonaPlex code. ☆3,110 · Updated last week
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation… ☆1,326 · Updated 4 months ago
- GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning ☆903 · Updated last month
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆651 · Updated last week
- ☆2,011 · Updated last month