facebookresearch / sam-audio
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
☆2,841 · Updated this week
Alternatives and similar repositories for sam-audio
Users who are interested in sam-audio are comparing it to the libraries listed below.
- ☆952 · Updated 3 weeks ago
- Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages ☆2,556 · Updated last week
- Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework. ☆2,703 · Updated last month
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im… ☆3,206 · Updated 3 months ago
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models ☆950 · Updated 3 weeks ago
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f… ☆1,351 · Updated 8 months ago
- Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX. ☆2,015 · Updated this week
- The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment ☆1,257 · Updated 3 weeks ago
- A fundamental toolkit designed for music, song, and audio generation ☆1,283 · Updated 7 months ago
- A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model that excels at editing emotion, speaking style, and paralinguistics… ☆815 · Updated 2 weeks ago
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆944 · Updated 3 months ago
- VibeVoice: Expressive, longform conversational speech synthesis. (Community fork) ☆909 · Updated 2 weeks ago
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆811 · Updated 5 months ago
- ☆533 · Updated 3 months ago
- NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms ☆1,145 · Updated 8 months ago
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆643 · Updated 9 months ago
- Repository of AudioX ☆1,121 · Updated 8 months ago
- Interface for OuteTTS models. ☆1,419 · Updated 6 months ago
- Soprano: Instant, Ultra-Realistic Text-to-Speech ☆746 · Updated this week
- Make text LLMs listen and speak ☆1,068 · Updated 2 weeks ago
- On-device TTS model by Neuphonic ☆4,328 · Updated 2 weeks ago
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation… ☆1,290 · Updated 3 months ago
- ACE-Step: A Step Towards Music Generation Foundation Model ☆3,593 · Updated 6 months ago
- Unified automatic quality assessment for speech, music, and sound. ☆653 · Updated 7 months ago
- Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model. ☆937 · Updated this week
- The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement ☆718 · Updated last month
- TTS model capable of streaming conversational audio in real time. ☆1,007 · Updated last month
- Open Audio Watermarking Tool ☆447 · Updated 2 weeks ago
- G2P ☆383 · Updated 5 months ago
- ☆635 · Updated 2 months ago