facebookresearch / omnilingual-asr
Omnilingual ASR Open-Source Multilingual Speech Recognition for 1600+ Languages
☆2,358Updated 2 weeks ago
Alternatives and similar repositories for omnilingual-asr
Users who are interested in omnilingual-asr are comparing it to the libraries listed below.
- MiMo-Audio: Audio Language Models are Few-Shot Learners☆866Updated 2 months ago
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis☆632Updated 7 months ago
- Interface for OuteTTS models.☆1,414Updated 5 months ago
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f…☆1,332Updated 7 months ago
- G2P☆365Updated 3 months ago
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".☆925Updated last year
- ☆530Updated 2 months ago
- Make text LLMs listen and speak☆1,008Updated 2 weeks ago
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support…☆698Updated last month
- ☆470Updated 6 months ago
- ☆1,061Updated 2 months ago
- Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching☆724Updated this week
- Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection☆879Updated 6 months ago
- VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)☆778Updated this week
- Real Time Speech Transcription with FastRTC ⚡️ and Local Whisper 🤗☆692Updated 4 months ago
- Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.☆2,616Updated last week
- Lightning-fast, on-device TTS — running natively via ONNX.☆1,579Updated last week
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM☆291Updated 6 months ago
- Voice Activity Detector (VAD): low-latency, high-performance and lightweight☆1,648Updated this week
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…☆3,018Updated last month
- Whisper with Medusa heads☆865Updated 4 months ago
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…☆1,245Updated 2 months ago
- ☆474Updated 7 months ago
- Liquid Audio - Speech-to-Speech audio models by Liquid AI☆285Updated 2 months ago
- ☆635Updated 3 weeks ago
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models☆881Updated 2 weeks ago
- ☆332Updated 2 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.☆663Updated last week
- A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with blending), and various input formats includ…☆934Updated 2 weeks ago
- An open-source implementation of Whisper☆466Updated last month