allenai / OLMoASR
An open-source implementation of Whisper
☆454Updated last week
Alternatives and similar repositories for OLMoASR
Users who are interested in OLMoASR are comparing it to the libraries listed below.
- Liquid Audio - Speech-to-Speech audio models by Liquid AI☆234Updated last month
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate☆294Updated 5 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM☆287Updated 5 months ago
- OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.☆400Updated last week
- Kyutai with an "eye"☆223Updated 7 months ago
- Fast Streaming TTS with Orpheus + WebRTC (with FastRTC)☆340Updated 6 months ago
- VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency☆161Updated 2 weeks ago
- ☆322Updated last month
- ☆282Updated 2 months ago
- ☆190Updated 3 weeks ago
- Self-host the ultra-lightweight Kitten TTS model with this enhanced API server with an intuitive Web UI, large text processing for audiob…☆208Updated 3 months ago
- ☆242Updated 5 months ago
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support…☆673Updated 2 weeks ago
- VLLM Port of the Chatterbox TTS model☆325Updated 3 weeks ago
- Make text LLMs listen and speak☆949Updated this week
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On…☆220Updated 5 months ago
- ☆634Updated 3 months ago
- MiMo-Audio: Audio Language Models are Few-Shot Learners☆836Updated last month
- Optimized Whisper models for streaming and on-device use☆482Updated last week
- Inference, Fine Tuning and many more recipes with Gemma family of models☆274Updated 3 months ago
- ☆526Updated last month
- ☆832Updated last month
- A multi-agent LLM system for detecting and resolving cognitive dissonance.☆268Updated 3 weeks ago
- This is the official repo for the paper "LongCat-Flash-Omni Technical Report"☆156Updated last week
- AudioStory: Generating Long-Form Narrative Audio with Large Language Models☆283Updated last month
- ☆300Updated 3 months ago
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨☆128Updated 2 months ago
- Anemoi: A Semi-Centralized Multi-agent System Based on Agent-to-Agent Communication MCP server from Coral Protocol☆367Updated 2 months ago
- Tencent Hunyuan A13B (Hunyuan-A13B for short), an innovative, open-source LLM built on a fine-grained MoE architecture.☆802Updated 4 months ago
- LongCat Audio Tokenizer and Detokenizer☆196Updated last week