allenai / OLMoASR
An open-source implementation of Whisper
☆469 Updated last month
Alternatives and similar repositories for OLMoASR
Users who are interested in OLMoASR are comparing it to the libraries listed below.
- Liquid Audio - Speech-to-Speech audio models by Liquid AI ☆298 Updated 2 months ago
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆298 Updated 6 months ago
- OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language. ☆603 Updated last month
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆291 Updated 7 months ago
- ☆218 Updated 2 months ago
- Kyutai with an "eye" ☆230 Updated 8 months ago
- VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency ☆176 Updated last month
- ☆370 Updated last month
- TTS model capable of streaming conversational audio in realtime. ☆920 Updated 3 weeks ago
- ☆425 Updated 3 weeks ago
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support… ☆723 Updated 2 months ago
- ☆249 Updated 7 months ago
- VLLM Port of the Chatterbox TTS model ☆351 Updated 2 months ago
- ☆424 Updated 2 weeks ago
- ☆343 Updated 2 months ago
- Self-host the ultra-lightweight Kitten TTS model with this enhanced API server with an intuitive Web UI, large text processing for audiob… ☆217 Updated 4 months ago
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆903 Updated 3 months ago
- Inference, Fine Tuning and many more recipes with Gemma family of models ☆276 Updated 5 months ago
- Official implementation of "Continuous Autoregressive Language Models" ☆673 Updated 2 weeks ago
- DACVAE ☆124 Updated this week
- Optimized Whisper models for streaming and on-device use ☆765 Updated this week
- Fast Streaming TTS with Orpheus + WebRTC (with FastRTC) ☆346 Updated 8 months ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆542 Updated last month
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On… ☆224 Updated 7 months ago
- This is the official repo for the paper "LongCat-Flash-Omni Technical Report" ☆436 Updated this week
- Streaming and Fine-tuning for Chatterbox TTS ☆237 Updated 6 months ago
- Anemoi: A Semi-Centralized Multi-agent Systems Based on Agent-to-Agent Communication MCP server from Coral Protocol ☆370 Updated 3 months ago
- ☆317 Updated 3 months ago
- ☆532 Updated 2 months ago
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆130 Updated 4 months ago