allenai / OLMoASR
An open-source implementation of Whisper
☆459 Updated last month
Alternatives and similar repositories for OLMoASR
Users who are interested in OLMoASR are comparing it to the repositories listed below.
- Liquid Audio - Speech-to-Speech audio models by Liquid AI ☆276 Updated last month
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆295 Updated 5 months ago
- OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language. ☆580 Updated last month
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆290 Updated 6 months ago
- ☆203 Updated last month
- Kyutai with an "eye" ☆224 Updated 8 months ago
- TTS model capable of streaming conversational audio in realtime. ☆128 Updated last week
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆506 Updated last week
- Make text LLMs listen and speak ☆994 Updated last week
- ☆249 Updated 6 months ago
- VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency ☆171 Updated last month
- ☆330 Updated last month
- This is the official repo for the paper "LongCat-Flash-Omni Technical Report" ☆413 Updated last week
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆859 Updated 2 months ago
- Self-host the ultra-lightweight Kitten TTS model with this enhanced API server with an intuitive Web UI, large text processing for audiob… ☆213 Updated 3 months ago
- VLLM Port of the Chatterbox TTS model ☆337 Updated last month
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support… ☆692 Updated last month
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On… ☆221 Updated 6 months ago
- Fast Streaming TTS with Orpheus + WebRTC (with FastRTC) ☆344 Updated 7 months ago
- Inference, Fine Tuning and many more recipes with Gemma family of models ☆274 Updated 4 months ago
- ☆399 Updated 2 weeks ago
- AudioStory: Generating Long-Form Narrative Audio with Large Language Models ☆288 Updated 2 months ago
- ☆635 Updated 2 weeks ago
- ☆313 Updated 3 months ago
- LongCat Audio Tokenizer and Detokenizer ☆252 Updated last week
- A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing emotion, speaking style, and paralinguistics… ☆706 Updated this week
- Official implementation of "Continuous Autoregressive Language Models" ☆615 Updated 2 weeks ago
- Open Audio Watermarking Tool ☆378 Updated 5 months ago
- Lightning-fast, on-device TTS — running natively via ONNX. ☆1,188 Updated this week
- A multi-agent LLM system for detecting and resolving cognitive dissonance. ☆269 Updated last month