allenai/OLMoASR

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/allenai/OLMoASR)

allenai / OLMoASR

An open-source implementation of Whisper

☆491

Alternatives and similar repositories for OLMoASR

Users that are interested in OLMoASR are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

line / WaveTrainerFit
View on GitHub
Official implementation of "Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech G…
☆16Feb 6, 2026Updated 5 months ago
bytedance / USO
View on GitHub
[CVPR 2026] 🔥🔥 Official Repo of USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
☆1,226Sep 12, 2025Updated 10 months ago
TencentARC / AudioStory
View on GitHub
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
☆301Sep 21, 2025Updated 10 months ago
leto19 / WhiSQA
View on GitHub
Whisper Speech Quality Assessment (WhiSQA)
☆16Apr 14, 2026Updated 3 months ago
evalops / cognitive-dissonance-dspy
View on GitHub
A multi-agent LLM system for detecting and resolving cognitive dissonance.
☆282Apr 25, 2026Updated 2 months ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
XiaomiMiMo / MiMo-Audio
View on GitHub
MiMo-Audio: Audio Language Models are Few-Shot Learners
☆1,065Jun 17, 2026Updated last month
ryuclc / CosyVoice2-GRPO
View on GitHub
A simple implementation for improving CosyVoice2 by GRPO method
☆39May 5, 2026Updated 2 months ago
llm-jp / llama-mimi
View on GitHub
Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…
☆31Sep 20, 2025Updated 10 months ago
Marvis-Labs / marvis-tts
View on GitHub
☆365Aug 28, 2025Updated 10 months ago
zhu-han / SpeechLLM
View on GitHub
LLM-based ASR recipe with Zipformer encoder and Qwen LLM
☆34Sep 25, 2025Updated 9 months ago
YangXusheng-yxs / CodecFormer_5Hz
View on GitHub
☆35Oct 23, 2025Updated 8 months ago
TheStageAI / TheWhisper
View on GitHub
Optimized Whisper models for streaming and on-device use
☆892Jun 15, 2026Updated last month
Vyvo-Labs / VyvoTTS
View on GitHub
VyvoTTS: LLM-Based Text-to-Speech Training Framework
☆257Apr 8, 2026Updated 3 months ago
Coral-Protocol / Anemoi
View on GitHub
Anemoi: A Semi-Centralized Multi-agent Systems Based on Agent-to-Agent Communication MCP server from Coral Protocol
☆370Aug 27, 2025Updated 10 months ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
AmphionTeam / FlexiCodec
View on GitHub
[ICLR2026] FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
☆50Jul 1, 2026Updated 2 weeks ago
kaistmm / VoiceDiT
View on GitHub
[ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
☆52Apr 9, 2025Updated last year
HeCheng0625 / Diffusion-Speech-Tokenizer
View on GitHub
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆198Jan 25, 2026Updated 5 months ago
facebookresearch / omnilingual-asr
View on GitHub
Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages
☆2,854Dec 30, 2025Updated 6 months ago
Soul-AILab / SAC
View on GitHub
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108Nov 1, 2025Updated 8 months ago
QwenLM / Qwen3-ASR-Toolkit
View on GitHub
Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support…
☆980Feb 5, 2026Updated 5 months ago
xiquan-li / MeanAudio
View on GitHub
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆142Sep 2, 2025Updated 10 months ago
alibaba / vstyle
View on GitHub
☆34Sep 15, 2025Updated 10 months ago
EIT-NLP / LLaSO
View on GitHub
☆116Oct 21, 2025Updated 9 months ago
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
yzGuu830 / efficient-speech-codec
View on GitHub
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
☆126Mar 20, 2025Updated last year
BUTSpeechFIT / SOT-DiCoW
View on GitHub
Multi-talker ASR based on DiCoW with Serialized Output Training
☆20Sep 18, 2025Updated 10 months ago
ozspeech / OZSpeech
View on GitHub
[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
☆45Feb 9, 2025Updated last year
Wataru-Nakata / latentlm-tts
View on GitHub
☆29Jul 3, 2026Updated 2 weeks ago
xingchensong / TouchNet
View on GitHub
A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.
☆232Jul 2, 2026Updated 2 weeks ago
adelacvg / detail_tts
View on GitHub
All generative model in one for better TTS model
☆74Sep 8, 2024Updated last year
meituan-longcat / LongCat-Audio-Codec
View on GitHub
LongCat Audio Tokenizer and Detokenizer
☆301May 9, 2026Updated 2 months ago
jgarzik / xbasic64
View on GitHub
BASIC compiler for x86-64
☆52Jan 16, 2026Updated 6 months ago
superradcompany / rxtui
View on GitHub
🌊 reactive terminal interfaces for Rust. because CLIs deserve better.
☆349Dec 30, 2025Updated 6 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
nonverbalspeech38k / nonverspeech38k
View on GitHub
The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…
☆68Dec 26, 2025Updated 6 months ago
Lab-MSP / NaturalVoices
View on GitHub
☆33Oct 28, 2025Updated 8 months ago
Liquid4All / liquid-audio
View on GitHub
Liquid Audio - Speech-to-Speech audio models by Liquid AI
☆548Jun 5, 2026Updated last month
jasonppy / VoiceStar
View on GitHub
VoiceStar: Robust, Duration-controllable TTS that can Extrapolate
☆314May 31, 2025Updated last year
JHU-LCAP / FlexSED
View on GitHub
open-vocabulary sound event detection
☆53Dec 17, 2025Updated 7 months ago
cartesiancs / vessel
View on GitHub
🦾 Runtime for Physical AI. Connect streams. Detect events. Control your world.
☆327Jul 5, 2026Updated 2 weeks ago
AbrahamSanders / codec-bpe
View on GitHub
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
☆76Dec 3, 2025Updated 7 months ago