QwenLM / Qwen3-ASR-Toolkit
Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support.
☆735Updated 2 months ago
Alternatives and similar repositories for Qwen3-ASR-Toolkit
Users who are interested in Qwen3-ASR-Toolkit are comparing it to the repositories listed below.
- GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters☆639Updated this week
- MiMo-Audio: Audio Language Models are Few-Shot Learners☆912Updated 3 months ago
- ☆532Updated 3 months ago
- A real-time Electron-based desktop GUI for DeepSeek-OCR☆718Updated last week
- ☆659Updated 2 months ago
- ☆472Updated 7 months ago
- Google's NotebookLM, but local☆687Updated 3 weeks ago
- A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model that excels at editing emotion, speaking style, and paralinguistics…☆796Updated last week
- ☆635Updated last month
- GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning☆821Updated 2 weeks ago
- Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages☆2,537Updated 2 weeks ago
- ☆1,072Updated 2 months ago
- Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.☆495Updated last week
- ☆482Updated 7 months ago
- Open-source framework for developing real-time multimodal conversational AI agents.☆549Updated this week
- "OpenPhone: Mobile Agentic Foundation Models for AI Phone"☆426Updated 2 weeks ago
- An open-source implementation of Whisper☆469Updated 2 months ago
- TTS model capable of streaming conversational audio in realtime.☆936Updated last month
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate☆306Updated 7 months ago
- VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)☆889Updated last week
- MAI-UI: Real-World Centric Foundation GUI Agents.☆730Updated this week
- Learn to build and deploy local Visual Language Models for Edge AI☆333Updated 2 months ago
- A quick vibe-coded app for DeepSeek-OCR☆1,536Updated last month
- ☆810Updated 2 months ago
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…☆1,284Updated 3 months ago
- Model-agnostic plug-n-play LangChain/LangGraph agents powered entirely by MCP tools over HTTP/SSE.☆785Updated 2 months ago
- Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video ge…☆1,132Updated last month
- Generate Web Pages and Components with text prompts, with Local Models. (or Cloud Models, if you want) - 0.5.0 Update!☆400Updated last week
- Unlimited text-to-speech in the Browser using Kokoro-JS, 100% local, 100% open source☆317Updated 6 months ago
- Tencent Hunyuan A13B (Hunyuan-A13B for short), an innovative and open-source LLM built on a fine-grained MoE architecture.☆809Updated 5 months ago