QwenLM / Qwen3-ASR-Toolkit
Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support.
☆665 Updated last week
Alternatives and similar repositories for Qwen3-ASR-Toolkit
Users who are interested in Qwen3-ASR-Toolkit are comparing it to the libraries listed below.
- An open-source implementation of Whisper ☆451 Updated 3 weeks ago
- ☆524 Updated last month
- ☆634 Updated 3 months ago
- ☆606 Updated last week
- Google's NotebookLM, but local ☆574 Updated last month
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆808 Updated last month
- ☆466 Updated 5 months ago
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆292 Updated 5 months ago
- ☆461 Updated 5 months ago
- Build, enrich, and transform datasets using AI models with no code ☆1,546 Updated last week
- Self-host the ultra-lightweight Kitten TTS model with this enhanced API server with an intuitive Web UI, large text processing for audiob… ☆207 Updated 2 months ago
- ☆280 Updated 2 months ago
- Open-source framework for developing real-time multimodal conversational AI agents. ☆487 Updated last week
- Learn to build and deploy local Visual Language Models for Edge AI ☆288 Updated last week
- Kyutai with an "eye" ☆222 Updated 7 months ago
- Liquid Audio - Speech-to-Speech audio models by Liquid AI ☆211 Updated last month
- Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video ge… ☆977 Updated 3 months ago
- VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning ☆1,948 Updated 3 weeks ago
- ☆2,041 Updated this week
- Build AI applications that can see, hear, and speak using your screens, microphones, and cameras as inputs. ☆982 Updated this week
- Model-agnostic plug-n-play LangChain/LangGraph agents powered entirely by MCP tools over HTTP/SSE. ☆635 Updated 2 weeks ago
- A quick vibe-coded app for DeepSeek OCR ☆1,242 Updated last week
- Make text LLMs listen and speak ☆940 Updated last week
- Anemoi: A Semi-Centralized Multi-agent Systems Based on Agent-to-Agent Communication MCP server from Coral Protocol ☆366 Updated 2 months ago
- ☆945 Updated last week
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation… ☆1,184 Updated last month
- VibeVoice: Expressive, longform conversational speech synthesis. (Community fork) ☆665 Updated this week
- ☆300 Updated 2 months ago
- Generate Web Pages and Components with text prompts, with Local Models. (or Cloud Models, if you want) - now supports Thinking Models! ☆396 Updated 3 months ago
- Unlimited text-to-speech in the Browser using Kokoro-JS, 100% local, 100% open source ☆306 Updated 4 months ago