MoonshotAI / Kimi-Audio
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
☆3,500Updated last week
Alternatives and similar repositories for Kimi-Audio
Users who are interested in Kimi-Audio are comparing it to the libraries listed below
- ☆4,280Updated 2 months ago
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…☆2,913Updated this week
- GLM-4-Voice | End-to-end Chinese-English speech dialogue model☆2,910Updated 5 months ago
- ☆2,928Updated 2 months ago
- Towards Human-Sounding Speech☆4,750Updated last week
- SkyReels-V2: Infinite-length Film Generative model☆2,183Updated last week
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…☆2,756Updated 2 weeks ago
- ☆5,266Updated last week
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion☆1,603Updated last week
- The python library for real-time communication☆3,891Updated this week
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.☆1,721Updated 3 weeks ago
- MAGI-1: Autoregressive Video Generation at Scale☆3,001Updated this week
- ☆5,944Updated this week
- ACE-Step: A Step Towards Music Generation Foundation Model☆1,766Updated this week
- Multilingual Voice Understanding Model☆5,593Updated last month
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…☆969Updated last month
- InspireMusic: A Unified Framework for Music, Song, Audio Generation.☆1,086Updated last week
- An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System☆1,716Updated this week
- Spark-TTS Inference Code☆9,357Updated last month
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming…☆3,313Updated 6 months ago
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention☆2,599Updated this week
- Interface for OuteTTS models.☆1,227Updated 2 weeks ago
- ☆875Updated last month
- https://hf.co/hexgrad/Kokoro-82M☆2,777Updated 2 weeks ago
- Suna - Open Source Generalist AI Agent☆10,978Updated this week
- YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open☆4,966Updated this week
- SkyReels V1: The first and most advanced open-source human-centric video foundation model☆2,148Updated 2 months ago
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.☆8,648Updated this week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee…☆2,918Updated 3 weeks ago
- Speech To Speech: an effort for an open-sourced and modular GPT4-o☆4,019Updated last month