[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
☆1,163Jan 27, 2026Updated last month
Alternatives and similar repositories for ThinkSound
Users that are interested in ThinkSound are comparing it to the libraries listed below
- A fundamental toolkit designed for music, song, and audio generation☆1,306May 20, 2025Updated 9 months ago
- Align Anything: Training All-modality Model with Feedback☆4,636Nov 27, 2025Updated 3 months ago
- [ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"☆362Jun 27, 2025Updated 8 months ago
- Audio-FLAN☆160Sep 23, 2025Updated 5 months ago
- [ICLR 2025] Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation☆3,679Feb 27, 2025Updated last year
- [CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis☆2,096Feb 23, 2026Updated last week
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models☆1,006Dec 15, 2025Updated 2 months ago
- Repository of AudioX☆1,215Feb 14, 2026Updated 2 weeks ago
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation☆8,645Sep 14, 2024Updated last year
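- Digital Base (数字底座) is a unified, secure management support platform for the digital transformation of large government bodies and enterprises, built on identity authentication, organizational structure, positions and duties, application systems, resource roles, data catalogs, security controls, and other functions. Based on the three-administrator management model, it offers microservices, multi-tenancy, containerization, and support for domestic (Chinese) technology stacks, lets users quickly build their own business applications with a code generator, and can also be linked with various…☆2,574Updated this week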
- Unified automatic quality assessment for speech, music, and sound.☆681Jun 5, 2025Updated 8 months ago
- PyTorch Implementation of AudioLCM (ACM-MM'24): an efficient and high-quality text-to-audio generation with latent consistency model.☆1,155Jul 1, 2025Updated 8 months ago
- The first Large Audio Language Model that enables native in-depth thinking, which is trained on large-scale audio Chain-of-Thought data.☆284May 15, 2025Updated 9 months ago
- Klavis AI (YC X25): MCP integration platforms that let AI agents use tools reliably at any scale☆5,644Updated this week
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer☆329Dec 17, 2025Updated 2 months ago
- 💰The only genuine version💰 minerproxy minerproxy minerproxy minerproxy minerproxy minerproxy minerproxy minerproxy minerproxy minerproxy mining pool fee skimming, mining pool proxy, mining pool relay, mining pool fee…☆3,882Feb 2, 2026Updated 3 weeks ago
- MiMo-Audio: Audio Language Models are Few-Shot Learners☆969Sep 20, 2025Updated 5 months ago
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion☆2,251Nov 27, 2025Updated 3 months ago
- The next generation deep reinforcement learning toolkit☆3,460Jun 16, 2023Updated 2 years ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.☆186May 29, 2024Updated last year
- The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement☆751Dec 4, 2025Updated 2 months ago
- TVM Documentation in Chinese Simplified / TVM 中文文档☆3,408Nov 21, 2025Updated 3 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆118May 19, 2025Updated 9 months ago
- The first open autoregressive foundational video AI model.☆2,891Oct 14, 2024Updated last year
- [ICLR 2026] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching☆840Jan 28, 2026Updated last month
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis☆349Jul 21, 2025Updated 7 months ago
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.☆119Oct 17, 2025Updated 4 months ago
- ACE-Step: A Step Towards Music Generation Foundation Model☆4,130Feb 15, 2026Updated 2 weeks ago
- The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment☆1,355Dec 13, 2025Updated 2 months ago
- An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.☆222Updated this week
- Official implementation of the paper "MusicInfuser: Making Video Diffusion Listen and Dance"☆82Apr 10, 2025Updated 10 months ago
- [CVPR 2025] Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer☆1,368Mar 13, 2025Updated 11 months ago
- FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener…☆441Jan 25, 2024Updated 2 years ago
- A Doctor for your data☆3,489Jan 14, 2025Updated last year
- PromptEnhancer is a prompt-rewriting tool, refining prompts into clearer, structured versions for better image generation.☆3,546Jan 26, 2026Updated last month
- 🔥minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, mining pool fee skimming, mining pool relay, built for mining farm operations☆3,246Jan 14, 2026Updated last month
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆482Nov 23, 2025Updated 3 months ago
- [NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation☆2,813Dec 18, 2025Updated 2 months ago
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3☆434Sep 13, 2024Updated last year