kinghuin / AIGC-progress
Keep up with the rapid development of AIGC models and applications. 🚀
☆81Updated last year
Related projects:
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio.☆118Updated 3 months ago
- ☆18Updated 3 months ago
- Anim-400K: A dataset designed from the ground up for automated dubbing of video☆97Updated 3 months ago
- ☆40Updated 2 months ago
- LlamaVoice is a llama-based large voice generation model, providing inference and training ability.☆169Updated 3 weeks ago
- ☆34Updated 3 months ago
- High-resolution outfit swapping, virtual try-on, and object replacement for 4K video☆48Updated 2 years ago
- Voice Conversion Experiments for the THUHCSI course "Digital Processing of Speech Signals"☆8Updated last year
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆68Updated 2 months ago
- Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491).☆119Updated 2 months ago
- ☆23Updated last month
- SenseVoice-python: An enterprise-grade, open-source, multi-language ASR system from the open-source FunASR project, with onnxruntime☆55Updated 3 weeks ago
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".☆55Updated last month
- ☆130Updated last month
- flow mirror models from JZX AI Labs☆33Updated last week
- Project page for "Improving Few-shot Learning for Talking Face System with TTS Data Augmentation" for ICASSP2023☆82Updated 11 months ago
- official code for CVPR'24 paper Diff-BGM☆38Updated 5 months ago
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems☆74Updated 8 months ago
- Official implementation of Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models☆18Updated last week
- MooER: Open-sourced LLM for audio understanding trained on 80,000 hours of data☆112Updated 2 weeks ago
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech☆22Updated last month
- The project page repo for Neural Dubber.☆27Updated last year
- [CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners☆113Updated 2 months ago
- The demo page of InstructTTS☆155Updated 7 months ago
- ☆13Updated 6 months ago
- Code for "Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction" (ACL24)☆25Updated last month
- The official GitHub page for the survey paper "Foundation Models for Music: A Survey".☆79Updated 2 weeks ago
- Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).☆61Updated 2 months ago
- A separately maintained Chinese TTS☆35Updated last year
- VoiceBank-2023 is the speech corpus specially designed for constructing personalized Mandarin text-to-speech (TTS) systems.☆36Updated last year