LC1332 / Speaker-Grouping
Group and recognize the speakers in an animation video, extracting every speaker from an anime.
☆13 · Updated last year
Alternatives and similar repositories for Speaker-Grouping
Users who are interested in Speaker-Grouping are comparing it to the repositories listed below.
- Just for debug ☆56 · Updated last year
- Follow the rapid development of AIGC models and applications. 🚀 ☆81 · Updated last year
- The WorldRWKV project aims to implement training and inference across various modalities using the RWKV7 architecture. By leveraging diff… ☆54 · Updated 3 weeks ago
- ☆18 · Updated 2 months ago
- 😜 A visual dataset of memes and stickers, annotated using the image-parsing capabilities of glm-4v and step-1v. ☆134 · Updated last year
- 🔥Your Daily Dose of AI Research from Hugging Face 🔥 Stay updated with the latest AI breakthroughs! This bot automatically collects and… ☆54 · Updated this week
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆40 · Updated last year
- The official implementation of VITA, VITA15, LongVITA, and VITA-Audio. ☆34 · Updated last month
- GLM Series Edge Models ☆148 · Updated 2 months ago
- Official repo for Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions ☆121 · Updated 2 months ago
- An initiative to replicate Sora ☆103 · Updated last year
- A voice assistant demo that imitates the character Aegis (アイギス) from Persona 3 ☆14 · Updated 7 months ago
- ☆238 · Updated 6 months ago
- ☆14 · Updated last year
- RWKV fine-tuning ☆37 · Updated last year
- Official implementation of CharacterShot: Controllable and Consistent 4D Character Animation ☆43 · Updated 3 weeks ago
- PresentAgent: Multimodal Agent for Presentation Video Generation ☆98 · Updated last month
- ☆129 · Updated 2 weeks ago
- High-resolution clothing swapping, virtual try-on, and object replacement for people in 4K video ☆54 · Updated 2 years ago
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou… ☆24 · Updated last month
- VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks. ☆233 · Updated 3 months ago
- ☆81 · Updated 7 months ago
- A plan to extend ChatHaruhi into a zero-shot role-playing model ☆108 · Updated last year
- The official implementation of Freeze-Omni. ☆13 · Updated last month
- MLLM @ Game ☆14 · Updated 3 months ago
- Implementation for the paper "Can Language Models Learn to Listen?" ☆66 · Updated 2 years ago
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc. ☆38 · Updated 11 months ago
- The official repository of UniMuMo ☆116 · Updated 3 months ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2 ☆124 · Updated 9 months ago
- 🌻 VITS ONNX TTS server designed for fast inference 🔥 ☆128 · Updated 7 months ago