LC1332 / Speaker-Grouping
Group and recognize speakers in an animation video; extract every speaker from an anime.
☆13Updated last year
Alternatives and similar repositories for Speaker-Grouping
Users who are interested in Speaker-Grouping are comparing it to the repositories listed below.
- Follow the rapid development of AIGC models and applications. 🚀☆81Updated 2 years ago
- Just for debug☆56Updated last year
- A tool for converting MikuMikuDance models (.pmx) with motions (.vmd) to UltraDensePose sequences.☆52Updated 3 years ago
- An initiative to replicate Sora☆104Updated last year
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora☆40Updated last year
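- A Chinese-language corpus of anime characters☆49Updated 2 years ago
- High-resolution clothing swap, virtual try-on, and object replacement for people in 4K video☆56Updated 3 years ago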
- ☆18Updated 6 months ago
- ☆81Updated 10 months ago
- Implementation for the paper "Can Language Models Learn to Listen?"☆68Updated 2 years ago
- [AAAI 2025] The official repository of UniMuMo☆126Updated 3 months ago
- ☆130Updated 5 months ago
- ☆155Updated 4 months ago
- The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption"☆38Updated 7 months ago
- Official repo for Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions☆159Updated 5 months ago
- rwkv finetuning☆37Updated last year
- ☆143Updated 4 months ago
- VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.☆237Updated 6 months ago
- Music large model based on InternLM2-chat.☆22Updated last year
- RWKV-RAG Personal Edition☆25Updated 4 months ago
- MLLM @ Game☆15Updated 7 months ago
- 😜 A visual dataset of meme stickers, annotated using the image-understanding capabilities of glm-4v and step-1v.☆143Updated last year
- OpenVideo specializes in the domain of text-to-video generation, with the goal of providing high-quality and diverse video datasets to AI…☆112Updated 7 months ago
- MikuDance: Animating Character Art with Mixed Motion Dynamics☆179Updated 9 months ago
- A plan to extend ChatHaruhi into a zero-shot role-playing model☆115Updated last year
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou…☆29Updated 2 months ago
- ☆241Updated 10 months ago
- Official repo for Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions☆14Updated 11 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder and unifying image understanding and generation.☆39Updated last year
- 🌻 VITS ONNX TTS server designed for fast inference 🔥☆129Updated 10 months ago