Fr0zenCrane / Cockatiel
The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption"
☆15 · Updated last week
Alternatives and similar repositories for Cockatiel:
Users who are interested in Cockatiel are comparing it to the repositories listed below.
- An official pytorch implementation of "MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts" ☆31 · Updated 5 months ago
- Implementation code of the paper MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing ☆57 · Updated 2 months ago
- Blending Custom Photos with Video Diffusion Transformers ☆46 · Updated 3 months ago
- A light-weight and high-efficient training framework for accelerating diffusion tasks. ☆47 · Updated 7 months ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment. ☆72 · Updated this week
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ☆83 · Updated 9 months ago
- EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM ☆55 · Updated last month
- FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation ☆56 · Updated 2 weeks ago
- [WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models" ☆32 · Updated last month
- Finetuning and inference tools for the CogView4 and CogVideoX model series. ☆53 · Updated last week
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation ☆30 · Updated 5 months ago
- T2VScore: Towards A Better Metric for Text-to-Video Generation ☆80 · Updated last year
- [CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍 ☆40 · Updated last month
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM. ☆37 · Updated this week
- DiT for VAE (and Video Generation) ☆32 · Updated 8 months ago
- An Efficient Text-to-Image Generation Pretrain Pipeline ☆103 · Updated 3 weeks ago
- Concat-ID: Towards Universal Identity-Preserving Video Synthesis ☆36 · Updated last week
- Chinese-native image generation while compatible with SD eco-system, 1st-gen, AAAI2025 ☆12 · Updated 10 months ago
- [AAAI 2025] LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation ☆42 · Updated 4 months ago
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation ☆70 · Updated 9 months ago
- [CVPR2025] Official implementation of High Fidelity Scene Text Synthesis. ☆61 · Updated last month
- Official repo: SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing ☆52 · Updated last year
- [NeurIPS 2024 D&B Track] Implementation for "FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models" ☆68 · Updated 4 months ago
- [ICLR2025] ClassDiffusion: Official impl. of Paper "ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance" ☆41 · Updated last month